data harvesting

Projects with this topic

  • A set of tools to harvest, process and uplift (meta)data from metadata providers within the Helmholtz Association for inclusion in the Helmholtz Knowledge Graph (Helmholtz-KG). The harvested linked data, in JSON-LD form, is aggregated and uplifted in data pipelines and merged into a single large knowledge graph (KG). The tool set and harvesters can be used as a Python library or through a command-line interface (CLI, hmc-unhide). Provenance of metadata changes is tracked in a rudimentary way: graph patches recording the changes made to rdflib Graph data structures are saved at the semantic-triple level. Harvesters support extracting data via sitemaps, the GitLab API, the DataCite API and OAI-PMH endpoints.
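
The idea of tracking provenance through triple-level graph patches can be illustrated with a minimal sketch. It is not the project's actual implementation: it models a graph as a plain Python set of (subject, predicate, object) string tuples instead of an rdflib Graph, and the names `GraphPatch`, `make_patch`, `apply_patch` and `revert_patch` are hypothetical, as are the example identifiers. It also ignores blank nodes, which a real RDF diff would have to handle.

```python
from typing import NamedTuple

# A triple is modelled as (subject, predicate, object) strings.
# Assumption: this stands in for the triples held by an rdflib Graph.
Triple = tuple[str, str, str]


class GraphPatch(NamedTuple):
    """Hypothetical patch record: the triples removed and added by one change."""
    removed: frozenset[Triple]
    added: frozenset[Triple]


def make_patch(before: set[Triple], after: set[Triple]) -> GraphPatch:
    """Compute the triple-level difference between two graph states."""
    return GraphPatch(
        removed=frozenset(before - after),
        added=frozenset(after - before),
    )


def apply_patch(graph: set[Triple], patch: GraphPatch) -> set[Triple]:
    """Replay a patch: drop the removed triples, then add the new ones."""
    return (graph - patch.removed) | patch.added


def revert_patch(graph: set[Triple], patch: GraphPatch) -> set[Triple]:
    """Undo a patch, recovering the graph state before the change."""
    return (graph - patch.added) | patch.removed


# Made-up example data: a harvested record and its uplifted version.
harvested = {
    ("ex:dataset1", "schema:name", "Ocean temperature"),
    ("ex:dataset1", "schema:creator", "J. Doe"),
}
uplifted = {
    ("ex:dataset1", "schema:name", "Ocean temperature"),
    ("ex:dataset1", "schema:creator", "ex:person1"),
    ("ex:dataset1", "rdf:type", "schema:Dataset"),
}

patch = make_patch(harvested, uplifted)
```

Storing only `patch` next to the uplifted graph is enough to recover the harvested state with `revert_patch(uplifted, patch)`, which is what makes this kind of record usable as a rudimentary provenance trail.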