Data Harvesting

Project ID: 3308

Set of tools to harvest, process and uplift (meta)data from metadata providers within the Helmholtz Association for inclusion in the Helmholtz Knowledge Graph (Helmholtz-KG). The harvested linked data, in the form of schema.org JSON-LD, is aggregated and uplifted in data pipelines and then included in a single large knowledge graph (KG). The toolset and harvesters can be used as a Python library or through a command-line interface (CLI, hmc-unhide). Provenance of metadata changes is tracked in a rudimentary way by saving graph patches of the changes made to rdflib Graph data structures at the level of semantic triples. Harvesters support extracting data via sitemaps, the GitLab API, the DataCite API and OAI-PMH endpoints.
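To illustrate what triple-level graph patches mean, here is a minimal, stdlib-only sketch written under our own assumptions. It is not the project's implementation. Triples are plain string tuples rather than rdflib terms, so IRIs and literals are not distinguished. The names `GraphPatch`, `make_patch`, `apply_patch`, `revert_patch` and the `uplift_https` step are all hypothetical and are not part of the hmc-unhide API.

```python
"""Hypothetical sketch of triple-level provenance patches (not the hmc-unhide API)."""
from dataclasses import dataclass, field
from datetime import datetime, timezone

# A triple is (subject, predicate, object); plain strings stand in for rdflib terms.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class GraphPatch:
    """Hypothetical record of one uplift step: which triples were added or removed."""
    added: frozenset[Triple]
    removed: frozenset[Triple]
    activity: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def make_patch(before: set[Triple], after: set[Triple], activity: str) -> GraphPatch:
    """Diff two graph states on the triple level."""
    return GraphPatch(
        added=frozenset(after - before),
        removed=frozenset(before - after),
        activity=activity,
    )


def apply_patch(graph: set[Triple], patch: GraphPatch) -> set[Triple]:
    """Replay a patch: drop the removed triples, then add the new ones."""
    return (graph - patch.removed) | patch.added


def revert_patch(graph: set[Triple], patch: GraphPatch) -> set[Triple]:
    """Undo a patch: drop the added triples, then restore the removed ones."""
    return (graph - patch.added) | patch.removed


def uplift_https(graph: set[Triple]) -> set[Triple]:
    """Hypothetical uplift step: rewrite http://schema.org/ IRIs to https://schema.org/."""
    old, new = "http://schema.org/", "https://schema.org/"

    def fix(term: str) -> str:
        return new + term[len(old):] if term.startswith(old) else term

    return {(fix(s), fix(p), fix(o)) for s, p, o in graph}


if __name__ == "__main__":
    before = {
        ("https://example.org/ds/1",
         "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
         "http://schema.org/Dataset"),
        ("https://example.org/ds/1", "http://schema.org/name", "Ocean temperature"),
    }
    after = uplift_https(before)
    patch = make_patch(before, after, "uplift:https-schema")
    print(len(patch.added), len(patch.removed))  # prints "2 2"
    assert apply_patch(before, patch) == after
    assert revert_patch(after, patch) == before
```

Storing only the patch for each uplift step, rather than a full copy of the graph before and after, keeps just the delta while still allowing a step to be replayed or undone. With rdflib, comparable added and removed sets could be obtained from set-theoretic difference on Graph objects (for example `g_after - g_before`).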
