
SPARQL updates to be implemented

Constraints on the linked data to implement:

  • 1. If an author/contributor/maintainer has an affiliation (or any other property that is allowed for persons but not for organizations), it is of type Person. The same holds the other way around: a property allowed only for organizations implies type Organization.

  • 2. An affiliation is always of type Organization; see https://schema.org/affiliation

  • 3. If an organization's name is the same as that of another organization, the two are the same, i.e. they get the same ROR ID. This also applies if their stripped names are the same.

  • 4. If something does not have an @id, use identifier, email, url, or name(?) instead, on classes where this makes sense.

  • 5. Apply a list of equivalent names, e.g. Forschungszentrum Jülich, Forschungszentrum Juelich, and Forschungszentrum Jülich GmbH should be cleaned to a single name. How should child relationships be handled, i.e. institutes, nearly all of which do not have a ROR ID?

  • 6. Filter content for the public graph/public UI, e.g. remove information such as emails, blank node ids, etc.
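To make the "stripped names" idea from points 3 and 5 more concrete, here is a minimal sketch of how name variants could be grouped under a common key. The function names, the umlaut transliteration, and the list of legal-form suffixes are assumptions for illustration only, not an agreed rule set.

```python
import re
import unicodedata
from collections import defaultdict

# Assumption: illustrative transliterations (ä, ö, ü); extend as needed.
UMLAUTS = str.maketrans({"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue"})
# Assumption: illustrative legal-form suffixes to ignore when comparing names.
LEGAL_FORMS = {"gmbh", "ggmbh", "ev", "ag"}


def stripped_name(name: str) -> str:
    """Return a comparison key for an organization name (hypothetical helper)."""
    text = unicodedata.normalize("NFC", name).casefold()  # casefold also maps ß to ss
    text = text.translate(UMLAUTS)
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation, e.g. the dots in "e.V."
    tokens = [t for t in text.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def group_by_stripped_name(names):
    """Group the original spellings by their stripped name."""
    groups = defaultdict(list)
    for name in names:
        groups[stripped_name(name)].append(name)
    return dict(groups)


variants = [
    "Forschungszentrum Jülich",
    "Forschungszentrum Juelich",
    "Forschungszentrum Jülich GmbH",
]
groups = group_by_stripped_name(variants)
```

With these assumptions, all three spellings end up under the same key, so they could be assigned one shared identifier. Child institutes would still need a separate rule, since their names differ from the parent organization's.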

The validation of data entries can be done with SHACL shapes. Shapes for the whole of schema.org are available at https://datashapes.org/schema. We could add further constraints to these if necessary.

Please add further ideas to the list.

The constraints in the above list should be implemented as SPARQL updates (https://www.w3.org/TR/sparql11-update/), which we will store in files for now. It might also be possible to use a template query and replace a certain string, for example in the name matching case.
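The template idea could look roughly like the following sketch for the name matching case. The update text, the `https://schema.org/` namespace (the data may use `http://schema.org/` instead), and the helper names `sparql_literal` and `render_name_update` are assumptions for illustration, not an agreed implementation.

```python
from string import Template

# Assumption: a hypothetical update that rewrites known name variants of an
# organization to one canonical name. $canonical and $variants are placeholders.
NAME_UPDATE = Template("""\
PREFIX schema: <https://schema.org/>
DELETE { ?org schema:name ?old }
INSERT { ?org schema:name $canonical }
WHERE {
  ?org a schema:Organization ;
       schema:name ?old .
  FILTER (STR(?old) IN ($variants))
}
""")


def sparql_literal(value: str) -> str:
    """Quote a single-line string as a SPARQL literal, escaping \\ and "."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_name_update(canonical: str, variants: list[str]) -> str:
    """Fill the template with a canonical name and the variants to replace."""
    return NAME_UPDATE.substitute(
        canonical=sparql_literal(canonical),
        variants=", ".join(sparql_literal(v) for v in variants),
    )


if __name__ == "__main__":
    print(render_name_update(
        "Forschungszentrum Jülich",
        ["Forschungszentrum Juelich", "Forschungszentrum Jülich GmbH"],
    ))
```

The template itself could live in a file next to the other updates, with only the list of names changing per run. Escaping the inserted names keeps a quote character in an organization name from breaking the query.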

Test data for these can be found here: https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_mining/-/tree/main/tests/data/instances

Edited by Said Fathalla