SPARQL updates to be implemented

Constraints on the linked data to implement:

1. If an author/contributor/maintainer has an affiliation (or any other property that is allowed for persons but not for organizations), it is of type Person; the same applies the other way around for organization-only properties.
2. An affiliation is always of type Organization, see https://schema.org/affiliation
3. If an organization's name is the same as another organization's, they are the same entity, i.e. give them the same ROR ID. This also applies if the stripped names are the same.
4. If something does not have an @id, use identifier, email, url, or name (?) instead, on classes where it makes sense.
5. Apply a list of equivalent names, e.g. Forschungszentrum Jülich, Forschungszentrum Juelich, and Forschungszentrum Jülich GmbH should be cleaned to a single name. How should we deal with child relationships, e.g. institutes, of which nearly all do not have a ROR ID?
6. Filter content for the public graph/public UI, e.g. remove information such as emails, blank node IDs, etc.
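
As a starting point for the "stripped names" comparison in points 3 and 5, here is a minimal Python sketch of how such a comparison key could be computed. Everything in it is an assumption for discussion, not an agreed rule: the function names, the list of legal-form suffixes, and the choice to transliterate German umlauts (ü to ue) are all hypothetical. A spelling like "Julich" without the "e" would not match with this key.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical set of legal-form suffixes to ignore; extend as needed.
LEGAL_SUFFIXES = {"gmbh", "ag", "ev"}

# German umlaut transliterations, applied before accents are dropped.
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def stripped_name(name: str) -> str:
    """Return a comparison key for an organization name."""
    # Compose characters first so that "u" + combining diaeresis becomes "ü".
    key = unicodedata.normalize("NFC", name).casefold().translate(UMLAUTS)
    # Drop any remaining accents (e.g. "é" becomes "e").
    key = unicodedata.normalize("NFKD", key)
    key = "".join(ch for ch in key if not unicodedata.combining(ch))
    # "e.V." becomes "ev" so it can be matched as a single suffix token.
    key = key.replace(".", "")
    tokens = re.findall(r"[a-z0-9]+", key)
    # Remove trailing legal-form tokens such as "gmbh".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_by_stripped_name(names):
    """Group original names that share the same comparison key."""
    groups = defaultdict(list)
    for name in names:
        groups[stripped_name(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    variants = [
        "Forschungszentrum Jülich",
        "Forschungszentrum Juelich",
        "Forschungszentrum Jülich GmbH",
    ]
    print(group_by_stripped_name(variants))
```

All three variants from point 5 end up under one key with this sketch. It does not address the open question about child institutes.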

The validation of data entries can be done with SHACL shapes. Shapes for the whole of schema.org are available at https://datashapes.org/schema. We could add further constraints to these if necessary.

Please add further ideas to the list.

The constraints in the list above should be implemented as SPARQL updates (https://www.w3.org/TR/sparql11-update/), which we will store in files for now. It might also be possible to use a template query and replace a certain string, for example in the name-matching case.
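
To make the template idea concrete, here is a rough sketch of how a name-matching update could be rendered from a template with Python's standard `string.Template`. The template text, the helper names, and the use of the `http://schema.org/` namespace are assumptions for illustration; the actual graph may use a different namespace, and the query itself is only one possible way to express the rename.

```python
from string import Template

# Assumption: the graph uses the http://schema.org/ namespace.
# $canonical and $variants are placeholders filled in at render time;
# SPARQL variables use "?" so they do not clash with the "$" placeholders.
RENAME_ORG_TEMPLATE = Template("""\
PREFIX schema: <http://schema.org/>
DELETE { ?org schema:name ?variant }
INSERT { ?org schema:name $canonical }
WHERE {
  ?org a schema:Organization ;
       schema:name ?variant .
  FILTER (STR(?variant) IN ($variants))
}
""")


def sparql_literal(value: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def render_rename_update(canonical: str, variants: list[str]) -> str:
    """Fill the template with one canonical name and its known variants."""
    return RENAME_ORG_TEMPLATE.substitute(
        canonical=sparql_literal(canonical),
        variants=", ".join(sparql_literal(v) for v in variants),
    )


if __name__ == "__main__":
    print(
        render_rename_update(
            "Forschungszentrum Jülich",
            ["Forschungszentrum Juelich", "Forschungszentrum Jülich GmbH"],
        )
    )
```

The rendered string could then be written to a file or passed to whichever SPARQL engine we end up using. Quoting the values through a helper rather than pasting them in directly avoids broken queries when a name contains quotes.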

Test data for these can be found here: https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_mining/-/tree/main/tests/data/instances

Edited Jan 25, 2023 by Said Fathalla