SPARQL updates to be implemented
Constraints on the linked data to implement:
1. If an author/contributor/maintainer has an affiliation (or another property that is allowed for persons but not for organizations), it is of type Person. The other way around, a property allowed only for organizations implies type Organization.
2. An affiliation is always of type Organization, see https://schema.org/affiliation.
3. If an organization's name is the same as that of another organization, the two are the same organization, i.e. they are given the same ROR ID. The same applies if their stripped names are the same.
4. If something does not have an @id, use its identifier, email, url or name (?) instead, on classes where this makes sense.
5. Apply a list of equivalent names, e.g. Forschungszentrum Jülich, Forschungszentrum Juelich and Forschungszentrum Jülich GmbH should be cleaned to a single name. How should child relationships be handled, e.g. institutes, nearly all of which do not have a ROR ID?
6. Filter content for the public graph/public UI, e.g. remove information such as emails, blank node IDs, etc.
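Items 3 and 5 both depend on deciding when two organization names count as "the same". A minimal sketch of one possible "stripped name" key, assuming that letter case, German umlaut spellings, punctuation and a trailing legal form should not distinguish organizations. The suffix list and all function names are hypothetical starting points, not an agreed rule:

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of legal-form tokens dropped from the end of a name.
LEGAL_SUFFIXES = {"gmbh", "ggmbh", "mbh", "ag", "ev"}

# Spell out German umlauts so "Jülich" and "Juelich" map to the same key.
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def name_key(name: str) -> str:
    """Return a 'stripped' comparison key for an organization name."""
    # Compose combining sequences (u + U+0308 -> ü), case-fold (ß -> ss),
    # then transliterate umlauts.
    key = unicodedata.normalize("NFC", name).casefold().translate(UMLAUTS)
    # Drop any remaining accents, e.g. "é" -> "e".
    key = "".join(
        c for c in unicodedata.normalize("NFKD", key) if not unicodedata.combining(c)
    )
    # Remove punctuation ("e.V." -> "ev") and split on whitespace.
    tokens = re.sub(r"[^\w\s]", "", key).split()
    # Strip trailing legal-form tokens such as "GmbH".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_by_key(names: list[str]) -> dict[str, list[str]]:
    """Group name variants that share the same stripped key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)
```

With this key, "Forschungszentrum Jülich", "Forschungszentrum Juelich" and "Forschungszentrum Jülich GmbH" all map to `forschungszentrum juelich`, so each group is a candidate entry for the list of equivalent names in item 5. The key says nothing about ROR IDs or child institutes; those questions stay open.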
Data entries can be validated with SHACL shapes; shapes for the whole of schema.org are available at https://datashapes.org/schema. Further constraints could be added to these if necessary.
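A sketch of one such additional constraint: item 2 expressed as a SHACL shape, kept as a Turtle string inside Python so it can live next to the updates. The shape IRI (`https://example.org/shapes#`), the `https://schema.org/` namespace and the function name are assumptions, and actually running the check requires the third-party `rdflib` and `pyshacl` packages:

```python
# Hypothetical SHACL shape for item 2: every value of schema:affiliation
# must be an instance of schema:Organization.
AFFILIATION_SHAPE = """\
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <https://schema.org/> .
@prefix ex: <https://example.org/shapes#> .

ex:AffiliationShape
    a sh:NodeShape ;
    sh:targetSubjectsOf schema:affiliation ;
    sh:property [
        sh:path schema:affiliation ;
        sh:class schema:Organization ;
    ] .
"""


def validate_affiliations(data_ttl: str) -> tuple[bool, str]:
    """Validate Turtle data against AFFILIATION_SHAPE (needs rdflib + pyshacl)."""
    # Imported lazily so the shape string is usable without these packages.
    from pyshacl import validate
    from rdflib import Graph

    data = Graph().parse(data=data_ttl, format="turtle")
    shapes = Graph().parse(data=AFFILIATION_SHAPE, format="turtle")
    conforms, _results_graph, results_text = validate(data, shacl_graph=shapes)
    return conforms, results_text
```

Because `sh:class` rejects literals as well as untyped nodes, plain string affiliations would also be reported, which may help find entries that need cleaning before the updates are applied.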
Please add further ideas to the list.
The constraints in the list above should be implemented as SPARQL updates (https://www.w3.org/TR/sparql11-update/), which for now we will store in files. It might also be possible to use a template query and replace a certain string, for example in the name-matching case.
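A minimal sketch of how the updates could be kept as strings and written to `.rq` files. It assumes the data uses the `https://schema.org/` namespace (JSON-LD contexts often expand to `http://schema.org/`, so this may need adjusting); the `__CANONICAL__`/`__VARIANTS__` placeholders, file names and function names are hypothetical. Items 1, 2 and 6 are fixed updates; item 5 uses a template filled by plain string replacement:

```python
from pathlib import Path

# Item 1 (affiliation case only): an affiliated author/contributor/maintainer is a Person.
PERSON_IF_AFFILIATED = """\
PREFIX schema: <https://schema.org/>
INSERT { ?agent a schema:Person . }
WHERE {
  ?work schema:author|schema:contributor|schema:maintainer ?agent .
  ?agent schema:affiliation ?aff .
}
"""

# Item 2: a non-literal object of schema:affiliation is an Organization.
AFFILIATION_IS_ORGANIZATION = """\
PREFIX schema: <https://schema.org/>
INSERT { ?org a schema:Organization . }
WHERE {
  ?s schema:affiliation ?org .
  FILTER (!isLiteral(?org))
}
"""

# Item 6: remove e-mail addresses from the copy used for the public graph.
REMOVE_EMAILS = """\
PREFIX schema: <https://schema.org/>
DELETE WHERE { ?s schema:email ?email . }
"""

# Item 5: replace every known variant name with one canonical name.
# STR() makes the comparison ignore language tags on ?name.
ORG_NAME_TEMPLATE = """\
PREFIX schema: <https://schema.org/>
DELETE { ?org schema:name ?name . }
INSERT { ?org schema:name __CANONICAL__ . }
WHERE {
  ?org a schema:Organization ;
       schema:name ?name .
  FILTER (STR(?name) IN (__VARIANTS__))
}
"""


def sparql_literal(text: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def render_org_name_update(canonical: str, variants: list[str]) -> str:
    """Fill the name-matching template for one canonical organization name."""
    variant_list = ", ".join(sparql_literal(v) for v in variants)
    return ORG_NAME_TEMPLATE.replace(
        "__CANONICAL__", sparql_literal(canonical)
    ).replace("__VARIANTS__", variant_list)


def save_updates(directory: str, updates: dict[str, str]) -> list[Path]:
    """Write each update to <directory>/<name>.rq and return the paths."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, query in updates.items():
        path = out / f"{name}.rq"
        path.write_text(query, encoding="utf-8")
        paths.append(path)
    return paths
```

Order may matter: running the item 2 update before the name template means untyped affiliations are also matched by `?org a schema:Organization`. Within one DELETE/INSERT operation the deletions are applied before the insertions, so listing the canonical name among the variants is harmless.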
Test data for these can be found here: https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/data_mining/-/tree/main/tests/data/instances