
Avoid Double blank nodes

We have to avoid double blank nodes.

The way we plan to do this is by

  1. Give everything an internal ID, generated as follows: (_)https://helmholtz-metadaten.de/.well-known/genid/<repo>/attpath/#hash

The attpath should be a JSON path within the file to the attribute. This ensures that the same ID is created if the same thing is uploaded again. Lists are a problem, though: the convention here is item_<i>, but if the list got rearranged or a member was inserted, we would give the same ID to something else. To avoid this case, we could also include a hash of the attribute's content in the ID. The .well-known/genid segment in the path follows the RDF 1.1 convention for Skolem IRIs, which builds on the general .well-known URI convention, so a smart system can recognize these IRIs as stand-ins for blank nodes and deal with them even if they do not resolve. The (_) prefix might be needed for certain RDF software to recognize the ID as a blank node.

If an ID already exists, the new one is linked to it via a sameAs link.

  2. Reuse as many IDs as possible, i.e. enrich the file's data with IDs for persons and orgs.
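
The ID scheme from step 1 could be sketched roughly as below. This is only an illustration under assumptions, not the agreed implementation: the function names (`content_hash`, `attpath`, `make_genid`, `same_as_triple`), the choice of SHA-256 truncated to 16 hex characters, the canonical JSON serialization, representing the path as a list of keys and indices, and the use of owl:sameAs for the sameAs link are all hypothetical choices made for the example.

```python
import hashlib
import json
from typing import Any, Sequence, Union
from urllib.parse import quote

# Base taken from the pattern in this note; everything else is an assumption.
BASE = "https://helmholtz-metadaten.de/.well-known/genid"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed sameAs predicate


def content_hash(value: Any) -> str:
    """Hash the attribute content; sorted keys make it independent of key order."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def attpath(segments: Sequence[Union[str, int]]) -> str:
    """Join path segments; list indices become item_<i> as in the convention above."""
    parts = []
    for seg in segments:
        if isinstance(seg, int):
            parts.append(f"item_{seg}")
        else:
            parts.append(quote(str(seg), safe=""))
    return "/".join(parts)


def make_genid(repo: str, segments: Sequence[Union[str, int]], value: Any) -> str:
    """Build <base>/<repo>/attpath/#hash for one attribute."""
    return f"{BASE}/{quote(repo, safe='')}/{attpath(segments)}/#{content_hash(value)}"


def same_as_triple(existing_id: str, genid: str) -> str:
    """N-Triples line linking an already existing ID to the generated one."""
    return f"<{existing_id}> <{OWL_SAME_AS}> <{genid}> ."


# Same position in the list, different content: the hash keeps the IDs apart.
authors = [{"name": "Ada"}, {"name": "Bob"}]
reordered = [{"name": "Bob"}, {"name": "Ada"}]
id_before = make_genid("my-repo", ["author", 1], authors[1])
id_after = make_genid("my-repo", ["author", 1], reordered[1])
print(id_before)
print(id_after)
print(id_before != id_after)
```

Uploading the same content at the same path again yields the same ID, while a rearranged list yields a different one because the content hash changes. The hash does not help when two list members have identical content and swap places, which is one reason step 2 and proper entity resolution still matter.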

What this does not solve is the problem of entity resolution, i.e. deciding that two things from two different sources are the same.

Edited by Jens Bröder