Feedback to OpenAIRE about missing Scholix links
E-mail sent via the OpenAIRE helpdesk on 2025/11/25 at 17:00
Dear ScholeXplorer/OpenAIRE team,
We are developing the HMC Dashboard on Open and FAIR Data in Helmholtz as part of our work in the Helmholtz Metadata Collaboration (HMC). In this project, we use the ScholeXplorer API to identify and analyze data publications linked to scholarly research papers.
During recent re-harvesting activities, we noticed a significant discrepancy between our earlier and current results, which may indicate a change in how links are currently exposed in the API.
Summary of our observations:
- Our previous database (harvested a few months ago) was built using the following API call: https://api.scholexplorer.openaire.eu/v3/Links?sourcePid=
- We filtered the Scholix links for (a) target.type = "dataset" and (b) RelationshipType.SubType = "IsSupplementedBy" (formerly used; now apparently "issupplementto").
- When re-harvesting a subset of 1,000 random article PIDs using the same API call, we now find only about 2% of the dataset links that were previously present, irrespective of their type or RelationshipType.SubType.
- A manual check of the DataCite metadata for several missing dataset DOIs revealed that many of the datasets no longer found through ScholeXplorer are still correctly linked to the scholarly research articles there, using the DataCite relationType "IsSupplementTo".
- For comparison, we also performed the inverse query: https://api.scholexplorer.openaire.eu/v3/Links?targetPid=, filtering for (a) source.type = "dataset" and (b) RelationshipType.SubType = "issupplementto".
- With the inverse query, we still find around 31% of the expected dataset links, irrespective of their type or RelationshipType.SubType. Roughly 6% of these appear to have changed their relationship type since the previous harvesting.
- We also verified that these discrepancies are not related to differences between ScholeXplorer API versions v2 and v3; the results are identical in both versions.
Based on these findings, it appears that the ScholeXplorer API currently exposes only a small subset of the previously available links, and the directionality or relation type (issupplementedby vs. issupplementto) might now affect query completeness differently than before.
Could you please confirm whether there have been any recent changes in:
- The link direction definitions or mapping of RelationshipType / RelationshipType.SubType;
- The internal synchronization process between relations exposed in DataCite records and those included in ScholeXplorer / the OpenAIRE research graph;
- Or any filtering or update processes that might explain this reduction in retrieved links?
We would be happy to provide specific DOIs and example JSON responses if that helps with investigating the issue.
We would greatly appreciate your support with this matter, as our system critically relies on ScholeXplorer for accurate and comprehensive linking between research articles and datasets.
Thank you very much for your time and support.
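
For reference, the comparison described in the e-mail can be sketched roughly as follows. This is a minimal illustration, not the harvesting code used for the dashboard. The endpoint and the sourcePid/targetPid parameters are taken from the e-mail; the JSON field names ("source", "target", "type", "RelationshipType", "SubType") follow the wording of the e-mail and are assumptions about the response layout, so they are matched case-insensitively. The helper names and the example identifiers are hypothetical, and the HTTP request itself is left out.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the e-mail.
BASE_URL = "https://api.scholexplorer.openaire.eu/v3/Links"


def build_query_url(pid, direction="source"):
    """Build a Links query URL for one PID; direction is 'source' or 'target'."""
    if direction not in ("source", "target"):
        raise ValueError("direction must be 'source' or 'target'")
    return f"{BASE_URL}?{urlencode({direction + 'Pid': pid})}"


def _get_ci(mapping, key):
    """Case-insensitive key lookup, because the exact field casing is assumed."""
    for name, value in mapping.items():
        if name.lower() == key.lower():
            return value
    return None


def is_dataset_link(link, side="target", subtype=None):
    """Return True if the entity on `side` is a dataset.

    If `subtype` is given, RelationshipType.SubType must match it as well
    (compared case-insensitively).
    """
    entity = _get_ci(link, side) or {}
    if str(_get_ci(entity, "type") or "").lower() != "dataset":
        return False
    if subtype is None:
        return True
    relation = _get_ci(link, "RelationshipType") or {}
    return str(_get_ci(relation, "SubType") or "").lower() == subtype.lower()


def retained_share(previous_pairs, current_pairs):
    """Fraction of previously harvested (article, dataset) pairs still found."""
    previous_pairs, current_pairs = set(previous_pairs), set(current_pairs)
    if not previous_pairs:
        return 0.0
    return len(previous_pairs & current_pairs) / len(previous_pairs)
```

Calling is_dataset_link(link, side="target", subtype="issupplementedby") corresponds to the forward query, and is_dataset_link(link, side="source", subtype="issupplementto") to the inverse one. Passing subtype=None gives the counts "irrespective of RelationshipType.SubType", and retained_share then yields the share of earlier links that are still returned.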