Indexer: extract_enty_dics misses entities
Example data:
{"@context": "http://schema.org", "@id": "https://doi.org/10.26165/JUELICH-DATA/UBNGI2", "@type": "Dataset", "author": [{"@id": "https://orcid.org/0000-0003-3773-4377", "affiliation": "Forschungszentrum J\u00fclich", "identifier": "https://orcid.org/0000-0003-3773-4377", "name": "Hoffmann, Lars"}, {"@id": "https://orcid.org/0000-0002-2483-5761", "affiliation": "Forschungszentrum J\u00fclich", "identifier": "https://orcid.org/0000-0002-2483-5761", "name": "Spang, Reinhold"}], "creator": [{"@id": "https://orcid.org/0000-0003-3773-4377", "affiliation": "Forschungszentrum J\u00fclich", "identifier": "https://orcid.org/0000-0003-3773-4377", "name": "Hoffmann, Lars"}, {"@id": "https://orcid.org/0000-0002-2483-5761", "affiliation": "Forschungszentrum J\u00fclich", "identifier": "https://orcid.org/0000-0002-2483-5761", "name": "Spang, Reinhold"}], "dateModified": "2022-03-17", "datePublished": "2021-06-25", "description": ["This data repository provides access to tropopause parameters estimated from meteorological reanalyses. Currently, the repository covers ERA5, ERA-Interim, MERRA-2, and the NCEP/NCAR Reanalysis 1. The tropopause data files provide geopotential height, pressure, temperature, and water vapor volume mixing ratio for the WMO 1st and 2nd tropopause, the cold point, and the dynamical tropopause."], "funder": [{"@type": "Organization", "name": "NIC"}], "identifier": "https://doi.org/10.26165/JUELICH-DATA/UBNGI2", "includedInDataCatalog": {"@type": "DataCatalog", "name": "J\u00fclich DATA", "url": "https://data.fz-juelich.de"}, "keywords": ["Computer and Information Science", "Earth and Environmental Sciences", "atmosphere, tropopause, meteorological reanalyses"], "license": {"@type": "Dataset"}, "name": "Reanalysis Tropopause Data Repository", "provider": {"@type": "Organization", "name": "J\u00fclich DATA"}, "publisher": {"@type": "Organization", "name": "J\u00fclich DATA"}, "temporalCoverage": ["2000-01-01/2018-12-31"], "version": "1"}
The indexer finds only 5 sub-entities:
2023-08-09 15:13:37 | INFO | Found 5 entities for indexing.
2023-08-09 15:13:37 | ERROR | UnhandledDispatchException: On Organization, {'@type': 'Organization', 'name': 'NIC'}
2023-08-09 15:13:37 | ERROR | UnhandledDispatchException: On DataCatalog, {'@type': 'DataCatalog', 'name': 'Jülich DATA', 'url': 'https://data.fz-juelich.de'}
2023-08-09 15:13:37 | ERROR | UnhandledDispatchException: On Dataset, {'@type': 'Dataset'}
2023-08-09 15:13:37 | ERROR | UnhandledDispatchException: On Organization, {'@type': 'Organization', 'name': 'Jülich DATA'}
2023-08-09 15:13:37 | ERROR | UnhandledDispatchException: On Organization, {'@type': 'Organization', 'name': 'Jülich DATA'}
So it currently misses every sub-entity that has no @type, such as all the authors and creators. This is the correct behavior for now, since without a type we cannot place an entity anywhere in the index.
Solution: Either uplift the data prior to indexing, or, for certain cases such as the authors, assert a type.
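
A minimal sketch of the second option, not the indexer's actual code: `DEFAULT_TYPES`, `assert_types` and `typed_sub_entities` are hypothetical names, and `typed_sub_entities` is only a stand-in for the extraction step. It also assumes that untyped `author` and `creator` values are persons, which schema.org does not guarantee (an author may be an Organization).

```python
import copy

# Hypothetical mapping: property name -> type assumed for untyped values.
# Assumption: untyped authors/creators are persons.
DEFAULT_TYPES = {"author": "Person", "creator": "Person"}


def assert_types(record, default_types=DEFAULT_TYPES):
    """Return a copy of record where untyped dicts under known properties get a default @type."""
    uplifted = copy.deepcopy(record)
    for prop, default_type in default_types.items():
        values = uplifted.get(prop)
        if values is None:
            continue
        # A property may hold a single object or a list of objects.
        items = values if isinstance(values, list) else [values]
        for item in items:
            if isinstance(item, dict) and "@type" not in item:
                item["@type"] = default_type
    return uplifted


def typed_sub_entities(record):
    """Collect direct child dicts that carry an @type (stand-in for the extraction step)."""
    found = []
    for value in record.values():
        items = value if isinstance(value, list) else [value]
        found.extend(i for i in items if isinstance(i, dict) and "@type" in i)
    return found


# Trimmed version of the example record above.
record = {
    "@type": "Dataset",
    "author": [{"@id": "https://orcid.org/0000-0003-3773-4377", "name": "Hoffmann, Lars"}],
    "funder": [{"@type": "Organization", "name": "NIC"}],
}

before = typed_sub_entities(record)                # only the funder is typed
after = typed_sub_entities(assert_types(record))   # the author now carries a type too
```

Working on a deep copy keeps the harvested record unchanged, so the asserted type lives only in the data handed to the indexer.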