From ca76d30da21bbd1d5a5ea2250f417b05778b364d Mon Sep 17 00:00:00 2001
From: Paul Millar <paul.millar@desy.de>
Date: Tue, 31 Dec 2024 10:10:13 +0100
Subject: [PATCH] Update facility data OAI-PMH metadata to record information as items

Motivation:

The current OAI-PMH information is recorded as `datasets`. However,
this assumes that the items underlying the harvested OAI-PMH records
correspond to datasets. This is not guaranteed, and there are
counter-examples.

OAI-PMH describes three concepts: resource, item and record. The
OAI-PMH responses provide records (descriptive metadata of some item)
or identifiers thereof. However, since OAI-PMH requires repositories
to support Dublin Core records, it seems reasonable to assume a 1:1
relationship between each Dublin Core record and some corresponding
item.

Therefore, we can use the metadata-agnostic `item` concept when
describing the information about the endpoint.

Modification:

Update the script to record information under the `items` node in the
facilities YAML file.

Update the Jekyll template to consume this information when rendering
the corresponding HTML.

Result:

No observable change, but the facilities YAML file now uses the more
neutral 'items' instead of 'datasets'.
---
 _data/facilities.yml      | 18 +++++++++---------
 open-data-resources.html  | 10 +++++-----
 scripts/update_oai-pmh.rb | 10 +++++-----
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/_data/facilities.yml b/_data/facilities.yml
index 9196bc2..35f9e7a 100644
--- a/_data/facilities.yml
+++ b/_data/facilities.yml
@@ -31,7 +31,7 @@
       status: Active
       adminAddress:
       - mis@cells.es
-      datasets:
+      items:
         count: 20
         sets:
           count: 2
@@ -107,7 +107,7 @@
       link: https://data.ceric-eric.eu/oaipmh/request
       last-check: 2024-12-22
       status: Active
-      datasets:
+      items:
         count: 7
         sets:
           count: 1
@@ -152,7 +152,7 @@
       status: Active
       adminAddress:
       - datapolicy@esrf.fr
-      datasets:
+      items:
         count: 7202
     pan-search-api:
       link: https://icatplus.esrf.fr/api
@@ -179,7 +179,7 @@
       link: https://oai.panosc.ess.eu/openaire/oai
       last-check: 2024-12-22
       status: Active
-      datasets:
+      items:
         count: 0
       adminAddress:
       - max.novelli@ess.eu
@@ -216,7 +216,7 @@
       adminAddress:
       - luis.maia@xfel.eu
       - krzysztof.wrona@xfel.eu
-      datasets:
+      items:
         count: 5
         sets:
           count: 1
@@ -257,7 +257,7 @@
       status: Active
       adminAddress:
       - icatmaster@helmholtz-berlin.de
-      datasets:
+      items:
         count: 28952
         sets:
           count: 3
@@ -299,7 +299,7 @@
       last-check: 2024-12-22
       adminAddress:
       - rodare-admin@hzdr.de
-      datasets:
+      items:
         count: 1020
         sets:
           count: 39
@@ -453,7 +453,7 @@
       last-check: 2024-12-22
       adminAddress:
       - data@ill.fr
-      datasets:
+      items:
         count: 0
     pan-search-api:
       link: https://fairdata.ill.fr/fairdata/api
@@ -541,7 +541,7 @@
       last-check: 2024-12-22
       adminAddress:
       - carlo.minotti@psi.ch
-      datasets:
+      items:
         count: 0
     pan-search-api:
       link: https://dacat.psi.ch/panosc-api/
diff --git a/open-data-resources.html b/open-data-resources.html
index 87c6a0f..ecab085 100644
--- a/open-data-resources.html
+++ b/open-data-resources.html
@@ -58,18 +58,18 @@ title: Open data resources
   <div class="tooltip-cont">
     <div class="trigger-tooltip">Endpoint: <a href="{{facility.odr.oai-pmh-endpoint.link}}">link</a> [<a
       href="{{facility.odr.oai-pmh-endpoint.link}}?verb=Identify">Identify</a>]<br/>{%
-      if facility.odr.oai-pmh-endpoint.datasets
-      %}Datasets: <b>{{facility.odr.oai-pmh-endpoint.datasets.count | thousands_separated}}</b>{%
+      if facility.odr.oai-pmh-endpoint.items
+      %}Datasets: <b>{{facility.odr.oai-pmh-endpoint.items.count | thousands_separated}}</b>{%
       assign total_oai_pmh_dataset_count = total_oai_pmh_dataset_count | plus: facility.odr.pan-search-api.dataset_count %}{%
-      if facility.odr.oai-pmh-endpoint.datasets.sets%}
+      if facility.odr.oai-pmh-endpoint.items.sets%}
       <br/>
-      <span class="caret">Categories: {{facility.odr.oai-pmh-endpoint.datasets.sets.count | thousands_separated}}</span>
+      <span class="caret">Categories: {{facility.odr.oai-pmh-endpoint.items.sets.count | thousands_separated}}</span>
       <div class="toggled">
         <table>
           <tr>
             <th>Category</th><th>Datasets</th>
           </tr>
-          {% for set in facility.odr.oai-pmh-endpoint.datasets.sets.details %}
+          {% for set in facility.odr.oai-pmh-endpoint.items.sets.details %}
           <tr>
             <td>{{set[1].name}}</td><td>{{set[1].count | thousands_separated}}</td>
           </tr>
diff --git a/scripts/update_oai-pmh.rb b/scripts/update_oai-pmh.rb
index 9da1dca..1e91caa 100644
--- a/scripts/update_oai-pmh.rb
+++ b/scripts/update_oai-pmh.rb
@@ -393,16 +393,16 @@ facilities.each do |facility|
       oai_pmh['adminAddress'] = adminAddress
     end
-    oai_pmh.delete('datasets')
+    oai_pmh.delete('items')
     if status == "Active"
-      datasets = {}
-      oai_pmh['datasets'] = datasets
-      datasets['count'] = total_count
+      items = {}
+      oai_pmh['items'] = items
+      items['count'] = total_count
       if !set_count.empty?
         sets = {}
-        datasets['sets'] = sets
+        items['sets'] = sets
         sets['count'] = set_count.length()
-- 
GitLab
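
The change to `scripts/update_oai-pmh.rb` above can be read as a small standalone function. The following is only a sketch under stated assumptions, not the script's actual code: `build_items_node` is a hypothetical helper name, `set_count` is assumed to be a Hash keyed by set identifier, and the per-set `details` that the Jekyll template iterates over are omitted because that part of the script is not shown in the hunk.

```ruby
# Hypothetical helper (not part of the patch): mirrors the hunk's logic for
# populating the 'items' node of an OAI-PMH endpoint description.
def build_items_node(oai_pmh, status, total_count, set_count)
  # Drop any node left over from a previous run before rebuilding it.
  oai_pmh.delete('items')
  return oai_pmh unless status == "Active"

  items = { 'count' => total_count }
  # Only record set information when the endpoint reports at least one set.
  # Assumption: set_count is a Hash; anything with #empty? and #length works.
  items['sets'] = { 'count' => set_count.length } unless set_count.empty?
  oai_pmh['items'] = items
  oai_pmh
end

# Example values are illustrative only.
endpoint = { 'status' => 'Active', 'items' => { 'count' => 1 } }
build_items_node(endpoint, 'Active', 20, { 'set-a' => 12, 'set-b' => 8 })
```

Because the node is deleted before the status check, an endpoint that is no longer active ends up with no `items` entry at all, which is what lets the template's `if facility.odr.oai-pmh-endpoint.items` guard skip it.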