  1. Feb 05, 2025
  2. Jan 28, 2025
  3. Jan 20, 2025
  4. Jan 17, 2025
  5. Jan 15, 2025
  6. Jan 14, 2025
  7. Jan 13, 2025
  8. Jan 09, 2025
  9. Jan 03, 2025
      update_oai-pmh Add support for querying DataCite resourceType · 10f6f60b
      Paul Millar authored
      Motivation:
      
      OAI-PMH, by itself, doesn't identify the nature of the resource; rather,
      this is achieved by the metadata record itself.
      
      Note that OAI-PMH sets don't provide any guaranteed semantics; such
      semantics can be added through the set description, but there's no
      consensus or established practice for doing this.
      
      Therefore, in order to categorise OAI-PMH items by type, we need to
      obtain records: listing identifiers isn't sufficient.  Moreover, Dublin
      Core (as used currently) doesn't support the fine-grained type semantics
      we would like to present.
      
      The DataCite metadata schema provides `resourceType` metadata, with the
      `resourceTypeGeneral` providing the coarse-grained type of the resource.
      This is what we would like to use.
      
      Modification:
      
      Add support for querying all records in the DataCite metadata format.
      This task is very similar to the existing code that lists all
      identifiers of records with Dublin Core.
      
      The patch adds support for querying DataCite metadata as mostly a
      copy-and-paste of the existing code.  This is technical debt that future
      patches MUST address through refactoring.
      
      The OAI-PMH client code is updated to support ListRecords requests.
      This is also a copy-and-paste, incurring further technical debt that
      future patches must address.
      
      Result:
      
      The facilities YAML file now includes a breakdown of OAI-PMH items based
      on their DataCite resourceTypeGeneral.
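
To illustrate the breakdown described above, here is a minimal sketch (not the code from this patch) that tallies `resourceTypeGeneral` values over one page of a ListRecords response. The function name `count_resource_types`, the sample XML and the "unknown" bucket are assumptions made for illustration; a real harvester would also follow resumptionTokens and request whichever DataCite metadataPrefix the endpoint advertises.

```python
from collections import Counter
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hypothetical, hand-written response page used only for this sketch.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <resource xmlns="http://datacite.org/schema/kernel-4">
          <resourceType resourceTypeGeneral="Dataset">Raw data</resourceType>
        </resource>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <resource xmlns="http://datacite.org/schema/kernel-4">
          <resourceType resourceTypeGeneral="Text">Report</resourceType>
        </resource>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:3</identifier></header>
      <metadata>
        <resource xmlns="http://datacite.org/schema/kernel-4">
          <resourceType resourceTypeGeneral="Dataset">Reduced data</resourceType>
        </resource>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:4</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def count_resource_types(list_records_xml: str) -> Counter:
    """Tally resourceTypeGeneral over the records in one ListRecords page."""
    root = ET.fromstring(list_records_xml)
    counts = Counter()
    for record in root.iter(f"{{{OAI_NS}}}record"):
        header = record.find(f"{{{OAI_NS}}}header")
        # Deleted records carry no metadata, so they are skipped.
        if header is not None and header.get("status") == "deleted":
            continue
        general = None
        for element in record.iter():
            # Compare local names so any DataCite kernel namespace matches.
            if element.tag.rsplit("}", 1)[-1] == "resourceType":
                general = element.get("resourceTypeGeneral")
                break
        # Records without the attribute land in an assumed "unknown" bucket.
        counts[general or "unknown"] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_resource_types(SAMPLE)))
```

Matching on the element's local name rather than a fixed namespace is a deliberate choice in this sketch, since endpoints may serve different DataCite kernel versions.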
  10. Jan 02, 2025
  11. Jan 01, 2025
  12. Dec 31, 2024
      Add possibility to skip harvesting · e68d4a9b
      Paul Millar authored
      Motivation:
      
      Some OAI-PMH endpoints are broken; moreover, they're broken in such a
      way that harvesting wastes a lot of time without producing useful
      information.
      
      The specific example is the ISIS endpoint, which is both very slow (~10
      seconds per request) and, after ~9 hours of harvesting, returns a
      resumptionToken that results in failures in a subsequent ListIdentifiers
      request.
      
      The ESS endpoint is also broken.  While also annoying, the impact is
      less because of special handling when a server hasn't provided useful
      information.
      
      The goal is to allow selective disabling of harvesting while continuing
      to update high-level OAI-PMH information based on the Identify call.
      
      Modification:
      
      Add a `skip-harvesting` boolean option.  If set to true, harvesting is
      skipped for this endpoint.
      
      Result:
      
      It's now possible to update all endpoints without spending a long and
      fruitless time harvesting from broken ones.
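
The behaviour of the `skip-harvesting` option can be sketched roughly as follows. This is an illustration under stated assumptions rather than the patch itself: `update_endpoint`, the `url` key, and the `identify` and `harvest` callables are hypothetical names; only the `skip-harvesting` option name comes from the commit message.

```python
from typing import Callable


def update_endpoint(config: dict,
                    identify: Callable[[str], dict],
                    harvest: Callable[[str], dict]) -> dict:
    """Refresh one endpoint, honouring a `skip-harvesting: true` option."""
    url = config["url"]  # hypothetical key name for the endpoint's base URL
    # High-level information from the Identify request is always refreshed.
    result = dict(identify(url))
    # Only an explicit boolean true disables harvesting; an absent or false
    # option keeps the existing behaviour.
    if config.get("skip-harvesting") is True:
        return result
    result["items"] = harvest(url)
    return result
```

Passing the two requests in as callables keeps the sketch independent of any particular OAI-PMH client, and makes it easy to check that a skipped endpoint never triggers the slow harvesting step.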
      Update facility data OAI-PMH metadata to record information as items · ca76d30d
      Paul Millar authored
      Motivation:
      
      The current OAI-PMH information is recorded as `datasets`.  However,
      this assumes that the items underlying the harvested OAI-PMH records
      correspond to datasets.  This is not guaranteed, and there are
      counter-examples.
      
      OAI-PMH describes three concepts: resource, item and record.  The
      OAI-PMH responses provide records (descriptive metadata of some item) or
      identifiers thereof.  However, since OAI-PMH requires repositories to
      support Dublin Core records, it seems a reasonable assumption for there
      to be a 1:1 relationship between each Dublin Core record and some
      corresponding item.
      
      Therefore, we can use the metadata-agnostic `item` concept when
      describing the information about the endpoint.
      
      Modification:
      
      Update the script to record information under the `items` node in the
      facilities YAML file.
      
      Update the Jekyll site to consume this information when rendering the
      corresponding HTML.
      
      Result:
      
      No observable change, but the facilities YAML file now uses the more
      neutral 'items' instead of 'datasets'.
      update_oai-pmh record adminEmail address · 6039af5d
      Paul Millar authored
      Motivation:
      
      OAI-PMH provides admin contact details as email addresses.  This
      information could prove useful.  One such situation is when the OAI-PMH
      endpoint is not working.  When this happens, the admin contact details
      are no longer available from the endpoint, so caching the values would
      prove useful.
      
      Modification:
      
      Update code to capture the admin contact information and record it
      against the facility-specific information.
      
      If, when updating the OAI-PMH details, the admin contact details are
      discovered (from the OAI-PMH endpoint) then any existing contact details
      are replaced with the discovered information; otherwise, any existing
      admin contact details are left unmodified.
      
      Result:
      
      We collect and cache OAI-PMH admin contact details in the facility
      metadata.
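
The replace-or-keep rule described above can be sketched as follows. This is a hedged illustration, not the code from this patch: the function names and the `admin-email` key are hypothetical, and only the `adminEmail` element (which may repeat in an Identify response) comes from OAI-PMH itself.

```python
import xml.etree.ElementTree as ET
from typing import List

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def extract_admin_emails(identify_xml: str) -> List[str]:
    """Return every non-empty adminEmail value found in an Identify response."""
    root = ET.fromstring(identify_xml)
    return [
        element.text.strip()
        for element in root.iter(f"{{{OAI_NS}}}adminEmail")
        if element.text and element.text.strip()
    ]


def merge_admin_emails(facility: dict, discovered: List[str]) -> dict:
    """Replace cached contacts only when the endpoint supplied new ones."""
    updated = dict(facility)
    if discovered:
        # Discovered details win over whatever was cached before.
        updated["admin-email"] = list(discovered)
    # Otherwise any existing cached details are left unmodified.
    return updated
```

When the endpoint is down, the caller would pass an empty list, so the previously cached addresses survive the update.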
  13. Dec 26, 2024
  14. Dec 22, 2024
  15. Dec 20, 2024
  16. Dec 11, 2024
  17. Dec 06, 2024
  18. Dec 05, 2024
  19. Dec 04, 2024
  20. Nov 01, 2024
  21. Feb 19, 2024
  22. Feb 08, 2024
  23. Feb 07, 2024
  24. Dec 01, 2023
  25. Sep 28, 2023