- Dec 31, 2024
-
-
Paul Millar authored
Motivation: Some OAI-PMH endpoints are broken; moreover, they're broken in such a way that harvesting from them wastes a lot of time without producing useful information. The specific example is the ISIS endpoint, which is both very slow (~10 seconds per request) and, after ~9 hours of harvesting, returns a resumptionToken that results in failures in a subsequent ListIdentifiers request. The ESS endpoint is also broken. While also annoying, the impact is less because of special handling when a server hasn't provided useful information. The goal is to allow selective disabling of harvesting while continuing to update high-level OAI-PMH information based on the Identify call.

Modification: Add a `skip-harvesting` boolean option. If set to true, harvesting is skipped for this endpoint.

Result: It's possible to update all endpoints without a very long and fruitless time spent harvesting from broken endpoints.
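The check described above could be sketched roughly as follows. Only the `skip-harvesting` key name comes from the commit message; the YAML layout, the facility names, the URLs and the `harvest?` helper are assumptions made for illustration, not the script's actual code.

```ruby
require 'yaml'

# Hypothetical facilities data: only the `skip-harvesting` key is taken
# from the commit message; everything else is made up for this sketch.
FACILITIES_YAML = <<~YAML
  slow-facility:
    oai-pmh:
      link: https://example.org/slow/oai
      skip-harvesting: true
  other-facility:
    oai-pmh:
      link: https://example.org/other/oai
YAML

# Harvest unless the endpoint is explicitly flagged with `skip-harvesting: true`.
# A missing key or any other value leaves harvesting enabled.
def harvest?(endpoint)
  endpoint['skip-harvesting'] != true
end

facilities = YAML.safe_load(FACILITIES_YAML)
facilities.each do |name, facility|
  endpoint = facility['oai-pmh']
  # High-level (Identify-based) information is refreshed either way.
  action = harvest?(endpoint) ? 'identify and harvest' : 'identify only'
  puts "#{name}: #{action}"
end
```

Treating only an explicit `true` as "skip" keeps existing entries, which lack the key, on the default harvesting path.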
-
Paul Millar authored
Motivation: The current OAI-PMH information is recorded as `datasets`. However, this assumes that the items underlying the harvested OAI-PMH records correspond to datasets. This is not guaranteed, and there are counter-examples. OAI-PMH describes three concepts: resource, item and record. The OAI-PMH responses provide records (descriptive metadata of some item) or identifiers thereof. However, since OAI-PMH requires repositories to support Dublin Core records, it seems reasonable to assume a 1:1 relationship between each Dublin Core record and some corresponding item. Therefore, we can use the metadata-agnostic `item` concept when describing the information about the endpoint.

Modification: Update the script to record information under the `items` node in the facilities YAML file. Update the Jekyll site to consume this information when rendering the corresponding HTML.

Result: No observable change, but the facilities YAML file now uses the more neutral `items` instead of `datasets`.
-
Paul Millar authored
Motivation: The ListMetadataFormats call can fail. Currently, this causes the entire script to fail.

Modification: Catch the exception and report a failure.

Result: A metadata prefix lookup failure is now limited to a single OAI-PMH endpoint.
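A minimal sketch of this kind of per-endpoint failure isolation is shown below. The `list_metadata_formats` stand-in, the `update_endpoint` helper, the result keys and the URLs are all hypothetical; the real script's request code and data layout may differ.

```ruby
# Hypothetical stand-in for the real ListMetadataFormats request:
# it fails for any URL containing "broken" so the sketch needs no network.
def list_metadata_formats(url)
  raise IOError, "no response from #{url}" if url.include?('broken')

  ['oai_dc']
end

# Update a single endpoint. A failure is recorded against this endpoint
# only, instead of propagating and aborting the whole run.
def update_endpoint(url)
  formats = list_metadata_formats(url)
  { 'status' => 'OK', 'formats' => formats }
rescue StandardError => e
  { 'status' => 'Error', 'message' => "ListMetadataFormats failed: #{e.message}" }
end

urls = ['https://example.org/good/oai', 'https://example.org/broken/oai']
results = urls.map { |url| [url, update_endpoint(url)] }.to_h
results.each { |url, result| puts "#{url}: #{result['status']}" }
```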
-
Paul Millar authored
Motivation: Different OAI-PMH endpoints provide different performance characteristics. It would be helpful to categorise them.

Modification: Introduce a Stats class to capture statistics. Monkey-patch float to support printing numbers to a specific number of significant figures. Produce request stats per round (40 requests) and overall as output.
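The idea could look something like the sketch below. The round size of 40 comes from the commit message; the method name `to_sig`, the shape of the `Stats` class and the sample durations are assumptions made for illustration and are not taken from the actual script.

```ruby
class Float
  # Hypothetical helper: round to the given number of significant figures.
  # Note that Float#round returns an Integer when the digit count is negative.
  def to_sig(figures)
    return self if zero? || !finite?

    magnitude = Math.log10(abs).floor
    round(figures - 1 - magnitude)
  end
end

# Collects request durations (in seconds) and summarises them.
class Stats
  def initialize
    @durations = []
  end

  def record(seconds)
    @durations << seconds.to_f
  end

  def count
    @durations.size
  end

  def mean
    return nil if @durations.empty?

    @durations.sum / @durations.size
  end

  def summary
    return 'no requests' if @durations.empty?

    "#{count} requests, mean #{mean.to_sig(2)} s, max #{@durations.max.to_sig(2)} s"
  end
end

ROUND_SIZE = 40 # requests per round, as stated in the commit message

overall = Stats.new
round = Stats.new
sample_durations = Array.new(80) { |i| 0.5 + (i % 4) * 0.25 } # made-up timings

sample_durations.each do |seconds|
  overall.record(seconds)
  round.record(seconds)
  next unless round.count == ROUND_SIZE

  puts "round: #{round.summary}"
  round = Stats.new # start a fresh round
end
puts "overall: #{overall.summary}"
```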
-
Paul Millar authored
-
Paul Millar authored
Motivation: OAI-PMH provides admin contact details as email addresses. This could prove useful information. One such situation is when the OAI-PMH endpoint is not working. When this happens, the admin contact details are no longer available from the endpoint, so caching the values would prove useful.

Modification: Update code to capture the admin contact information and record it against the facility-specific information. If, when updating the OAI-PMH details, the admin contact details are discovered (from the OAI-PMH endpoint) then any existing contact details are replaced with the discovered information; otherwise, any existing admin contact details are left unmodified.

Result: We collect and cache OAI-PMH admin contact details in the facility metadata.
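The replace-or-keep rule described above can be sketched as a small merge function. The `admin-contacts` key, the `merge_admin_contacts` name and the example addresses are hypothetical; only the rule itself is taken from the commit message.

```ruby
# Replace cached contacts when the endpoint supplied some;
# otherwise keep whatever was cached from an earlier run.
# The 'admin-contacts' key is an assumed name for this sketch.
def merge_admin_contacts(facility, discovered)
  return facility if discovered.nil? || discovered.empty?

  facility.merge('admin-contacts' => discovered)
end

cached = { 'name' => 'Example Facility', 'admin-contacts' => ['old-admin@example.org'] }

# Endpoint reachable: discovered details replace the cached ones.
updated = merge_admin_contacts(cached, ['new-admin@example.org'])

# Endpoint down: nothing discovered, so the cached details survive.
unchanged = merge_admin_contacts(cached, nil)

puts updated['admin-contacts'].inspect
puts unchanged['admin-contacts'].inspect
```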
-
- Dec 29, 2024
-
-
Paul Millar authored
-
Paul Millar authored
Motivation: Some server-side problems are quickly fixed by retrying the request, while other problems take longer to recover. Using a fixed duration as a delay between attempts makes it hard to reconcile these different failure modes.

Modification: Add a progressive delay strategy. The delay between attempts now grows linearly with each successive attempt. As a special case, the retry strategy is different if the server has not provided any useful response. In this case, a fixed delay is used. Also, catch more problematic scenarios and handle them with the same retry strategy.

Result: More robust handling of server-side errors, by retrying.
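A rough sketch of such a retry strategy follows. The delay values, the attempt limit, and the names `retry_delay` and `with_retries` are assumptions chosen for illustration; the commit message only states that the delay scales linearly and that a fixed delay applies when the server has not yet given a useful response.

```ruby
BASE_DELAY = 5    # seconds; assumed value
FIXED_DELAY = 60  # seconds; assumed value
MAX_ATTEMPTS = 5  # assumed value

# Linear back-off once the server has shown it can respond usefully;
# a fixed delay while it has not.
def retry_delay(attempt, had_useful_response)
  had_useful_response ? BASE_DELAY * attempt : FIXED_DELAY
end

# Run the block, retrying on any StandardError until MAX_ATTEMPTS is reached.
# The sleeper is injectable so the sketch can be exercised without waiting.
def with_retries(max_attempts: MAX_ATTEMPTS, had_useful_response: true,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts

    sleeper.call(retry_delay(attempt, had_useful_response))
    retry
  end
end

# Usage: a request that fails twice before succeeding, with recorded delays.
delays = []
calls = 0
result = with_retries(sleeper: ->(seconds) { delays << seconds }) do
  calls += 1
  raise IOError, 'temporary failure' if calls < 3

  :ok
end
puts "result=#{result} after #{calls} calls, delays=#{delays.inspect}"
```

Re-raising the last error after the final attempt lets the caller decide how to record the failure for that endpoint.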
-
Paul Millar authored
-
- Dec 26, 2024
-
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
The recovery logic when faced with an error querying the identifiers is currently broken, as it returns nil instead of a hash.
-
Paul Millar authored
Observed transitory failures with GET requests; for example, timeouts. We would like the harvesting routine to be robust against such temporary failures.
-
- Dec 22, 2024
-
-
Paul Millar authored
-
Paul Millar authored
Remove redundant whitespace and replace some spaces with non-breaking spaces.
-
Paul Millar authored
Calculate total OAI-PMH. Move both totals into the table
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
The display now shows the number of datasets available from the endpoint.
-
Paul Millar authored
-
- Dec 21, 2024
-
-
Paul Millar authored
-
- Dec 20, 2024
-
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
-
- Dec 12, 2024
-
-
Paul Millar authored
-
- Dec 11, 2024
-
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
-
- Dec 09, 2024
-
-
Paul Millar authored
Commit fe68e8d5 fixed the script for checking whether the PaN search API is working. Unfortunately, it failed to fix the broken links on the webpage. This patch fixes those links.
-
- Dec 06, 2024
-
-
Paul Millar authored
-
Paul Millar authored
The `check_search_api.rb` script contained a bug because of the following issue: https://github.com/panosc-eu/search-api/issues/75 This updates the script and rectifies the status of institutes.
-
- Dec 05, 2024
-
-
Paul Millar authored
-
- Dec 04, 2024
-
-
Paul Millar authored
Don't display anything in Open data repository column if a facility doesn't have the corresponding link. For OAI-PMH and PaN-Search endpoints, only show a red cell if status is Error.
-
Paul Millar authored
-
Paul Millar authored
-
Paul Millar authored
-