Robustness of Uploader: Uploader pipeline fails if resource is temporarily unavailable
While uploading many records to the SPARQL endpoint, the whole pipeline currently fails if the endpoint returns a 503 at any point.
...
File "/usr/src/data_harvesting/pipeline/uploader.py", line 57, in run_uploader
upload_data_filepath(file_, infrmt=infrmt, graph_name=graph_name, endpoint_url=endpoint_url, username=username, passwd=passwd)
File "/usr/src/data_harvesting/util/data_model_util.py", line 251, in upload_data_filepath
upload_data(
File "/usr/src/data_harvesting/util/data_model_util.py", line 178, in upload_data
results = sparql.query()
...
result = func(*args)
File "/usr/local/lib/python3.10/urllib/request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Temporarily Unavailable
2024-05-26 13:55:28.229 | ERROR | Flow run 'upload-oai-nrecords-819887-to-None' - Finished in state Failed('Flow run encountered an exception. HTTPError: HTTP Error 503: Service Temporarily Unavailable')
Solution: It would be better to retry the upload with incrementally increasing delays, up to a maximum total wait (10 minutes? or a day?). If the upload still fails after that, the Prefect flow should not fail; the failure should be logged instead.
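The proposed behaviour could look roughly like the sketch below. It is a sketch under assumptions, not the project's implementation: `call_with_backoff`, its parameters, and the set of retryable status codes are hypothetical names chosen for illustration, and the 10 minute default is only one of the two limits suggested above. It retries on transient HTTP errors with a doubling delay, and once the wait budget is used up it logs the failure and returns a flag instead of raising.

```python
import logging
import time
from urllib.error import HTTPError

logger = logging.getLogger(__name__)

# Assumption: these status codes are treated as transient and worth retrying.
RETRYABLE_STATUS = {502, 503, 504}


def call_with_backoff(func, *args, max_wait=600.0, base_delay=5.0,
                      max_delay=120.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on transient HTTP errors with exponential backoff.

    Returns (True, result) on success. Returns (False, None) once max_wait
    seconds have been spent sleeping between attempts; the failure is logged
    rather than raised. Non-transient HTTP errors are re-raised unchanged.
    """
    waited = 0.0
    delay = base_delay
    attempt = 1
    while True:
        try:
            return True, func(*args, **kwargs)
        except HTTPError as exc:
            if exc.code not in RETRYABLE_STATUS:
                raise  # not a temporary outage, let the caller handle it
            if waited >= max_wait:
                logger.error("Giving up after %d attempts (%.0f s waited): %s",
                             attempt, waited, exc)
                return False, None
            # Never sleep longer than the remaining budget.
            pause = min(delay, max_delay, max_wait - waited)
            logger.warning("Attempt %d failed (%s), retrying in %.0f s",
                           attempt, exc, pause)
            sleep(pause)
            waited += pause
            delay *= 2  # double the delay for the next attempt
            attempt += 1
```

In `run_uploader`, the call to `upload_data_filepath` could then be wrapped as `ok, _ = call_with_backoff(upload_data_filepath, file_, ...)`, and files for which `ok` is false could be collected and reported at the end of the flow run. Whether the retry belongs there or further down in `upload_data` is a design choice left open here.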