Implement the data pipeline with Prefect.
After the HIFIS consultancy meeting and an evaluation of several workflow managers, we decided to go with Prefect (https://www.prefect.io/) for the data pipeline.
The pipeline is meant to guide the record data through harvesting, uplifting, indexing, and uploading.
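As a rough illustration of how the stages could map onto Prefect concepts, here is a minimal sketch assuming Prefect 2.x's `flow`/`task` decorators; the stage functions (`harvest`, `uplift`, `index`, `upload`) and the `record_pipeline` flow are placeholders, not the actual module interfaces:

```python
from prefect import flow, task, get_run_logger


@task
def harvest() -> list[dict]:
    """Harvest raw metadata records (placeholder)."""
    return [{"id": 1, "title": "example record"}]


@task
def uplift(records: list[dict]) -> list[dict]:
    """Enrich/uplift the harvested records (placeholder)."""
    return [{**r, "uplifted": True} for r in records]


@task
def index(records: list[dict]) -> None:
    """Index the uplifted records (placeholder)."""
    get_run_logger().info("Indexed %d records", len(records))


@task
def upload(records: list[dict]) -> None:
    """Upload the indexed records to the target service (placeholder)."""
    get_run_logger().info("Uploaded %d records", len(records))


@flow(name="record-pipeline")
def record_pipeline() -> None:
    """One flow per end-to-end run; each stage is a task."""
    records = harvest()
    uplifted = uplift(records)
    index(uplifted)
    upload(uplifted)


if __name__ == "__main__":
    record_pipeline()
```

Whether each stage should be its own flow or a task within one flow is exactly the first design question in the task list below.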
Tasks here:

- Basic design of the pipeline: what becomes a flow, what a task.
- Clean up and unify harvester interfaces, issue #41 (closed).
- First implementation that runs through all modules.
- Improve the information flow in the dashboard, i.e. logs, results, artifacts, persistence of data, but also meaningful labels and descriptions of flows.
- Related to the information flow, the indexer and uploader should log what fails (see the sketch after this list).
- Decide how to deal with the different configuration files of the modules: allow them to come from one file? Require them to be provided?
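For the "log what fails" point, a minimal sketch of the idea, assuming Prefect 2.x's `get_run_logger`; `upload_record` is a hypothetical per-record helper, not an actual function of the uploader module:

```python
from prefect import task, get_run_logger


def upload_record(record: dict) -> None:
    """Placeholder for the real uploader call; raises on failure."""
    raise NotImplementedError


@task
def upload(records: list[dict]) -> list[dict]:
    """Upload records one by one, logging every failure instead of aborting the run."""
    logger = get_run_logger()
    failed = []
    for record in records:
        try:
            upload_record(record)  # hypothetical call into the uploader module
        except Exception as exc:
            logger.error("Upload failed for record %s: %s", record.get("id"), exc)
            failed.append(record)
    logger.info("Uploaded %d records, %d failures",
                len(records) - len(failed), len(failed))
    return failed
```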
Lower priority:

- Performance: how long does a full run take? Speed it up with smarter uplifting and the use of multi-threading (see the sketch below).
- Error handling of common errors.
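On the multi-threading point, a minimal sketch of how independent per-record work could be fanned out concurrently, assuming Prefect 2.x's `ConcurrentTaskRunner` and task `.submit()`; `uplift_record` is a placeholder for the real uplifting step:

```python
from prefect import flow, task
from prefect.task_runners import ConcurrentTaskRunner


@task
def uplift_record(record: dict) -> dict:
    """Placeholder for uplifting a single record."""
    return {**record, "uplifted": True}


@flow(task_runner=ConcurrentTaskRunner())
def uplift_all(records: list[dict]) -> list[dict]:
    """Submit one task per record so independent records run concurrently."""
    futures = [uplift_record.submit(record) for record in records]
    return [future.result() for future in futures]


if __name__ == "__main__":
    print(uplift_all([{"id": i} for i in range(5)]))
```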