
Implement the data pipeline with Prefect.

After the HIFIS consulting meeting and a look at several workflow managers, we decided to go with Prefect (https://www.prefect.io/) for the data pipeline.

The pipeline is meant to guide the record data through harvesting, uplifting, indexing and uploading.
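
To make the four stages concrete, here is a minimal sketch of how the record data could move through them. It uses plain Python functions only; all function names, the `Record` type and the placeholder logic are assumptions for illustration, not the project's actual modules. With Prefect 2 or later, each stage function could be decorated with `@task` and `run_pipeline` with `@flow`.

```python
from typing import Dict, List

# Hypothetical record type: a flat dict of string fields.
Record = Dict[str, str]


def harvest(sources: List[str]) -> List[Record]:
    """Collect one raw record per source (placeholder logic)."""
    return [{"source": s, "raw": f"metadata from {s}"} for s in sources]


def uplift(records: List[Record]) -> List[Record]:
    """Enrich each record; here it only marks the record as uplifted."""
    return [{**r, "uplifted": "yes"} for r in records]


def index(records: List[Record]) -> List[Record]:
    """Assign an identifier to each record (placeholder for real indexing)."""
    return [{**r, "id": f"rec-{i}"} for i, r in enumerate(records)]


def upload(records: List[Record]) -> int:
    """Pretend to upload the records and return how many were sent."""
    return len(records)


def run_pipeline(sources: List[str]) -> int:
    """Chain the four stages in the order named in the issue."""
    return upload(index(uplift(harvest(sources))))
```

Keeping each stage a function that takes and returns a list of records is one way to settle the "what becomes a flow, what a task" question: the stages map to tasks and the driver maps to the flow.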

Tasks here:

  • Basic design of the pipeline, what becomes a flow, what a task.
  • Clean up and unify the harvester interfaces (issue #41, closed).
  • First implementation that runs through all modules.
  • Improve the information flow in the dashboard, i.e. logs, results, artifacts and persistence of data, but also meaningful labels and descriptions of flows.
  • Related to the information flow: the indexer and uploader should log what fails.
  • Decide how to deal with the different configuration files of the modules: allow them to come from a single file, or require each one to be provided?
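
For the point about the indexer and uploader logging what fails, a possible shape is sketched below. It assumes each module processes records one at a time; `process_all`, `index_one` and the logger name are hypothetical, and the standard `logging` module stands in for whatever logger the pipeline ends up using (inside a Prefect 2 task, `prefect.get_run_logger()` would be one candidate).

```python
import logging
from typing import Callable, Dict, List, Tuple

# Hypothetical logger name; the real modules may use their own.
logger = logging.getLogger("pipeline")


def process_all(
    records: List[Dict],
    step: Callable[[Dict], Dict],
    step_name: str,
) -> Tuple[List[Dict], List[Dict]]:
    """Apply `step` to every record, logging and collecting failures
    instead of aborting the whole run on the first error."""
    done: List[Dict] = []
    failed: List[Dict] = []
    for record in records:
        try:
            done.append(step(record))
        except Exception as exc:
            logger.error(
                "%s failed for record %r: %s", step_name, record.get("id"), exc
            )
            failed.append(record)
    return done, failed


def index_one(record: Dict) -> Dict:
    """Placeholder indexer step: rejects records without a title."""
    if "title" not in record:
        raise ValueError("missing title")
    return {**record, "indexed": True}
```

Returning the failed records alongside the successful ones means the list of failures can also be surfaced as a result or artifact in the dashboard, not only as log lines.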

Lower priority:

  • Performance: how long does a full run take? Speed it up through smarter uplifting and the use of multi-threading.
  • Error handling for common errors.
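
As a rough illustration of the multi-threading idea in the performance item, the sketch below uplifts records concurrently with the standard library's thread pool. `uplift_one` is a hypothetical per-record function, and the approach assumes uplifting is mostly I/O-bound (for example waiting on remote services); for CPU-bound work, threads in CPython bring little speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def uplift_one(record: Dict) -> Dict:
    """Hypothetical per-record uplifting step (placeholder logic)."""
    return {**record, "uplifted": True}


def uplift_parallel(records: List[Dict], max_workers: int = 4) -> List[Dict]:
    """Uplift records in a thread pool; `map` keeps the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(uplift_one, records))
```

Timing a full run before and after such a change would answer the "how long does a full run take?" question and show whether threading is worth the added complexity.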
Edited by Jens Bröder