Improved provenance tracking mechanism for uplifting
In !32 (merged) I introduced a generic mechanism to track the provenance of computations for different datatypes, along with a simple toy example for tracking int computations (see here).
Main Features:
- the provenance trace is pydantic-enabled and supports (de-)serialization
- the decorator does not fix the location of the provenance-tracked argument; it is controlled at the usage site
- the decorator takes care of cloning the input value and adding the computation result to the prov tracking context
- decorated functions can be used normally just as before if tracking is not needed
- tracking is accessed via a new `.with_prov` method that the decorator attaches to wrapped objects
- tracking functions are compositional (prov-tracked computations can be nested)
- patch creation can be disabled (e.g. for performance) in a tracking-enabled computation chain by setting a flag; no code needs to change
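A minimal sketch of how this could look (all identifiers here, `ProvEntry`, `Tracked`, `track_prov`, are illustrative rather than the actual names from !32, and the sketch fixes the tracked argument to the first position for brevity, whereas the real decorator leaves that to the usage site):

```python
import copy
from typing import Any, Callable, List, Optional, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class ProvEntry(BaseModel):
    """One step of the provenance trace; pydantic gives us (de-)serialization."""

    function: str
    inputs: List[Any]
    output: Any


class Tracked:
    """Wrapper holding a value together with its accumulated provenance trace."""

    def __init__(self, value: Any, trace: Optional[List[ProvEntry]] = None):
        self.value = value
        self.trace = trace if trace is not None else []


def track_prov(func: Callable[..., T]) -> Callable[..., T]:
    """Attach a `.with_prov` entry point to `func`; the plain call stays unchanged."""

    def with_prov(tracked: Tracked, *args: Any, record: bool = True, **kwargs: Any) -> Tracked:
        # clone the input so the original value is never mutated
        cloned = copy.deepcopy(tracked.value)
        result = func(cloned, *args, **kwargs)
        trace = list(tracked.trace)
        if record:  # entry/patch creation can be switched off, e.g. for performance
            trace.append(
                ProvEntry(function=func.__name__, inputs=[tracked.value, *args], output=result)
            )
        return Tracked(result, trace)

    func.with_prov = with_prov  # tracking is opt-in via the attached method
    return func


@track_prov
def double(x: int) -> int:
    return x * 2


assert double(3) == 6                   # normal, untracked usage
tracked = double.with_prov(Tracked(3))  # tracked usage
tracked = double.with_prov(tracked)     # compositional: the trace now has two entries
```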
Further Generalization Ideas:
Should the need arise, this could be extended further into a more powerful provenance tracking mechanism.
- each tracked entity could get a UUID (generated on creation, restored from deserialization)
- tracked entities could keep references to other tracked entities that were passed into a decorated function
- the decorator would allow passing multiple tracked entities and would add an entry to a subset of them, cross-referencing each other's ids (unwrapped arguments could either be ignored or be auto-wrapped in a one-use entity); see the sketch below
But this might be overengineering at this point; before doing it, existing lightweight Python provenance tracking solutions (such as the FZJ alpaca tool?) should probably be evaluated.
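Purely as an illustration of the cross-referencing idea above (nothing here is implemented and the model names are made up), the data side could look roughly like this:

```python
import uuid
from typing import List

from pydantic import BaseModel, Field


class LinkedProvEntry(BaseModel):
    """Trace entry that cross-references the ids of the other tracked inputs."""

    function: str
    input_ids: List[uuid.UUID]
    output_id: uuid.UUID


class TrackedEntity(BaseModel):
    """Entity with a UUID generated on creation and restored on deserialization."""

    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    trace: List[LinkedProvEntry] = Field(default_factory=list)
```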
TODO:
- implement a subclass for `rdflib.Graph` (based on the current `rdfpatch.py` implementation, should be straightforward)
- substitute the old decorator in the codebase with the new one and adapt the respective usage locations
- add tests for the new decorator (generic stuff and graph-specific)
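For the first TODO item, a rough sketch of what such a subclass could look like (the class name and the patch representation are placeholders; the real version would reuse the existing `rdfpatch.py` logic):

```python
from rdflib import Graph


class PatchRecordingGraph(Graph):
    """Graph that remembers added/removed triples so they can be turned into a patch."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.added = []    # triples added since creation
        self.removed = []  # triples removed since creation

    def add(self, triple):
        self.added.append(triple)
        return super().add(triple)

    def remove(self, triple):
        self.removed.append(triple)
        return super().remove(triple)
```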