
GitLab data pipeline todos

Things to advance the GitLab pipeline:

  1. Ignore all forks, ideally already in the GitLab project listing requests (see the listing sketch below this list).
     • So far I have not figured this out, since the listing only reports how many forks a project has, not whether the project itself is a fork.
  2. Better codemeta harvester: fewer failures and more data (DOI badges and other badges, repo and git status), i.e. fill up the codemeta.json more. We could also store additional metadata. For example, the person IDs are not good and AUTHORS files are parsed badly.

  3. Implement the "since" feature? Currently we just pull every repo, but one could speed up the pipeline by pulling only repos where GitLab reports activity since the last pipeline run (see the `last_activity_at` sketch below this list).

  4. Better quality of codemeta.json data. Some metadata created by the codemeta harvester is wrong.

  5. Implement a shallow git clone so that the full history and all the large files do not have to be checked out (see the clone sketch below this list).
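
A possible starting point for the fork filtering in item 1, sketched below in Python. This is only a sketch: the instance URL and token are placeholders, and it assumes that forked projects carry a `forked_from_project` entry in the project JSON returned by the GitLab Projects API, so forks can be dropped right after the listing request even though the listing itself offers no fork filter.

```python
# Sketch: drop forks right after listing projects from the GitLab API.
# Assumes forked projects include a "forked_from_project" key in their JSON;
# instance URL and token are placeholders.
import requests

GITLAB_API = "https://gitlab.example.org/api/v4"  # placeholder instance URL
TOKEN = "..."                                     # placeholder access token


def list_non_fork_projects():
    """Yield all visible projects that are not forks of another project."""
    page = 1
    while True:
        resp = requests.get(
            f"{GITLAB_API}/projects",
            params={"per_page": 100, "page": page},
            headers={"PRIVATE-TOKEN": TOKEN},
            timeout=30,
        )
        resp.raise_for_status()
        projects = resp.json()
        if not projects:
            break
        for project in projects:
            # Forked projects carry a "forked_from_project" entry; skip them.
            if "forked_from_project" not in project:
                yield project
        page += 1
```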

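For the "since" idea in item 3, one option (again only a sketch, not existing pipeline code) is to store the timestamp of the last pipeline run and compare it with each project's `last_activity_at` field from the listing; the GitLab Projects API also documents a `last_activity_after` query parameter that could push this filtering to the server side.

```python
# Sketch: only re-pull projects that report activity since the last run.
# The last-run timestamp handling and the example project record are placeholders.
from datetime import datetime, timezone


def changed_since(project: dict, last_run: datetime) -> bool:
    """Return True if the project reports activity after the last pipeline run."""
    last_activity = datetime.fromisoformat(
        project["last_activity_at"].replace("Z", "+00:00")
    )
    return last_activity > last_run


# Minimal example with a project record shaped like the GET /projects output:
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)        # placeholder last-run timestamp
project = {"last_activity_at": "2024-03-15T09:30:00.000Z"}  # placeholder project entry
print(changed_since(project, last_run))                     # True -> worth pulling again
```
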
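For the shallow clone in item 5, `git clone --depth 1` fetches only the most recent commit instead of the full history. A minimal sketch, with repository URL and target directory as placeholders:

```python
# Sketch: shallow clone to avoid pulling the full history of a repository.
import subprocess


def shallow_clone(repo_url: str, target_dir: str) -> None:
    """Clone only the most recent commit instead of the full history."""
    subprocess.run(
        ["git", "clone", "--depth", "1", repo_url, target_dir],
        check=True,
    )


# Placeholder repository URL and target directory:
shallow_clone("https://gitlab.example.org/group/project.git", "/tmp/project")
```

If only a few metadata files are needed per repository, a partial clone (`--filter=blob:none`) combined with a sparse checkout could cut the transferred data further, provided the GitLab server supports partial clone.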