Review & update averaging algorithm of scores for the sunburst plot

The following analysis is based on data collected with version 3 of the toolbox and presented with version 3 of the dashboard.

The following table shows the scores averaged over all unique IDs found in the database, selected with the condition WHERE publication.type = 'Dataset' AND publication.publication_year >= 2000 AND publication.publication_year <= 2025 AND reference.sub_type = 'IsSupplementedBy'.

Test-result	Locally averaged Scores v3	Averaged Scores v3 shown in the Dashboard	Rel. Deviation
score_percent_A	51.1%	52.8%	3.3%
score_percent_A1	39.7%	41.6%	4.8%
score_percent_A1.1	59.7%	61.3%	2.7%
score_percent_A1.2	59.7%	61.3%	2.7%
score_percent_F	58.2%	61.5%	5.7%
score_percent_F1	89.0%	91.4%	2.7%
score_percent_F2	81.0%	85.8%	5.9%
score_percent_F3	19.4%	22.6%	16.5%
score_percent_F4	23.9%	26.7%	11.7%
score_percent_FAIR	56.9%	59.2%	4.0%
score_percent_I	66.7%	67.7%	1.5%
score_percent_I1	99.7%	99.8%	0.1%
score_percent_I2	16.6%	15.0%	-9.6%
score_percent_I3	83.9%	88.3%	5.2%
score_percent_R	52.3%	55.4%	5.9%
score_percent_R1	45.1%	47.8%	6.0%
score_percent_R1.1	11.0%	21.6%	96.4%
score_percent_R1.2	93.9%	94.5%	0.6%
score_percent_R1.3	59.1%	60.2%	1.9%
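
The "Rel. Deviation" column is consistent with the dashboard value measured relative to the locally averaged value. A minimal sketch of that calculation, where the function name `rel_deviation` is only illustrative and not taken from the toolbox or dashboard code:

```python
def rel_deviation(local_percent, dashboard_percent):
    """Relative deviation of the dashboard score from the local score, in percent."""
    return (dashboard_percent - local_percent) / local_percent * 100.0


# Values taken from the table above (local score, dashboard score).
examples = {
    "score_percent_A": (51.1, 52.8),
    "score_percent_I2": (16.6, 15.0),
    "score_percent_R1.1": (11.0, 21.6),
}

for name, (local, dashboard) in examples.items():
    print(f"{name}: {rel_deviation(local, dashboard):.1f}%")
```

A negative value, as for score_percent_I2, means the dashboard shows a lower average than the local calculation.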

Comparing these locally averaged scores with the averages shown in the most recent test instance of the dashboard (subpage "Data in Helmholtz", with no center or research field selected) reveals deviations, most notably for R1.1 (96.4%), F3 (16.5%) and F4 (11.7%).

My understanding is that these deviations arise because the dashboard currently calculates average scores center by center and then combines these per-center averages in a weighted manner. This does NOT account for datasets that are attributed to more than one center: such a dataset is counted once per center, which distorts the overall average. Dropping these multiple occurrences reduces the number of IDs considered in the average by 8.8 percent.
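
The effect can be illustrated with a small toy example. This is only a sketch under assumptions: the records, center names and scores are made up, and the per-center weights are assumed to be the number of datasets per center, which may differ from what the dashboard actually does.

```python
from collections import defaultdict

# Hypothetical toy records: (dataset_id, center, score_percent).
# Dataset "d3" is attributed to two centers, so it appears twice.
records = [
    ("d1", "center_a", 20.0),
    ("d2", "center_a", 40.0),
    ("d3", "center_a", 90.0),
    ("d3", "center_b", 90.0),
    ("d4", "center_b", 30.0),
]


def center_weighted_mean(rows):
    """Average per center, then combine the center means weighted by row count.

    Assumption: the weight of a center is its number of dataset rows.
    """
    scores_by_center = defaultdict(list)
    for _dataset_id, center, score in rows:
        scores_by_center[center].append(score)
    total_rows = sum(len(scores) for scores in scores_by_center.values())
    weighted_sum = sum(
        (sum(scores) / len(scores)) * len(scores)
        for scores in scores_by_center.values()
    )
    return weighted_sum / total_rows


def unique_id_mean(rows):
    """Average over unique dataset IDs, counting each dataset exactly once."""
    score_by_id = {}
    for dataset_id, _center, score in rows:
        score_by_id.setdefault(dataset_id, score)
    return sum(score_by_id.values()) / len(score_by_id)


print(center_weighted_mean(records))  # d3 is counted twice
print(unique_id_mean(records))        # d3 is counted once
```

In this toy data the center-weighted average is 54.0, while the average over unique IDs is 45.0, because the high-scoring dataset shared by both centers enters the first calculation twice.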

The averaging algorithm needs to be updated to remove this bias.

Edited Oct 31, 2025 by Markus Kubin