Failure to pull repo on hifis-runner-manager-1 (per-build networking)
I've observed some jobs on shared runners failing to pull the project's Git repo due to a connection error. This seems to be isolated to jobs run or managed by hifis-runner-manager-1.
Getting source from Git repository
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/kit-scc-sdm/onlinestorage/httpd-docker/.git/
Created fresh repository.
fatal: unable to access 'https://gitlab.hzdr.de/kit-scc-sdm/onlinestorage/httpd-docker.git/': OpenSSL SSL_connect: Connection reset by peer in connection to gitlab.hzdr.de:443
Jobs fail to pull the repo on hifis-runner-manager-1:
- https://gitlab.hzdr.de/kit-scc-sdm/onlinestorage/httpd-docker/-/jobs/409851
- https://gitlab.hzdr.de/kit-scc-sdm/onlinestorage/httpd-docker/-/jobs/409853
- https://gitlab.hzdr.de/kit-scc-sdm/onlinestorage/httpd-docker/-/jobs/409862
The same jobs succeed on gitlab-runner-manager.hemera (these are retries of the jobs above, triggered via the web UI):
- https://gitlab.hzdr.de/kit-scc-sdm/onlinestorage/httpd-docker/-/jobs/409852
- https://gitlab.hzdr.de/kit-scc-sdm/onlinestorage/httpd-docker/-/jobs/409863
- https://gitlab.hzdr.de/kit-scc-sdm/onlinestorage/httpd-docker/-/jobs/409874
This might be related to the FF_NETWORK_PER_BUILD feature flag being enabled, which switches on the per-build networking mode where each job receives an ephemeral Docker network attached to all containers.
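One way to narrow this down might be to run the same TLS handshake from inside a job container on each runner and compare the results. The following is only a rough sketch: the helper name `probe_tls` is made up for illustration, and it assumes a Python 3.9+ interpreter is available in the job image. It does not prove that per-build networking is the cause; it only shows whether the handshake to gitlab.hzdr.de:443 fails independently of Git.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 10.0) -> tuple[bool, str]:
    """Attempt a TCP connect followed by a TLS handshake.

    Returns (True, negotiated TLS version) on success, or
    (False, "<ExceptionName>: <message>") on any network or TLS error.
    `probe_tls` is a hypothetical helper, not part of GitLab Runner.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # The handshake happens inside wrap_socket(); a reset during the
            # handshake is raised here as an OSError subclass.
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version() or "unknown TLS version"
    except OSError as exc:
        # ssl.SSLError, ConnectionResetError, timeouts and DNS failures
        # are all subclasses of OSError.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `print(probe_tls("gitlab.hzdr.de"))` as a job step on hifis-runner-manager-1 and again on gitlab-runner-manager.hemera would show whether the connection reset is reproducible outside of `git fetch`.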
Side note: Runner 25, mentioned in the list of Shared Runners in the wiki, no longer exists; presumably it was replaced by hifis-runner-manager-1. Using names instead of numeric IDs might be more maintainable.