Airflow SIGTERM due to resource limitations: Investigate + FAQ
At multiple (RACOON) sites we have now seen the nnunet-training workflow fail in the get-ref-series-ct operator after receiving a SIGTERM signal (see logs below).
Since the get-ref-series-ct operator is a LocalOperator, it runs in the Airflow pod. That is why increasing the resources of the Airflow Scheduler and Airflow Webserver Deployments resolves the issue.
The tasks for this issue are:
- investigate which resources have to be increased (only Airflow Scheduler or Airflow Webserver or both)
- write a step-by-step fix as a FAQ entry, so that we can point our project partners to it and save both their time and ours
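
As a possible starting point for the investigation, here is a minimal sketch that builds a `kubectl patch` command for one Deployment at a time, so the Scheduler and the Webserver can be tested separately. Everything specific in it is an assumption and needs to be checked against the actual cluster: the Deployment and container names (`airflow-scheduler`, `airflow-webserver`), the namespace (`services`), the memory values, and the idea that memory is the limiting resource at all.

```python
import json
import shlex


def build_resource_patch(container, memory_request, memory_limit):
    """Build a strategic merge patch that sets memory requests/limits
    for one named container of a Deployment."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Strategic merge matches list entries by "name".
                            "name": container,
                            "resources": {
                                "requests": {"memory": memory_request},
                                "limits": {"memory": memory_limit},
                            },
                        }
                    ]
                }
            }
        }
    }


def kubectl_patch_command(deployment, namespace, patch):
    """Return the shell command that would apply the patch (nothing is executed)."""
    return " ".join(
        [
            "kubectl", "patch", "deployment", deployment,
            "-n", namespace,
            "--type", "strategic",
            "-p", shlex.quote(json.dumps(patch)),
        ]
    )


if __name__ == "__main__":
    # Hypothetical names and values; replace with what the cluster really uses.
    for name in ("airflow-scheduler", "airflow-webserver"):
        patch = build_resource_patch(name, "1Gi", "2Gi")
        print(kubectl_patch_command(name, "services", patch))
```

Patching only one of the two Deployments and re-running the nnunet-training workflow should show which of them actually needs more resources.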