DeepSpeed ZeRO DP strategies

Marcus Hardt requested to merge deepspeed-zero into main

Created by: lehr-fa

Adds DeepSpeed ZeRO DP strategies (https://arxiv.org/abs/1910.02054v3), which make it possible to train with --subsampling-depth=100 --cropping-size=400 --batch-size=1 without running out of memory. However, this requires mixed precision (--precision 16). Unfortunately, the FusedAdam optimizer currently fails for a reason not yet understood, so ZeRO Stage 3 cannot be used at the moment.
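As a rough illustration of what these strategies look like when enabled, here is a minimal sketch using PyTorch Lightning's DeepSpeed integration. The toy model and the exact trainer arguments are assumptions for illustration only, not this repository's actual entry point or flag handling:

```python
# Minimal sketch, assuming a PyTorch Lightning setup; ToyModel is a
# hypothetical stand-in for this repository's network.
import torch
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        # Plain Adam rather than DeepSpeed's FusedAdam, which currently
        # fails and blocks ZeRO Stage 3 (see description above).
        return torch.optim.Adam(self.parameters(), lr=1e-3)


trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="deepspeed_stage_2",  # ZeRO Stage 2: shards optimizer state and gradients
    precision=16,                  # mixed precision is required for these strategies
)
```

Stage 2 shards optimizer states and gradients across data-parallel ranks; Stage 3 would additionally shard the parameters themselves, but that stage depends on the FusedAdam path that is not working yet.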