Model checkpointing
Created by: lehr-fa
Adds checkpointing and chunking to the computation of TiedAxialColumnAttention. This reduces memory pressure in two ways: the dot products are computed sequentially, one chunk at a time, and intermediates are not cached but recomputed when they are needed.
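The idea can be sketched as follows. This is a minimal illustration in PyTorch, not the code from this change: the framework, the function names (`chunk_logits`, `tied_column_logits`), the tensor layout `(rows, cols, dim)`, the interpretation of "tied" as summing logits over columns, and the scaling factor are all assumptions made for the example. Each chunk's query/key projections and dot products run inside `torch.utils.checkpoint.checkpoint`, so the per-chunk projections are not kept for the backward pass.

```python
import torch
from torch.utils.checkpoint import checkpoint


def chunk_logits(x_chunk, w_q, w_k):
    # x_chunk: (rows, chunk_cols, dim_in); w_q, w_k: (dim_in, dim)
    q = x_chunk @ w_q  # (rows, chunk_cols, dim)
    k = x_chunk @ w_k  # (rows, chunk_cols, dim)
    # Sum dot products over the chunk's columns and features -> (rows, rows).
    return torch.einsum("icd,jcd->ij", q, k)


def tied_column_logits(x, w_q, w_k, chunk_size, use_checkpoint=True):
    # Hypothetical helper: accumulates one shared (rows, rows) logit matrix
    # by visiting the column axis chunk by chunk instead of all at once.
    rows, cols, _ = x.shape
    logits = x.new_zeros(rows, rows)
    for start in range(0, cols, chunk_size):
        x_chunk = x[:, start:start + chunk_size]
        if use_checkpoint:
            # q and k for this chunk are recomputed during backward
            # rather than stored.
            part = checkpoint(chunk_logits, x_chunk, w_q, w_k,
                              use_reentrant=False)
        else:
            part = chunk_logits(x_chunk, w_q, w_k)
        logits = logits + part
    # Assumed scaling; the real module may normalize differently.
    return logits * w_q.shape[1] ** -0.5
```

Under these assumptions the chunked, checkpointed path returns the same logits and gradients as a single unchunked pass. The trade-off is extra compute in the backward pass in exchange for lower peak memory, and a smaller `chunk_size` lowers the peak further at the cost of more sequential steps.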