Upstream bootstrap upd
Created by: albazarova
The up-to-date branch for the bootstrapping task.
Includes:
Bootstrapping task: replacing a configurable number of sequences in the MSA (by default, 50% of the MSA) with bootstrapped sequences.
- Loss per sequence
- Loss per base pair, with two options: compare each bootstrapped sequence either to the sequence it replaced or to the closest sequence in the MSA
- Mean absolute error (MAE) loss: predicting the normalised Hamming distance between each bootstrapped sequence and the replaced or closest sequence
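The description does not say how a bootstrapped sequence is generated or how the targets are computed, so the following is only a minimal NumPy sketch under stated assumptions, not the branch's code. It assumes the MSA is an integer-encoded array of shape (num_sequences, num_columns) and that a bootstrapped sequence is built by drawing, for every column, a residue from that column of the original MSA with replacement. For the "closest" option it assumes the candidates are the sequences that were not replaced. The function names (bootstrap_msa, reference_sequences, per_position_targets, normalised_hamming, mae_loss) are hypothetical.

```python
import numpy as np


def bootstrap_msa(msa, frac=0.5, rng=None):
    """Replace round(frac * N) rows of an integer-encoded (N, L) MSA.

    Hypothetical generator: each bootstrapped row takes, for every column j,
    a residue drawn with replacement from column j of the original MSA.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_seqs, n_cols = msa.shape
    n_replace = int(round(frac * n_seqs))
    replaced = rng.choice(n_seqs, size=n_replace, replace=False)
    # src_rows[i, j] is the row whose residue in column j is copied.
    src_rows = rng.integers(0, n_seqs, size=(n_replace, n_cols))
    new_msa = msa.copy()
    new_msa[replaced] = msa[src_rows, np.arange(n_cols)]
    is_boot = np.zeros(n_seqs, dtype=bool)
    is_boot[replaced] = True  # targets for the per-sequence loss
    return new_msa, is_boot, replaced


def reference_sequences(orig_msa, new_msa, replaced, mode="replaced"):
    """Return the sequence each bootstrapped row is compared against."""
    if mode == "replaced":
        return orig_msa[replaced]  # the row that was overwritten
    if mode == "closest":
        # Assumption: candidates are the rows that were not replaced.
        kept = np.setdiff1d(np.arange(orig_msa.shape[0]), replaced)
        cand = orig_msa[kept]
        boot = new_msa[replaced]
        # dists[i, k]: number of columns where boot[i] and cand[k] differ.
        dists = (boot[:, None, :] != cand[None, :, :]).sum(axis=-1)
        return cand[dists.argmin(axis=1)]
    raise ValueError(f"unknown mode: {mode!r}")


def per_position_targets(new_msa, replaced, ref):
    # 1.0 where the bootstrapped residue equals the reference residue.
    return (new_msa[replaced] == ref).astype(np.float32)


def normalised_hamming(new_msa, replaced, ref):
    # Fraction of columns in which bootstrapped and reference rows differ.
    return (new_msa[replaced] != ref).mean(axis=1)


def mae_loss(pred, target):
    # Mean absolute error between predicted and true distances.
    return np.abs(pred - target).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    msa = rng.integers(0, 21, size=(8, 12))  # 8 sequences, 12 columns
    new_msa, is_boot, replaced = bootstrap_msa(msa, frac=0.5, rng=rng)
    ref = reference_sequences(msa, new_msa, replaced, mode="closest")
    print(is_boot.sum(), normalised_hamming(new_msa, replaced, ref).shape)  # 4 (4,)
```

In this sketch the per-sequence loss would use is_boot as a binary target, the per-base-pair loss would use per_position_targets, and the MAE loss would regress normalised_hamming. The "closest" mode needs at least one sequence left unreplaced, so frac=1.0 is not supported as written.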
Jigsaw task extended with a frozen-columns variant, in which all sequences in the MSA are shuffled in the same way, so each column stays intact.
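To illustrate the difference between the original Jigsaw task and the frozen-columns variant, here is a minimal NumPy sketch. It assumes (hypothetically) that each sequence is cut into n_segments contiguous column blocks, that the blocks are reordered by a permutation drawn from all n_segments! permutations, and that the permutation index is the prediction target. The names jigsaw, permute_row and split_points are illustrative, not taken from the branch.

```python
import itertools

import numpy as np


def split_points(n_cols, n_segments):
    # Integer boundaries of n_segments contiguous column blocks.
    return (np.arange(n_segments + 1) * n_cols) // n_segments


def permute_row(row, bounds, perm):
    # Reassemble a row from its blocks in the order given by perm.
    blocks = [row[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]
    return np.concatenate([blocks[p] for p in perm])


def jigsaw(msa, n_segments=3, frozen_columns=False, rng=None):
    """Shuffle column blocks of an (N, L) MSA; return (shuffled, labels).

    labels[i] indexes the permutation applied to row i (hypothetical target).
    With frozen_columns=True one permutation is shared by all rows, so whole
    columns move together; otherwise each row is shuffled independently.
    """
    rng = np.random.default_rng() if rng is None else rng
    perms = list(itertools.permutations(range(n_segments)))
    bounds = split_points(msa.shape[1], n_segments)
    if frozen_columns:
        labels = np.full(msa.shape[0], rng.integers(len(perms)))
    else:
        labels = rng.integers(len(perms), size=msa.shape[0])
    shuffled = np.stack(
        [permute_row(row, bounds, perms[k]) for row, k in zip(msa, labels)]
    )
    return shuffled, labels
```

For example, jigsaw(msa, n_segments=3, frozen_columns=True) returns an MSA whose columns are exactly the input columns in a new block order, so the alignment between sequences is preserved and only the column order has to be recovered.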
max_seqlen of the MSAmodel is now hardwired to the correct value in both train.py and train_downstream.py.