Export first review
Review from @christian.busse:
= Repertoire =
study_contact is not exported from DB
-
1) This happens because the importer loads a blank repertoire using
import airr
airr.repertoire_template()
The template from the Python package for some reason does not yet include the changes from the main branch of the standards repository. I would look into that.
age_event should be included in add_donor_info with the value "enrollment"
-
2) I have added this to the DB.
collection_time_point_relative lacks the unit (in the JSON, not in the DB)
-
3) Same reason as in 1). I could directly use the template from the airr repository.
collection_time_point_reference should be 3rd_immunization (the "post_" is implied by collection_time_point_relative being positive).
-
4) Update: see malaria_mavache_t13. Do we also remove all pre_ and post_ prefixes? collection_time_point_relative is not defined for any of the malaria_mavache databases.
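If the answer to the prefix question is yes, the cleanup could look roughly like the sketch below. The function name is hypothetical and not part of the importer or exporter; it only strips a leading pre_ or post_ from a reference label and leaves every other label untouched.

```python
def normalize_time_point_reference(reference: str) -> str:
    """Strip a leading 'pre_' or 'post_' from a time point reference label.

    Hypothetical helper: the direction is assumed to be carried by the
    sign of collection_time_point_relative instead of by the label.
    """
    for prefix in ("pre_", "post_"):
        if reference.startswith(prefix):
            return reference[len(prefix):]
    # Labels without a direction prefix are returned unchanged.
    return reference
```

Note that this only works for databases where collection_time_point_relative is actually filled in; otherwise dropping the prefix loses the direction.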
cell_subset and cell_phenotype are NULL, sort.population in the DB contains labels, not CURIEs
-
5) I'm looking into that one. Update: cell_phenotype hasn't been solved yet. I mapped it to an add_sample_info keyword, but this has never been used. Do we have an example of what should go into this field?
sequencing_platform should be "Illumina MiSeq", sequencing_facility is "Eurofins Genomics"
-
6) Changed, and added sequencing_platform to the DB.
- Content of locus should be TRA or TRB (not TCRA/TCRB)
-
7) Noted, and I have already changed it.
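For reference, a minimal sketch of such a relabeling at export time. The mapping and function name are illustrative, not the actual change made in the exporter; only the two labels mentioned above are covered, and anything else passes through unchanged.

```python
# Internal labels mapped to the locus names requested in the review.
# Only TCRA/TCRB are covered here; other loci are an open assumption.
LOCUS_MAP = {"TCRA": "TRA", "TCRB": "TRB"}


def normalize_locus(label: str) -> str:
    """Return the requested locus name for a known internal label."""
    return LOCUS_MAP.get(label, label)
```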
- Why are sequence_alignment, germline_alignment, v_cigar, d_cigar and j_cigar "nan" (assuming that they are strings)?
-
8) I will change this too. I filled blank fields with float NaNs to remove some trailing zeroes, but did not check that the string fields ended up as empty strings rather than "nan". Thanks for pointing this out.
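One way the fix could look, as a sketch only: it assumes each rearrangement is available as a plain dict before it is written out, which may not match how the exporter holds its rows. The helper name is hypothetical. It blanks float NaN values and the literal string "nan" in the five string fields listed above and leaves all other fields alone.

```python
import math

# String-typed fields named in the review comment.
STRING_FIELDS = (
    "sequence_alignment",
    "germline_alignment",
    "v_cigar",
    "d_cigar",
    "j_cigar",
)


def blank_nan_strings(row: dict) -> dict:
    """Return a copy of row with NaN-like values in string fields set to ''."""
    cleaned = dict(row)
    for field in STRING_FIELDS:
        if field not in cleaned:
            continue
        value = cleaned[field]
        is_float_nan = isinstance(value, float) and math.isnan(value)
        if value is None or is_float_nan or value == "nan":
            cleaned[field] = ""
    return cleaned
```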
= Receptor =
The receptor object maps an AA string to a chain type. The current version of the Receptor object (receptor-schema branch) is too strictly focused on Igs, so you can use this one:
receptor_id #our internal id
receptor_gid #currently NULL, will contain hash of AA sequence
receptor_chains: [{chain:xx,sequence_aa:xx}]
receptor_type # here: "TCRB:TCRA"
confirmed_positions: [1,len(sequence_aa)]
I will have a look at the #409 stuff soon.
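To make the field list above concrete, here is a rough sketch of how such a record might be assembled. Several points are assumptions and not settled in this review: the hash function (SHA-256 here), hashing the chain sequences joined by ":" in the given order, and storing one [1, len(sequence_aa)] pair per chain under confirmed_positions. The function name is hypothetical.

```python
import hashlib


def build_receptor(receptor_id, chains, receptor_type):
    """Assemble a receptor record from (chain, sequence_aa) tuples.

    Assumptions: receptor_gid is the SHA-256 hex digest of the AA
    sequences joined by ':'; confirmed_positions holds one
    [1, len(sequence_aa)] pair per chain, in chain order.
    """
    joined = ":".join(sequence_aa for _, sequence_aa in chains)
    return {
        "receptor_id": receptor_id,
        "receptor_gid": hashlib.sha256(joined.encode("ascii")).hexdigest(),
        "receptor_chains": [
            {"chain": chain, "sequence_aa": sequence_aa}
            for chain, sequence_aa in chains
        ],
        "receptor_type": receptor_type,
        "confirmed_positions": [
            [1, len(sequence_aa)] for _, sequence_aa in chains
        ],
    }
```

A call such as build_receptor(1, [("TCRB", "CASSL"), ("TCRA", "CAVR")], "TCRB:TCRA") uses placeholder chain labels and sequences, not real data.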
-
TBD
= Expression data =
- Looks good. Will need to discuss the details on how to link these objects in today's Standards call.
-
TBD