Where did the 'antisense' biotype go when transitioning from the mm9 to the mm10 genome build in Ensembl?

I have been comparing the Ensembl GTF annotations for the mm9 and mm10 genome builds and noticed that several biotypes occur in only one of the two GTF files.

transcript_biotype values in mm10 (but not in mm9): IG_C_pseudogene, IG_D_pseudogene, IG_LV_gene, IG_pseudogene, IG_V_pseudogene, lncRNA, non_stop_decay, nonsense_mediated_decay, processed_pseudogene, retained_intron, ribozyme, scaRNA, scRNA, sRNA, TEC, TR_C_gene, TR_D_gene, TR_J_gene, TR_J_pseudogene, TR_V_gene, TR_V_pseudogene, transcribed_processed_pseudogene, transcribed_unitary_pseudogene, transcribed_unprocessed_pseudogene, translated_processed_pseudogene, translated_unprocessed_pseudogene, unitary_pseudogene, unprocessed_pseudogene

Biotypes in mm9 (but not in mm10): 3prime_overlapping_ncrna, antisense, lincRNA, ncrna_host, non_coding, processed_transcript, sense_intronic, sense_overlapping
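The set differences listed above can be reproduced with a short Python sketch. It assumes the biotype sits in the `transcript_biotype` attribute when that attribute exists and otherwise in the GTF source column (column 2); that fallback is an assumption about the older NCBIM37 release 67 file, not something confirmed here. `read_gtf`, `transcript_biotypes` and `compare_biotypes` are hypothetical helper names.

```python
import gzip
import re
from typing import Iterable, Iterator, List, Set, Tuple

# key "value" pairs in the GTF attribute column (column 9)
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def read_gtf(path: str) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def transcript_biotypes(lines: Iterable[str]) -> Set[str]:
    """Collect the set of transcript biotypes in a GTF.

    Assumption: use the transcript_biotype attribute when present; otherwise
    fall back to the source column (column 2), where older Ensembl GTFs are
    assumed to store the biotype.
    """
    biotypes: Set[str] = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines such as "#!genome-build"
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" not in attrs:
            continue  # gene-level lines carry no transcript biotype
        biotypes.add(attrs.get("transcript_biotype", fields[1]))
    return biotypes


def compare_biotypes(
    old_lines: Iterable[str], new_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (biotypes only in old, biotypes only in new), each sorted."""
    old = transcript_biotypes(old_lines)
    new = transcript_biotypes(new_lines)
    return sorted(old - new), sorted(new - old)
```

For example, `compare_biotypes(read_gtf("mm9_ensembl.gtf"), read_gtf("mm10_ensembl.gtf"))` would return the two sorted lists. Only lines carrying a `transcript_id` are counted, so `gene` lines, which in recent Ensembl GTFs typically have `gene_biotype` but no `transcript_biotype`, do not leak their source value into the result.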

So my question is: what happened to the biotypes that went missing? For example, what happened to the "antisense" biotype in mm9? Looking at a few selected transcripts with the "antisense" biotype in mm9 (such as Nespas-003, Gm16119-002, 1300015D01Rik-003 and C130080G10Rik-003), I cannot find these transcript names in mm10 anymore.
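Because transcript names may differ between releases while Ensembl stable IDs (`ENSMUST...`) are designed to persist, one hedged way to follow the missing transcripts is to look them up by stable ID and by gene name rather than by transcript name. This is an illustrative sketch, not an established method: `transcript_records` and `trace_by_name` are hypothetical helpers, and the sketch assumes both GTFs carry `transcript_id`, `transcript_name` and `gene_name` attributes, with the same column-2 biotype fallback as an assumption for the older file.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# key "value" pairs in the GTF attribute column (column 9)
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def transcript_records(lines: Iterable[str]) -> Dict[str, dict]:
    """Map transcript_id -> {transcript_name, gene_name, biotype} for one GTF.

    The biotype is taken from transcript_biotype when present, otherwise from
    the source column (column 2); the fallback is an assumption about older
    Ensembl GTFs.
    """
    records: Dict[str, dict] = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None or tid in records:
            continue  # gene-level line, or transcript already recorded
        records[tid] = {
            "transcript_name": attrs.get("transcript_name", ""),
            "gene_name": attrs.get("gene_name", ""),
            "biotype": attrs.get("transcript_biotype", fields[1]),
        }
    return records


def trace_by_name(
    old: Dict[str, dict], new: Dict[str, dict], names: Iterable[str]
) -> Dict[str, dict]:
    """For each old transcript name, report its stable ID, the new-build record
    for that ID (None if the ID is absent), and all new-build transcripts of
    the same gene name as sorted (transcript_name, biotype) pairs."""
    wanted = set(names)
    new_by_gene: Dict[str, List[tuple]] = defaultdict(list)
    for rec in new.values():
        new_by_gene[rec["gene_name"]].append((rec["transcript_name"], rec["biotype"]))
    report: Dict[str, dict] = {}
    for tid, rec in old.items():
        if rec["transcript_name"] in wanted:
            report[rec["transcript_name"]] = {
                "transcript_id": tid,
                "new_record": new.get(tid),
                "gene_transcripts_in_new": sorted(new_by_gene.get(rec["gene_name"], [])),
            }
    return report
```

With `old` and `new` built by passing the opened mm9 and mm10 files to `transcript_records`, a call such as `trace_by_name(old, new, ["Nespas-003", "Gm16119-002"])` would show, for each name, whether its stable ID is still present in mm10 and, if so, under which name and biotype.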

The mm9 GTF file was generated like this:

wget ftp://ftp.ensembl.org/pub/release-67/gtf//mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz

zcat Mus_musculus.NCBIM37.67.gtf.gz | grep -v "NT_" | perl -ane 'print "chr$_";' > mm9_ensembl.gtf

The mm10 GTF file was generated like this:

wget ftp://ftp.ensembl.org/pub//release-97/gtf/mus_musculus/Mus_musculus.GRCm38.97.chr.gtf.gz

zcat Mus_musculus.GRCm38.97.chr.gtf.gz | grep -v "^#" | perl -ane 'print "chr$_";' > mm10_ensembl.gtf
