I have been comparing the Ensembl GTF annotations for the mm9 and mm10 genome builds and noticed that several biotypes occur in only one of the two GTF files.
transcript_biotype for mm10 (but not for mm9): IG_C_pseudogene, IG_D_pseudogene, IG_LV_gene, IG_pseudogene, IG_V_pseudogene, lncRNA, non_stop_decay, nonsense_mediated_decay, processed_pseudogene, retained_intron, ribozyme, scaRNA, scRNA, sRNA, TEC, TR_C_gene, TR_D_gene, TR_J_gene, TR_J_pseudogene, TR_V_gene, TR_V_pseudogene, transcribed_processed_pseudogene, transcribed_unitary_pseudogene, transcribed_unprocessed_pseudogene, translated_processed_pseudogene, translated_unprocessed_pseudogene, unitary_pseudogene, unprocessed_pseudogene
Biotype for mm9 (but not for mm10): 3prime_overlapping_ncrna, antisense, lincRNA, ncrna_host, non_coding, processed_transcript, sense_intronic, sense_overlapping
So my question is: what happened to the biotypes that went missing? For example, what happened to the "antisense" biotype from mm9? I looked at a few transcripts that belong to the "antisense" biotype in mm9 (such as Nespas-003, Gm16119-002, 1300015D01Rik-003 and C130080G10Rik-003), and I cannot find these transcript names in mm10 anymore.
The mm9 GTF file was generated like this:
wget ftp://ftp.ensembl.org/pub/release-67/gtf//mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz
zless Mus_musculus.NCBIM37.67.gtf.gz | grep -v "NT_" | perl -ane 'print "chr$_";' > mm9_ensembl.gtf
The mm10 GTF file was generated like this:
wget ftp://ftp.ensembl.org/pub//release-97/gtf/mus_musculus/Mus_musculus.GRCm38.97.chr.gtf.gz
zless Mus_musculus.GRCm38.97.chr.gtf.gz | grep -v "^#" | perl -ane 'print "chr$_";' > mm10_ensembl.gtf
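For reference, the two biotype lists above can be reproduced roughly as follows. This is only a sketch, not necessarily the exact commands I ran: the helper names (gtf_source_values, gtf_attr_values, only_in_first) are made up for illustration, and it assumes that the release-67 file carries the biotype in column 2 (as older Ensembl GTFs did), while the release-97 file carries it in the transcript_biotype attribute.

```shell
# Hypothetical helper: distinct values of column 2 of a GTF.
# Assumption: in the old release-67 file this column holds the biotype.
gtf_source_values() {
    grep -v '^#' "$1" | cut -f2 | LC_ALL=C sort -u
}

# Hypothetical helper: distinct values of one attribute key in column 9,
# e.g. transcript_biotype in the release-97 file.
#   $1 = GTF file, $2 = attribute key
gtf_attr_values() {
    grep -v '^#' "$1" \
        | grep -o "$2 \"[^\"]*\"" \
        | cut -d'"' -f2 \
        | LC_ALL=C sort -u
}

# Hypothetical helper: lines present in sorted list $1 but not in sorted list $2.
only_in_first() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Usage with the files generated above:
#   gtf_source_values mm9_ensembl.gtf > mm9_biotypes.txt
#   gtf_attr_values mm10_ensembl.gtf transcript_biotype > mm10_biotypes.txt
#   only_in_first mm10_biotypes.txt mm9_biotypes.txt   # mm10 but not mm9
#   only_in_first mm9_biotypes.txt mm10_biotypes.txt   # mm9 but not mm10
```

Sorting and comparing under LC_ALL=C keeps sort and comm in agreement about ordering, which matters because the biotype names mix upper and lower case.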