I am using gencode.v19.annotation.gtf (head of the GTF below) to assign gene_types to the transcripts in my study via Ensembl gene IDs. An example line from the GTF is also below.
Some gene_types have the name processed_transcript, while others are lincRNA, antisense, etc.
Ensembl just lists the processed_transcript "biotype" under long non-coding transcripts. That makes sense, given that, as I understand it, processed transcripts are those that do not have an ORF: http://uswest.ensembl.org/Help/Faq?id=468
What is unclear to me is the difference between a processed_transcript and these other long non-coding transcripts. According to Vega http://vega.sanger.ac.uk/info/about/gene_and_transcript_types.html, processed_transcript sits above these other long ncRNAs in a hierarchy, which makes sense, except that I see many transcripts with this annotation and not one of the subtypes like lincRNA. Why would that be?
Based on what GENCODE has written about biotypes https://www.gencodegenes.org/gencode_biotypes.html, I guess something would be annotated as processed_transcript if it has no ORF and does not meet the criteria for other categories like lincRNA or antisense. Does anyone know if this is true?
##description: evidence-based annotation of the human genome (GRCh37), version 19 (Ensembl 74)
##provider: GENCODE
##contact: firstname.lastname@example.org
##format: gtf
##date: 2013-12-05
chr1 HAVANA gene 11869 14412 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
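For reference, here is a minimal Python sketch of the gene ID to gene_type lookup described above. It is only an illustration, not a tested pipeline. The function names (`parse_attributes`, `gene_types_from_gtf`) are made up for this example. It assumes that only rows whose feature column is `gene` are needed, and that the IDs in the study carry no version suffix, so `ENSG00000223972.4` is stored as `ENSG00000223972`. Fields are split on any whitespace so that the example line above works as pasted; the real file is tab-separated, which this handles too.

```python
import re
from collections import Counter

# Matches attribute pairs such as: gene_type "pseudogene";  or  level 2;
ATTR_RE = re.compile(r'(\S+) (?:"([^"]*)"|(\S+?));')


def parse_attributes(attr_field):
    """Turn the ninth GTF column into a dict (a repeated key keeps its last value)."""
    attrs = {}
    for key, quoted, bare in ATTR_RE.findall(attr_field):
        attrs[key] = quoted if quoted else bare
    return attrs


def gene_types_from_gtf(lines):
    """Map gene IDs (version suffix removed, an assumption) to their gene_type."""
    mapping = {}
    for line in lines:
        # Skip the '##' header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # The first eight columns contain no whitespace; the rest is the attribute column.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = parse_attributes(fields[8])
        if "gene_id" in attrs and "gene_type" in attrs:
            mapping[attrs["gene_id"].split(".")[0]] = attrs["gene_type"]
    return mapping


# The example line from this post.
example = (
    'chr1 HAVANA gene 11869 14412 . + . gene_id "ENSG00000223972.4"; '
    'transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; '
    'gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; '
    'transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; '
    'havana_gene "OTTHUMG00000000961.2";'
)

gene_types = gene_types_from_gtf([example])
print(gene_types)
# Tally of how many genes fall under each gene_type.
print(Counter(gene_types.values()))
```

For the full annotation, the same function can be given an open file handle, e.g. `with open("gencode.v19.annotation.gtf") as fh: gene_types = gene_types_from_gtf(fh)`. The `Counter` tally then shows how many genes carry processed_transcript compared with lincRNA, antisense and the other gene_types.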