Question: Does Ensembl have a descriptor for the 'full-length' transcript?
george.wiggins (New Zealand) wrote, 3.4 years ago:

Simply put, all I want to know is whether there is an identifier in Ensembl GTFs that indicates which transcript is the 'full-length' transcript. My assumption is that I can use the transcript_name provided (e.g. APC-001), where the numeric suffix identifies the transcript. I assume the 001 would be the full-length transcript, but I need to be sure before I proceed.
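A minimal sketch of reading transcript_name from an Ensembl GTF, assuming the standard tab-separated layout with `key "value";` pairs in column 9. It only extracts the names; it does not establish that a -001 suffix marks a full-length transcript, which is the open question here. `transcript_names` is a hypothetical helper, not part of any library.

```python
import re

# One `key "value";` pair from the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def transcript_names(gtf_lines):
    """Map transcript_id -> transcript_name using the 'transcript' rows of a GTF."""
    names = {}
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines such as #!genome-build
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        # Repeated keys (e.g. several `tag` entries) collapse; the last one wins.
        attrs = dict(ATTR_RE.findall(cols[8]))
        names[attrs["transcript_id"]] = attrs.get("transcript_name")
    return names
```

Any iterable of lines works, so an open file handle can be passed directly.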

If it is not, does anyone have a suggestion for how to identify the consensus full-length (FL) transcript across the whole transcriptome?


Please do not send identical messages to Biostars and the Ensembl helpdesk. It is a waste of effort if we are trying to respond and the people on Biostars are as well. I will delete the Ensembl helpdesk ticket, as Jean-Karim has already answered your question.

(reply by Emily_Ensembl, 3.4 years ago)

Thank you for your quick reply. I wasn't aware that Ensembl was so good at keeping up with questions on Biostars. I was assuming I would get a community response here and a more official response from the helpdesk. Nevertheless, I won't repeat questions to the helpdesk in the future.

(reply by george.wiggins, 3.4 years ago)
Answer: Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 3.4 years ago:

There can be several full-length transcripts for a given gene. Depending on your problem, you may be able to use CCDS (Consensus CDS) transcripts; or, if you need only one transcript per gene, you'll have to come up with some rule(s) to decide which one to pick among the transcripts associated with a gene.
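A minimal sketch of the CCDS route: group, per gene, the transcripts whose GTF rows carry a ccds_id attribute. It assumes your Ensembl GTF writes ccds_id on the rows of CCDS-backed transcripts (check this for your release); `ccds_transcripts_by_gene` is a hypothetical helper. CCDS covers protein-coding genes only, and a gene can still have several CCDS transcripts, so a tie-breaking rule may still be needed.

```python
import re
from collections import defaultdict

# One `key "value";` pair from the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def ccds_transcripts_by_gene(gtf_lines):
    """Return {gene_id: sorted transcript IDs} for transcripts carrying a ccds_id.

    Assumption: the GTF writes a `ccds_id` attribute on rows of CCDS-backed
    transcripts; verify this for the Ensembl release you use.
    """
    by_gene = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "ccds_id" in attrs and "transcript_id" in attrs:
            by_gene[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: sorted(tx) for gene, tx in by_gene.items()}
```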

Edit: I just remembered that Ensembl defines a canonical transcript for each gene, which can be retrieved with the API:

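use Bio::EnsEMBL::Registry;
# Connect to the public Ensembl server (human core database assumed; $gene_id holds a gene stable ID)
Bio::EnsEMBL::Registry->load_registry_from_db(-host => 'ensembldb.ensembl.org', -user => 'anonymous');
my $gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'gene');
my $gene = $gene_adaptor->fetch_by_stable_id($gene_id);
my $canonical_transcript = $gene->canonical_transcript();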
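For a file-based alternative to the API call, here is a minimal sketch that reads the canonical transcript straight from the GTF. It assumes the file marks canonical transcripts with tag "Ensembl_canonical" on their transcript rows. Newer Ensembl GTF releases include this tag, but older ones (such as those current when this thread was written) may not, so check your file first. `canonical_transcripts` is a hypothetical helper.

```python
import re

# One `key "value";` pair from the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def canonical_transcripts(gtf_lines, tag="Ensembl_canonical"):
    """Return {gene_id: transcript_id} for transcript rows carrying the given tag."""
    canonical = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        pairs = ATTR_RE.findall(cols[8])
        # `tag` can appear several times on one row, so collect all of its values.
        tags = {value for key, value in pairs if key == "tag"}
        if tag in tags:
            attrs = dict(pairs)
            canonical[attrs["gene_id"]] = attrs["transcript_id"]
    return canonical
```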

Thanks for your reply. I am aware that there are numerous FL transcripts, but I was naively hoping that there might be a system people use to agree on a single FL transcript.

Maybe if I explain my problem, you can point me in the right direction. I have targeted RNA-seq data in which the assays were designed to cover only the exon-exon junctions of genes. What I am trying to do is annotate a junction count file with exon numbering. To do this, I need to select one transcript as the FL transcript (the choice can be fairly arbitrary, but a logical system would be good) and number exons relative to that transcript.
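Once a reference transcript is chosen, numbering its exons is a sort by genomic position that respects strand, so that exon 1 is always the 5'-most exon. A minimal sketch, assuming exons are given as 1-based inclusive (start, end) pairs as in a GTF; `number_exons` is a hypothetical helper. Ensembl GTF exon rows also carry an exon_number attribute, which could be read directly instead.

```python
def number_exons(exons, strand):
    """Number a transcript's exons 1..n in 5'->3' order.

    exons:  iterable of (start, end) genomic coordinates (1-based, inclusive).
    strand: '+' or '-'; on the minus strand the 5'-most exon has the
            highest coordinates, so the sort order is reversed.
    Returns {(start, end): exon_number}.
    """
    ordered = sorted(exons, reverse=(strand == "-"))
    return {exon: number for number, exon in enumerate(ordered, start=1)}
```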

I can't use transcript assembly tools (Cufflinks etc.) because I am missing too much exonic data. This will have to be a junction-abundance analysis (I might be able to tease more out later).

I have collapsed my GTF so that each exon or UTR (defined by its start/stop positions) appears only once, together with a list of the transcripts that share those exact coordinates. Now I need to match these coordinates to my junction file (easy enough) and name each junction "exon x-y".
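A minimal sketch of the matching and naming step against one reference transcript's exon numbering. It assumes junctions are given as the first and last intronic base (1-based, inclusive), the convention STAR uses in columns 2 and 3 of SJ.out.tab; if your junction file reports exon-end/exon-start coordinates instead, drop the +1/-1 offsets. `name_junctions` is a hypothetical helper. A junction whose flanking exons are not both in the reference gets None, which flags junctions involving exons outside that transcript.

```python
def name_junctions(junctions, exon_numbers):
    """Label each junction 'exon x-y' using a reference transcript's exon numbering.

    junctions:    iterable of (intron_start, intron_end), 1-based inclusive
                  intron coordinates (assumed convention).
    exon_numbers: {(exon_start, exon_end): number} for the reference transcript.
    """
    # Index reference exons by the boundary that touches an intron.
    by_end = {end: n for (start, end), n in exon_numbers.items()}
    by_start = {start: n for (start, end), n in exon_numbers.items()}
    labels = {}
    for intron_start, intron_end in junctions:
        left = by_end.get(intron_start - 1)   # exon ending just before the intron
        right = by_start.get(intron_end + 1)  # exon starting just after the intron
        if left is None or right is None:
            labels[(intron_start, intron_end)] = None
        else:
            # Sorting keeps labels ascending on both strands (e.g. "exon 1-2").
            x, y = sorted((left, right))
            labels[(intron_start, intron_end)] = f"exon {x}-{y}"
    return labels
```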

(reply by george.wiggins, 3.4 years ago)

If possible, I'd go with the canonical transcripts. In the past, when I needed one representative transcript per gene, I used the one producing the longest protein or, failing that, simply the longest transcript. However, in your case, I think it might be preferable to concatenate all the exons of a gene so that you don't miss alternatively spliced exons that may not be present in the selected representative transcript.
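A minimal sketch of both suggestions in that reply, assuming each transcript's exon and CDS segments have already been read from the GTF as 1-based inclusive (start, end) pairs. CDS length stands in for protein length (divide by 3 for codons); `representative_transcript` and `merge_exons` are hypothetical helpers. Taking the union of exon intervals is one reading of "concatenate all the exons". Note that it fuses exons with alternative 5' or 3' splice sites into a single block, so junctions that use an inner splice site no longer line up with a block edge.

```python
def representative_transcript(transcripts):
    """Pick one transcript: longest CDS, else longest transcript.

    transcripts: {transcript_id: {"exons": [(start, end), ...],
                                  "cds": [(start, end), ...]}}  # "cds" optional
    """
    def total_length(segments):
        return sum(end - start + 1 for start, end in segments)

    def rank(tid):
        info = transcripts[tid]
        # Longest CDS first, then longest transcript, then smallest ID as a stable tie-break.
        return (-total_length(info.get("cds", [])), -total_length(info["exons"]), tid)

    return min(transcripts, key=rank)


def merge_exons(exons):
    """Union of a gene's exons: overlapping or abutting intervals become one block."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]
```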

(reply by Jean-Karim Heriche, 3.4 years ago)