Question: How to tell which transcript is the canonical transcript?
steve wrote (2.1 years ago, United States):

For example, if I have a list of variants like this:

Gene_ID  Transcript    Coding   Amino_Acid_Change
TP53     NM_000546     c.G830T  p.C277F
TP53     NM_001126112  c.G830T  p.C277F
TP53     NM_001126113  c.G830T  p.C277F
TP53     NM_001126114  c.G830T  p.C277F
TP53     NM_001126115  c.G434T  p.C145F
TP53     NM_001126116  c.G434T  p.C145F
TP53     NM_001126117  c.G434T  p.C145F
TP53     NM_001126118  c.G713T  p.C238F

How can I figure out which of these transcripts is the canonical transcript?

Canonical transcripts are supposedly listed in resources such as UCSC, RefSeq, and Ensembl, but I have gone through each of these and have not found anything that matches the identifiers in the data above (ANNOVAR RefGene annotation output). The closest I have come is the UCSC Table Browser, which returns 'knownCanonical' for UCSC genes, but that is a BED-style output whose identifiers do not resemble the ones in my data. ANNOVAR's own documentation says that it does not support any differential reporting for canonical transcripts.

annovar • 3.5k views

igor wrote (2.1 years ago, United States):

Those are RefSeq IDs, so you are looking for RefSeq info. There is a whole discussion about it here: https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome/_6asF5KciPc

If you want to know what is really the "canonical" transcript, that's a whole different story. Canonical is not always canonical.


I'm not really sure that there is even such a thing as "canonical" in real life. This is presumably why ANNOVAR doesn't support such a distinction.

(reply written 2.1 years ago by i.sudbery)

Agreed. I've seen different sources disagree on what is canonical even for well-known genes.

(reply written 2.1 years ago by igor)

microfuge wrote (2.1 years ago):

I presume the canonical information can be downloaded from here: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz , but I have not used it myself. For the species I work with that lack such information, I usually find the longest protein for each gene and treat its transcript as canonical.
I don't know why BioMart does not provide this kind of important information. Also, sorry, I accidentally added this as an answer when I meant it as a comment.
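
The longest-protein heuristic mentioned above could be sketched roughly as follows. This is only an illustration: the function name `longest_per_gene`, the record layout, and the gene, transcript, and length values are all hypothetical placeholders, and "longest protein" is a convention of convenience rather than an official definition of canonical.

```python
def longest_per_gene(records):
    """Pick one transcript per gene by longest protein.

    records: iterable of (gene, transcript_id, protein_length) tuples.
    On ties, the transcript seen first is kept.
    """
    best = {}
    for gene, transcript_id, protein_length in records:
        if gene not in best or protein_length > best[gene][1]:
            best[gene] = (transcript_id, protein_length)
    return {gene: transcript_id for gene, (transcript_id, _) in best.items()}


# Placeholder data for illustration only; these are not real annotations.
records = [
    ("GENE_A", "tx1", 393),
    ("GENE_A", "tx2", 346),
    ("GENE_B", "tx3", 120),
]

chosen = longest_per_gene(records)
print(chosen)
```

Protein lengths would have to come from whatever annotation is available for the species in question, for example a protein FASTA file keyed by transcript ID.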


Yes, that is the same information I got from UCSC previously; however, its records are in this format:

chr19 58310448 58326933 49943 uc284pmy.1 ENSG00000283103.1

I am not sure how to reconcile this with the format I have from the ANNOVAR RefGene output.

(reply written 2.1 years ago by steve)

This thread uses the UCSC Table Browser to get the related IDs for all canonical transcripts: http://redmine.soe.ucsc.edu/forum/index.php?t=tree&th=7602&mid=19939&S=f6391396b0b0a7bcd539e058e8edc96b&rev=&reveal=

(reply written 2.1 years ago by microfuge)
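
Once a UCSC-ID-to-RefSeq mapping has been exported (for example from the Table Browser, as the linked thread describes), reconciling it with ANNOVAR output is essentially a join on transcript ID. The sketch below assumes that knownCanonical rows carry the transcript ID in the fifth column, as in the record quoted earlier in this thread, and that the mapping is available as a plain dictionary. All function names and all identifiers in the sample data are hypothetical placeholders, not real annotations.

```python
def parse_known_canonical(lines):
    """Collect transcript IDs (5th whitespace-separated column) from knownCanonical-style rows."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 5:
            ids.add(fields[4])
    return ids


def flag_canonical(annovar_rows, canonical_ids, ucsc_to_refseq):
    """Return (gene, transcript, is_canonical) for each ANNOVAR row.

    ucsc_to_refseq is an assumed, separately exported mapping from UCSC
    transcript ID to RefSeq accession; canonical IDs without a mapping are skipped.
    """
    canonical_refseq = {
        ucsc_to_refseq[ucsc_id]
        for ucsc_id in canonical_ids
        if ucsc_id in ucsc_to_refseq
    }
    return [(gene, tx, tx in canonical_refseq) for gene, tx in annovar_rows]


# Placeholder inputs for illustration only.
known_canonical_lines = ["chrN 100 200 1 ucPLACEHOLDER.1 ENSG_PLACEHOLDER.1"]
ucsc_to_refseq = {"ucPLACEHOLDER.1": "NM_EXAMPLE_1"}
annovar_rows = [("GENE_A", "NM_EXAMPLE_1"), ("GENE_A", "NM_EXAMPLE_2")]

canonical_ids = parse_known_canonical(known_canonical_lines)
flagged = flag_canonical(annovar_rows, canonical_ids, ucsc_to_refseq)
for gene, tx, is_canonical in flagged:
    print(gene, tx, "canonical" if is_canonical else "")
```

Note that ANNOVAR's RefGene output in this thread lists accessions without a version suffix, so any version numbers in the exported mapping would need to be stripped before comparing.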