Question: How to tell which transcript is the canonical transcript?
steve1.5k wrote:

For example, if I have a list of variants like this:

Gene_ID  Transcript    Coding   Amino_Acid_Change
TP53     NM_000546     c.G830T  p.C277F
TP53     NM_001126112  c.G830T  p.C277F
TP53     NM_001126113  c.G830T  p.C277F
TP53     NM_001126114  c.G830T  p.C277F
TP53     NM_001126115  c.G434T  p.C145F
TP53     NM_001126116  c.G434T  p.C145F
TP53     NM_001126117  c.G434T  p.C145F
TP53     NM_001126118  c.G713T  p.C238F

How could you figure out which of the transcripts is the canonical transcript?

Supposedly, transcripts are listed in places like UCSC, RefSeq, and Ensembl. But I have gone through each of these and have not been able to find anything that resembles the information I've listed above (ANNOVAR RefGene annotation output). The closest I've come is the UCSC Table Browser returning 'knownCanonical' for UCSC genes, but this is in a BED-style output with identifiers that do not resemble my given data. ANNOVAR's own documentation says that it does not support any differential reporting for canonical transcripts.

annovar • 2.9k views
ADD COMMENTlink modified 21 months ago • written 21 months ago by steve1.5k
igor6.2k wrote:

Those are RefSeq IDs, so you are looking for RefSeq info. There is a whole discussion about it here: https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome/_6asF5KciPc

If you want to know what is really the "canonical" transcript, that's a whole different story. Canonical is not always canonical.

ADD COMMENTlink modified 21 months ago • written 21 months ago by igor6.2k
1

I'm not really sure that there is even such a thing as "canonical" in real life. This is presumably why ANNOVAR doesn't support such a distinction.

ADD REPLYlink written 21 months ago by i.sudbery2.3k

Agreed. I've seen different sources disagree on what is canonical even for well-known genes.

ADD REPLYlink written 21 months ago by igor6.2k
microfuge900 wrote:

I presume the canonical information can be downloaded from here: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownCanonical.txt.gz . I have not used it myself, though. For the species I work with, which lack such information, I usually take the transcript encoding the longest protein and treat it as canonical.
I don't know why BioMart does not provide this kind of important information. Also, sorry, I accidentally added this as an answer when I meant it as a comment.
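
The "longest protein" rule of thumb could be sketched roughly like this in Python. This is only a minimal illustration, not an established method: the function name, the (gene, transcript, protein length) input layout, and the tie-breaking rule are all assumptions made up for the example, and the records below are placeholders rather than real annotations.

```python
def longest_protein_per_gene(records):
    """Pick one transcript per gene: the one with the longest protein.

    `records` is an iterable of (gene, transcript_id, protein_length) tuples
    (a hypothetical layout for this sketch). Ties are broken by the
    lexicographically smallest transcript ID so the result is deterministic.
    """
    best = {}
    for gene, transcript_id, protein_length in records:
        # Negate the length so that "smaller tuple" means "longer protein".
        candidate = (-protein_length, transcript_id)
        if gene not in best or candidate < best[gene]:
            best[gene] = candidate
    return {gene: transcript_id for gene, (_, transcript_id) in best.items()}


# Placeholder records, not real annotation data.
example = [
    ("GENE_A", "tx_1", 120),
    ("GENE_A", "tx_2", 340),
    ("GENE_B", "tx_3", 75),
]
print(longest_protein_per_gene(example))
```

Keep in mind that the longest protein is just a convention; it will not necessarily agree with what UCSC or any other source flags as canonical.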

ADD COMMENTlink modified 21 months ago • written 21 months ago by microfuge900

Yes, that is the same information I got from UCSC previously. However, its records are in this format:

chr19 58310448 58326933 49943 uc284pmy.1 ENSG00000283103.1

I am not sure how to reconcile this with the format I have from the ANNOVAR RefGene output.

ADD REPLYlink written 21 months ago by steve1.5k

This thread uses the Table Browser to get the related IDs for all canonical transcripts: http://redmine.soe.ucsc.edu/forum/index.php?t=tree&th=7602&mid=19939&S=f6391396b0b0a7bcd539e058e8edc96b&rev=&reveal=

ADD REPLYlink written 21 months ago by microfuge900
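
Once a UCSC-ID-to-RefSeq mapping is in hand (for example, exported from the Table Browser as in the linked thread), reconciling it with the ANNOVAR rows is essentially a join. Below is a rough Python sketch under several assumptions: the knownCanonical columns are taken to be chrom, start, end, cluster ID, transcript, protein, as in the record quoted above; the mapping is a plain dict that you have to build yourself; and the function names and the RefSeq accessions are hypothetical placeholders, not real IDs.

```python
def parse_known_canonical(lines):
    """Yield the UCSC transcript ID (5th column) from knownCanonical rows."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 5:
            yield fields[4]


def canonical_refseq_ids(canonical_lines, ucsc_to_refseq):
    """Translate canonical UCSC IDs to RefSeq accessions without version suffix.

    `ucsc_to_refseq` is an assumed dict {ucsc_id: refseq_accession}; UCSC IDs
    with no RefSeq counterpart are silently skipped.
    """
    ids = set()
    for ucsc_id in parse_known_canonical(canonical_lines):
        refseq = ucsc_to_refseq.get(ucsc_id)
        if refseq:
            ids.add(refseq.split(".")[0])
    return ids


def filter_annovar_rows(rows, canonical_ids):
    """Keep rows whose Transcript column (index 1) is in `canonical_ids`."""
    return [row for row in rows if row[1].split(".")[0] in canonical_ids]


canonical = ["chr19 58310448 58326933 49943 uc284pmy.1 ENSG00000283103.1"]
# Placeholder mapping and rows: the NM_PLACEHOLDER accessions are made up.
mapping = {"uc284pmy.1": "NM_PLACEHOLDER1.2"}
rows = [
    ("GENE_X", "NM_PLACEHOLDER1", "c.X", "p.X"),
    ("GENE_X", "NM_PLACEHOLDER2", "c.Y", "p.Y"),
]
ids = canonical_refseq_ids(canonical, mapping)
print(filter_annovar_rows(rows, ids))
```

Version suffixes are stripped on both sides because the ANNOVAR output above lists accessions without them. Genes whose canonical UCSC transcript has no RefSeq equivalent will simply drop out, so it is worth checking how many variants lose all of their rows.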