Question

Only the longest transcript while annotating with Annovar?



6.2 years ago

mhmtgenc85 ▴ 50

Hi everyone, I was wondering whether I can get only the longest transcript in the corresponding column of the Annovar annotation. The -onetranscript argument picks a transcript at random, not the longest one.

So could you please help me? Thanks in advance

Annovar annotation SNP transcript refGene • 2.2k views

updated 6.2 years ago by Kevin Blighe 87k • written 6.2 years ago by mhmtgenc85 ▴ 50

score 5 · Accepted Answer · 2018-02-15

The answer from the author of ANNOVAR is this:

There has never been a consensus in the field which transcript should be used to represent a gene when multiple transcripts are available. The most popular approach is to use the longest transcript nowadays. However, in the medical genetics field, for certain specific diseases and specific genes, there are 'canonical' transcripts that everybody uses by default for historical reasons, and you will need to manually select this canonical transcript from ANNOVAR output file to communicate with the rest of the field.

[source: http://annovar.openbioinformatics.org/en/latest/misc/faq/]

In a way, he is correct, and I feel that the field should start to embrace (and report) multiple transcript isoforms more and more, even with the increased data load. There is too much reporting of variants on isoforms that may have minimal relevance in the tissue of study. Also, for many well-studied genes, like BRCA1, we have identified >10 isoforms; whilst, for other less-studied genes, we don't yet understand the alternative splicing patterns of the gene.
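If you still want a single "longest" transcript per variant, the manual selection that the FAQ mentions can be scripted. Below is a minimal sketch, not an ANNOVAR feature. It assumes that "longest" means summed exon length, that you have the refGene table in the usual UCSC layout (leading bin column, transcript ID in column 2, exonStarts and exonEnds in columns 10 and 11), and that the AAChange.refGene value is a comma-separated list of Gene:Transcript:exon:cDNA:protein entries. The function names and the IDs in the example rows are hypothetical.

```python
def transcript_lengths(refgene_lines):
    """Map transcript ID -> summed exon length from refGene-style rows.

    Assumes tab-separated rows with a leading bin column, so the
    transcript ID is field 1 and exonStarts/exonEnds are fields 9/10.
    """
    lengths = {}
    for line in refgene_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed or empty rows
        name = fields[1]
        starts = [int(x) for x in fields[9].split(",") if x]
        ends = [int(x) for x in fields[10].split(",") if x]
        length = sum(e - s for s, e in zip(starts, ends))
        # The same ID can map to several loci; keep the largest value.
        lengths[name] = max(length, lengths.get(name, 0))
    return lengths


def pick_longest(aachange, lengths):
    """Return the AAChange entry on the longest transcript, or None."""
    entries = [e for e in aachange.split(",") if ":" in e]
    if not entries:
        return None  # e.g. "." for variants without a coding change
    return max(entries, key=lambda e: lengths.get(e.split(":")[1], 0))


# Made-up rows and IDs, for illustration only.
refgene_rows = [
    "0\tNM_000001\tchr1\t+\t100\t1000\t150\t900\t2\t100,500,\t300,1000,\t0\tGENE1\tcmpl\tcmpl\t0,1,",
    "0\tNM_000002\tchr1\t+\t100\t1000\t150\t900\t1\t100,\t300,\t0\tGENE1\tcmpl\tcmpl\t0,",
]
lengths = transcript_lengths(refgene_rows)
aachange = "GENE1:NM_000002:exon1:c.A10G:p.K4E,GENE1:NM_000001:exon1:c.A10G:p.K4E"
print(pick_longest(aachange, lengths))
```

If your transcript IDs carry version suffixes in one file but not the other, strip them before the lookup. To rank by coding length instead, you would need to clip each exon to the cdsStart and cdsEnd columns before summing.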

Note that VEP does allow you to output the canonical isoform, but to Ensembl the canonical is always the isoform with the longest CCDS: https://www.ensembl.org/Help/Glossary?id=346

On the last point, researchers even disagree about what canonical means. For some, it is the most highly expressed isoform in the tissue being studied, which may not necessarily be the longest. At least Ensembl's definition is broad-sweeping and covers all tissues.
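If you go the VEP route, keeping only the canonical isoform can be done after the fact. Here is a rough sketch, assuming VEP was run with --canonical and wrote its default tab-delimited output, where the last (Extra) column holds semicolon-separated key=value pairs and the transcript ID is in the fifth (Feature) column. The function name and the example rows are made up for illustration.

```python
def canonical_rows(vep_lines):
    """Yield the split fields of rows whose Extra column has CANONICAL=YES."""
    for line in vep_lines:
        if line.startswith("#"):
            continue  # header and metadata lines
        fields = line.rstrip("\n").split("\t")
        # Parse the Extra column (assumed to be the last one) into a dict.
        extra = dict(
            item.split("=", 1) for item in fields[-1].split(";") if "=" in item
        )
        if extra.get("CANONICAL") == "YES":
            yield fields


# Made-up rows, for illustration only.
vep_lines = [
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\t"
    "Consequence\tcDNA_position\tCDS_position\tProtein_position\t"
    "Amino_acids\tCodons\tExisting_variation\tExtra",
    "var1\t1:1000\tG\tENSG_X\tENST_A\tTranscript\tmissense_variant\t10\t10\t4\t"
    "K/E\tAaa/Gaa\t-\tIMPACT=MODERATE;STRAND=1;CANONICAL=YES",
    "var1\t1:1000\tG\tENSG_X\tENST_B\tTranscript\tmissense_variant\t10\t10\t4\t"
    "K/E\tAaa/Gaa\t-\tIMPACT=MODERATE;STRAND=1",
]
kept = [row[4] for row in canonical_rows(vep_lines)]
print(kept)
```

Filtering this way drops any variant whose only annotations fall on non-canonical transcripts, so it is worth checking how many variants are lost before relying on the result.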

Kevin