Question: Variant Annotation - Which Transcript(s) Are the Best Representatives of the Variant?
9.6 years ago by
Bethesda, MD, USA
I would like to learn what rules one should follow when annotating variants, specifically when selecting transcripts for variant annotation. The topic is discussed here, and there are many tools that can annotate a variant starting from chr, position, and variant_allele, but most of the downstream annotation depends on which transcript you choose. Many genes have alternative transcripts, and the transcript you choose determines whether the variant is, for example, a coding variant or an intronic variant. So how do you choose which transcript represents your variant?
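To illustrate why the choice matters, here is a minimal sketch that classifies one genomic position against two isoforms of the same gene. The `Transcript` model, the coordinates, and the isoform names are all hypothetical: 1-based inclusive coordinates on the plus strand, with UTRs lumped together rather than split into 5' and 3'. Real annotation tools use full gene models (for example, from a GTF file) and a much richer set of consequence terms.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical, simplified transcript model (1-based, inclusive, plus strand)."""
    name: str
    exons: List[Tuple[int, int]]  # (start, end) of each exon
    cds_start: int                # first coding base
    cds_end: int                  # last coding base


def classify(pos: int, tx: Transcript) -> str:
    """Classify a genomic position relative to a single transcript."""
    tx_start = min(s for s, _ in tx.exons)
    tx_end = max(e for _, e in tx.exons)
    if pos < tx_start or pos > tx_end:
        return "outside"
    if not any(s <= pos <= e for s, e in tx.exons):
        return "intronic"
    if tx.cds_start <= pos <= tx.cds_end:
        return "coding"
    return "UTR"  # exonic but outside the CDS


# Two made-up isoforms: isoform_B skips the middle exon of isoform_A.
tx_a = Transcript("isoform_A", [(100, 200), (300, 400), (500, 600)], cds_start=150, cds_end=550)
tx_b = Transcript("isoform_B", [(100, 200), (500, 600)], cds_start=150, cds_end=550)

for tx in (tx_a, tx_b):
    print(tx.name, classify(350, tx))
# isoform_A coding
# isoform_B intronic
```

The same variant at position 350 is coding in one isoform and intronic in the other, which is exactly the ambiguity raised in the question.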

Also see the discussion here: How To Assess The Effect Of SNPs Based On Multiple Transcripts?

9.6 years ago by
Boston, MA USA
Before I get to your main question of transcript selection, I will briefly bring up haplotypes. It may be appropriate to analyze a haplotype rather than a single variant, because high linkage disequilibrium (LD) causes certain variants to "travel" together. I find analysis of haplotypes to be extremely important when looking for allele-specific effects on RNA folding.

Let's say SNP 1 has alleles A (major) and C (minor), SNP 2 has alleles G (major) and T (minor), and the two SNPs are in very high LD. The SNP 1 major allele A and the SNP 2 major allele G will then often (with absolute LD, always) be found together, while SNP 1 allele A and SNP 2 allele T (its minor allele) will be observed rarely or never. Thus, one should analyze two mRNA isoforms: one with A at SNP 1 and G at SNP 2, and a second with C at SNP 1 and T at SNP 2.
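A minimal sketch of building the two haplotype-specific sequences described above. The toy sequence, the SNP offsets (0-based positions within the transcript sequence), and the helper name `apply_haplotype` are hypothetical. Instead of all 2 × 2 = 4 allele combinations, only the two observed haplotypes are constructed; these sequences could then be passed to a folding tool.

```python
from typing import Dict, List


def apply_haplotype(ref_seq: str, snp_offsets: List[int], alleles: List[str]) -> str:
    """Return ref_seq with each SNP offset replaced by the haplotype's allele.

    Offsets are 0-based positions within the transcript sequence (assumption).
    """
    seq = list(ref_seq)
    for offset, allele in zip(snp_offsets, alleles):
        seq[offset] = allele
    return "".join(seq)


# Toy cDNA-style sequence: SNP 1 at offset 3 (ref A), SNP 2 at offset 9 (ref G).
ref = "TTCAGCTAGGCA"
snp_offsets = [3, 9]

# Only the two haplotypes observed under (near-)absolute LD.
haplotypes: Dict[str, List[str]] = {
    "A-G (major)": ["A", "G"],
    "C-T (minor)": ["C", "T"],
}

for label, alleles in haplotypes.items():
    print(label, apply_haplotype(ref, snp_offsets, alleles))
# A-G (major) TTCAGCTAGGCA
# C-T (minor) TTCCGCTAGTCA
```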

Which transcript to analyze can depend on many factors. Perhaps you should analyze the best-known or best-characterized mRNA. Perhaps you should analyze all of the mRNA isoforms that are expressed in the tissue(s) relevant to your phenotype(s) of interest. In other cases, you may want to be blind to expression and phenotype and just analyze all reported mRNA isoforms for that gene.
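The three strategies above can be sketched as simple filters. Everything here is a hypothetical illustration: the `is_canonical` flag stands in for whatever curated "representative transcript" tag your annotation source provides, the per-tissue TPM values are made up, and the `min_tpm=1.0` expression cutoff is an arbitrary placeholder, not a recommended threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TranscriptInfo:
    name: str
    is_canonical: bool  # hypothetical curated "representative transcript" flag
    tpm: Dict[str, float] = field(default_factory=dict)  # hypothetical per-tissue expression


def select_transcripts(txs: List[TranscriptInfo], strategy: str,
                       tissue: str = "", min_tpm: float = 1.0) -> List[TranscriptInfo]:
    """Return the transcripts to annotate under one of three strategies."""
    if strategy == "canonical":  # best-known / best-characterized isoform
        return [t for t in txs if t.is_canonical]
    if strategy == "expressed":  # isoforms expressed in the relevant tissue
        return [t for t in txs if t.tpm.get(tissue, 0.0) >= min_tpm]
    if strategy == "all":        # blind to expression and phenotype
        return list(txs)
    raise ValueError(f"unknown strategy: {strategy}")


txs = [
    TranscriptInfo("tx1", True, {"brain": 0.2, "liver": 35.0}),
    TranscriptInfo("tx2", False, {"brain": 12.5}),
    TranscriptInfo("tx3", False, {}),
]

print([t.name for t in select_transcripts(txs, "canonical")])                 # ['tx1']
print([t.name for t in select_transcripts(txs, "expressed", tissue="brain")])  # ['tx2']
print([t.name for t in select_transcripts(txs, "all")])                       # ['tx1', 'tx2', 'tx3']
```

Note that the strategies can disagree completely: in this toy data, the canonical isoform is barely expressed in brain.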

Regardless of the above, I find folding of the 3'-UTR to be the most complicated and computationally intensive of the possible analyses one can perform when annotating variants. This is less the case for 5'-UTRs because they are usually much shorter than 3'-UTRs.

9.6 years ago by
Sean Davis
National Institutes of Health, Bethesda, MD
The answer to this question depends heavily on what you want to do downstream of the variant annotation. The most general way of proceeding is to simply annotate against ALL transcripts at a given locus and then make heuristic rules about which variant annotation to use downstream rather than arbitrarily choosing a transcript a priori.
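A minimal sketch of the "annotate everything, then choose" approach. Each transcript is annotated and every annotation is kept; a downstream heuristic then picks one. The consequence terms are Sequence Ontology names, but the short severity order below is a simplified, hypothetical ranking (real tools maintain longer, curated lists), and breaking ties in favor of a canonical transcript is just one possible rule.

```python
from typing import Dict, List

# Simplified, hypothetical severity order (most severe first).
SEVERITY = ["stop_gained", "missense_variant", "synonymous_variant",
            "5_prime_UTR_variant", "3_prime_UTR_variant", "intron_variant"]
RANK = {term: i for i, term in enumerate(SEVERITY)}


def pick_annotation(annotations: List[Dict]) -> Dict:
    """Choose one annotation from the full per-transcript list.

    Heuristic: most severe consequence wins (unknown terms rank last);
    ties are broken in favor of canonical transcripts.
    """
    return min(annotations,
               key=lambda a: (RANK.get(a["consequence"], len(SEVERITY)), not a["canonical"]))


# Annotations of one variant against ALL transcripts at the locus (made-up data).
annotations = [
    {"transcript": "tx1", "consequence": "intron_variant", "canonical": True},
    {"transcript": "tx2", "consequence": "missense_variant", "canonical": False},
    {"transcript": "tx3", "consequence": "missense_variant", "canonical": True},
]

best = pick_annotation(annotations)
print(best["transcript"], best["consequence"])  # tx3 missense_variant
```

Because the full `annotations` list is retained, a different rule (for example, restricting to tissue-relevant transcripts) can be applied later without re-annotating.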

I agree with Sean's assessment (+1), but would add that if the variant is linked to a brain phenotype (e.g., cognition or Parkinson's), then you are justified in choosing to annotate only brain-specific transcripts, provided those are known.

