
Snpeff Annotation


12.2 years ago

User 1666 ▴ 20

Hello,

I'm trying to annotate a set of SNPs with snpEff, and for some SNPs I'm getting annotations like this one:

chr11   117163824       rs638405        C       G       EFF=NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gat/Cat|D139H|BACE1|mRNA|CODING|NM_001207049|NM_001207049.ex.5),
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gat/Cat|D164H|BACE1|mRNA|CODING|NM_001207048|NM_001207048.ex.5),
STOP_LOST(HIGH|MISSENSE|tGa/tCa|*195S|BACE1|mRNA|CODING|NM_138973|NM_138973.ex.5),
STOP_LOST(HIGH|MISSENSE|tGa/tCa|*220S|BACE1|mRNA|CODING|NM_138971|NM_138971.ex.5),
STOP_LOST(HIGH|MISSENSE|tGa/tCa|*239S|BACE1|mRNA|CODING|NM_138972|NM_138972.ex.5),
STOP_LOST(HIGH|MISSENSE|tGa/tCa|*264S|BACE1|mRNA|CODING|NM_012104|NM_012104.ex.5)

The first two effects (non-synonymous coding) and the last four (stop lost) seem to refer to two different reading frames, shifted by one base with respect to each other. Is this possible/correct? Can two different transcripts really have reading frames that differ by one base?
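One way to make the frame disagreement explicit is to check where the changed base falls inside each reported codon (snpEff writes the mutated base in upper case). The Python sketch below parses the EFF value quoted above. The field positions it uses (codon change as the third field, transcript ID as the eighth) are inferred from this single example and are an assumption; other snpEff versions use a different EFF layout.

```python
import re

# EFF value copied from the question, with the line breaks removed.
EFF = (
    "NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gat/Cat|D139H|BACE1|mRNA|CODING|NM_001207049|NM_001207049.ex.5),"
    "NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gat/Cat|D164H|BACE1|mRNA|CODING|NM_001207048|NM_001207048.ex.5),"
    "STOP_LOST(HIGH|MISSENSE|tGa/tCa|*195S|BACE1|mRNA|CODING|NM_138973|NM_138973.ex.5),"
    "STOP_LOST(HIGH|MISSENSE|tGa/tCa|*220S|BACE1|mRNA|CODING|NM_138971|NM_138971.ex.5),"
    "STOP_LOST(HIGH|MISSENSE|tGa/tCa|*239S|BACE1|mRNA|CODING|NM_138972|NM_138972.ex.5),"
    "STOP_LOST(HIGH|MISSENSE|tGa/tCa|*264S|BACE1|mRNA|CODING|NM_012104|NM_012104.ex.5)"
)

# One effect looks like NAME(field|field|...); no field contains ')'.
EFFECT_RE = re.compile(r"(\w+)\(([^)]*)\)")


def parse_eff(eff):
    """Yield (effect, transcript, codon_change) for each entry of an EFF value."""
    for name, body in EFFECT_RE.findall(eff):
        fields = body.split("|")
        # ASSUMPTION: layout taken from the example above;
        # fields[2] = codon change, fields[7] = transcript ID.
        yield name, fields[7], fields[2]


def changed_codon_position(codon_change):
    """1-based position of the mutated base (the upper-case letter) in the ref codon."""
    ref = codon_change.split("/")[0]
    uppers = [i for i, base in enumerate(ref) if base.isupper()]
    return uppers[0] + 1


for effect, transcript, codon in parse_eff(EFF):
    print(f"{transcript:14} {effect:22} {codon}  codon position {changed_codon_position(codon)}")
```

Run as-is, it reports codon position 1 for the two NON_SYNONYMOUS_CODING entries (Gat/Cat) and codon position 2 for the four STOP_LOST entries (tGa/tCa): the same genomic base is placed one position apart in the two groups of transcripts.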

Thanks in advance, Andrei Barysenka


updated 12.2 years ago by Pablo ★ 1.9k • written 12.2 years ago by User 1666 ▴ 20

score 2 · Answer 1 · 2012-02-07

There was a problem with one of the releases of the hg19 database, but I fixed it shortly after the release. If you downloaded the hg19 database during that window, chances are you have a corrupt database.

Just download the latest database version:

$ java -jar snpEff.jar download hg19
00:00:00.000    Downloading database for 'hg19'
00:00:00.003    Connecting to http://downloads.sourceforge.net/project/snpeff/databases//v2_0_5/snpEff_v2_0_5_hg19.zip
00:00:11.168    Copying file (type: application/octet-stream, modified on: Thu Jan 19 21:13:03 EST 2012)
00:00:11.169    Local file name: 'snpEff_v2_0_5_hg19.zip'
...
00:00:40.209    Unzip: OK
00:00:40.209    Done

With the updated database, here is what I get:

$ java -Xmx10G -jar snpEff.jar eff -v -i txt hg19 ~/snpEff/test.txt -o txt | tee test.out.txt
...
11  117163824   C   G   SNP Hom             BACE1.11    BACE1   mRNA    NM_001207049    NM_001207049.ex.5   4   SYNONYMOUS_CODING   V/V gtG/gtC 137 3   1131
11  117163824   C   G   SNP Hom             BACE1.11    BACE1   mRNA    NM_001207048    NM_001207048.ex.5   4   SYNONYMOUS_CODING   V/V gtG/gtC 162 3   1206
11  117163824   C   G   SNP Hom             BACE1.11    BACE1   mRNA    NM_138971   NM_138971.ex.5  5   SYNONYMOUS_CODING   V/V gtG/gtC 218 3   1374            
11  117163824   C   G   SNP Hom             BACE1.11    BACE1   mRNA    NM_138973   NM_138973.ex.5  5   SYNONYMOUS_CODING   V/V gtG/gtC 193 3   1299            
11  117163824   C   G   SNP Hom             BACE1.11    BACE1   mRNA    NM_138972   NM_138972.ex.5  5   SYNONYMOUS_CODING   V/V gtG/gtC 237 3   1431            
11  117163824   C   G   SNP Hom             BACE1.11    BACE1   mRNA    NM_012104   NM_012104.ex.5  5   SYNONYMOUS_CODING   V/V gtG/gtC 262 3   1506

score 1 · Answer 2 · 2012-02-07

What is particularly perplexing about this case is that even the (presumably OK) non-synonymous change isn't in the correct frame. This SNP (rs638405) affects the third codon position of a GTG (valine) codon of BACE1, which lies on the reverse strand, so a C->G variant should be a synonymous change. I'm guessing there is a mismatch between zero-based and one-based coordinates in the annotations somewhere in the pipeline you're using.
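The frame argument can be checked by hand. The sketch below assumes the transcript-strand context GTGAT, reconstructed by overlapping the codon strings reported for this SNP (gtG, tGa and Gat). It complements the genomic C>G change because BACE1 is on the reverse strand, then translates the affected codon with the variant placed at each of the three codon positions.

```python
# Standard genetic code, bases ordered T, C, A, G (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# ASSUMPTION: transcript-strand context around rs638405, reconstructed from
# the overlapping codon strings gtG / tGa / Gat (g t G a t).
CONTEXT = "GTGAT"
VARIANT_INDEX = 2  # the G in the middle of CONTEXT

# The variant is C>G on the genomic plus strand; BACE1 is on the reverse
# strand, so on the transcript the change is the complement, G>C.
genomic_ref, genomic_alt = "C", "G"
tx_ref, tx_alt = COMPLEMENT[genomic_ref], COMPLEMENT[genomic_alt]
assert CONTEXT[VARIANT_INDEX] == tx_ref


def effect_in_frame(codon_position):
    """Ref/alt codons and amino acids if the variant sits at codon_position (1-3)."""
    start = VARIANT_INDEX - (codon_position - 1)
    ref_codon = CONTEXT[start:start + 3]
    alt_codon = ref_codon[:codon_position - 1] + tx_alt + ref_codon[codon_position:]
    return ref_codon, alt_codon, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


for pos in (1, 2, 3):
    ref_codon, alt_codon, ref_aa, alt_aa = effect_in_frame(pos)
    print(f"codon position {pos}: {ref_codon}/{alt_codon}  {ref_aa}/{alt_aa}")
```

Only codon position 3 (GTG/GTC, V/V) agrees with the synonymous call from the corrected database. Positions 1 (GAT/CAT, D/H) and 2 (TGA/TCA, */S) reproduce the missense and stop-lost annotations from the question, which is what shifted exon coordinates would produce.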

More generally, such discordant annotations most often arise from annotated transcripts that are either of lower quality or represent rare transcript forms. For example, if an mRNA was mis-sequenced in a way that introduced a frameshift, the resulting annotation may present an exon in the wrong frame. Alternatively, rare transcripts may retain (part of) an intron or use an alternative splice site, shifting the frame through part of the gene.
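The codon position of a coding base depends on how many coding bases precede it, counting from the start codon, so a single extra or missing base upstream (a sequencing error in the source mRNA, or a splice site one base away) moves every downstream base to a different codon position. A minimal sketch; the exon lengths are hypothetical and chosen only for illustration.

```python
def codon_position(upstream_exon_lengths, offset_in_exon):
    """1-based codon position of a coding base.

    upstream_exon_lengths: coding lengths of the exons before the one that
    holds the base; offset_in_exon: 0-based offset of the base within the
    coding part of its own exon.
    """
    cds_offset = sum(upstream_exon_lengths) + offset_in_exon
    return cds_offset % 3 + 1


# HYPOTHETICAL coding exon lengths, for illustration only.
reference_model = [120, 85, 97]
# Same model, but the second exon is one base longer (e.g. an alternative
# splice site, or an insertion introduced by mis-sequencing the mRNA).
alternative_model = [120, 86, 97]

for name, model in (("reference", reference_model), ("alternative", alternative_model)):
    print(name, codon_position(model, offset_in_exon=10))
```

With these numbers the same base is at codon position 1 in the reference model and position 2 in the alternative one, the same kind of one-base disagreement seen between the NON_SYNONYMOUS_CODING and STOP_LOST entries in the question.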

Remember that when doing this kind of analysis, the results are lists of hypotheses. :)