SnpEff not correctly annotating multiple-nucleotide polymorphisms
3.1 years ago
Thoth • 0

I have some SARS-CoV-2 sequencing data that I'm trying to annotate with SnpEff. However, SnpEff doesn't appear to recognize multiple adjacent SNPs within the same codon and call them as a single MNP (multiple-nucleotide polymorphism).

I'm using the GenBank reference MN908947.3. Here are the relevant lines of the annotated VCF file:

MN908947.3  28280   .   G   C   2911.06 PASS    AC=2;AF=1.00;AN=2;DP=65;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.56;SOR=8.380;ANN=C|missense_variant|MODERATE|N|Gene_28273_29532|transcript|QHD43423.2|protein_coding|1/1|c.7G>C|p.Asp3His|7/1260|7/1260|3/419||,C|upstream_gene_variant|MODIFIER|ORF10|Gene_29557_29673|transcript|QHI42199.1|protein_coding||c.-1278G>C|||||1278|,C|downstream_gene_variant|MODIFIER|S|Gene_21562_25383|transcript|QHD43416.1|protein_coding||c.*2896G>C|||||2896|,C|downstream_gene_variant|MODIFIER|ORF3a|Gene_25392_26219|transcript|QHD43417.1|protein_coding||c.*2060G>C|||||2060|,C|downstream_gene_variant|MODIFIER|E|Gene_26244_26471|transcript|QHD43418.1|protein_coding||c.*1808G>C|||||1808|,C|downstream_gene_variant|MODIFIER|M|Gene_26522_27190|transcript|QHD43419.1|protein_coding||c.*1089G>C|||||1089|,C|downstream_gene_variant|MODIFIER|ORF6|Gene_27201_27386|transcript|QHD43420.1|protein_coding||c.*893G>C|||||893|,C|downstream_gene_variant|MODIFIER|ORF7a|Gene_27393_27758|transcript|QHD43421.1|protein_coding||c.*521G>C|||||521|,C|downstream_gene_variant|MODIFIER|ORF8|Gene_27893_28258|transcript|QHD43422.1|protein_coding||c.*21G>C|||||21| GT:AD:DP:GQ:PL  1/1:0,65:65:99:2925,196,0
MN908947.3  28281   .   A   T   2912.06 PASS    AC=2;AF=1.00;AN=2;DP=78;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.91;SOR=8.380;ANN=T|missense_variant|MODERATE|N|Gene_28273_29532|transcript|QHD43423.2|protein_coding|1/1|c.8A>T|p.Asp3Val|8/1260|8/1260|3/419||,T|upstream_gene_variant|MODIFIER|ORF10|Gene_29557_29673|transcript|QHI42199.1|protein_coding||c.-1277A>T|||||1277|,T|downstream_gene_variant|MODIFIER|S|Gene_21562_25383|transcript|QHD43416.1|protein_coding||c.*2897A>T|||||2897|,T|downstream_gene_variant|MODIFIER|ORF3a|Gene_25392_26219|transcript|QHD43417.1|protein_coding||c.*2061A>T|||||2061|,T|downstream_gene_variant|MODIFIER|E|Gene_26244_26471|transcript|QHD43418.1|protein_coding||c.*1809A>T|||||1809|,T|downstream_gene_variant|MODIFIER|M|Gene_26522_27190|transcript|QHD43419.1|protein_coding||c.*1090A>T|||||1090|,T|downstream_gene_variant|MODIFIER|ORF6|Gene_27201_27386|transcript|QHD43420.1|protein_coding||c.*894A>T|||||894|,T|downstream_gene_variant|MODIFIER|ORF7a|Gene_27393_27758|transcript|QHD43421.1|protein_coding||c.*522A>T|||||522|,T|downstream_gene_variant|MODIFIER|ORF8|Gene_27893_28258|transcript|QHD43422.1|protein_coding||c.*22A>T|||||22| GT:AD:DP:GQ:PL  1/1:0,65:65:99:2926,196,0
MN908947.3  28282   .   T   A   2912.06 PASS    AC=2;AF=1.00;AN=2;DP=78;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=28.71;SOR=8.380;ANN=A|missense_variant|MODERATE|N|Gene_28273_29532|transcript|QHD43423.2|protein_coding|1/1|c.9T>A|p.Asp3Glu|9/1260|9/1260|3/419||,A|upstream_gene_variant|MODIFIER|ORF10|Gene_29557_29673|transcript|QHI42199.1|protein_coding||c.-1276T>A|||||1276|,A|downstream_gene_variant|MODIFIER|S|Gene_21562_25383|transcript|QHD43416.1|protein_coding||c.*2898T>A|||||2898|,A|downstream_gene_variant|MODIFIER|ORF3a|Gene_25392_26219|transcript|QHD43417.1|protein_coding||c.*2062T>A|||||2062|,A|downstream_gene_variant|MODIFIER|E|Gene_26244_26471|transcript|QHD43418.1|protein_coding||c.*1810T>A|||||1810|,A|downstream_gene_variant|MODIFIER|M|Gene_26522_27190|transcript|QHD43419.1|protein_coding||c.*1091T>A|||||1091|,A|downstream_gene_variant|MODIFIER|ORF6|Gene_27201_27386|transcript|QHD43420.1|protein_coding||c.*895T>A|||||895|,A|downstream_gene_variant|MODIFIER|ORF7a|Gene_27393_27758|transcript|QHD43421.1|protein_coding||c.*523T>A|||||523|,A|downstream_gene_variant|MODIFIER|ORF8|Gene_27893_28258|transcript|QHD43422.1|protein_coding||c.*23T>A|||||23| GT:AD:DP:GQ:PL  1/1:0,65:65:99:2926,196,0

These positions correspond to codon 3 of the N gene (c.7 to c.9). Together they change the codon from GAT to CTA, so they should be annotated as a single MNP, p.Asp3Leu, but instead they have been annotated separately as three individual SNPs (p.Asp3His, p.Asp3Val and p.Asp3Glu).
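To make the discrepancy concrete, here is a minimal Python sketch that translates codon 3 of N with each SNP applied on its own and with all three applied together. The REF and ALT bases are taken from the VCF lines above; the helper name `apply_snps` and the hard-coded standard codon table are only for illustration and have nothing to do with how SnpEff works internally.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def apply_snps(codon, snps):
    """Return the codon after applying (offset_in_codon, alt_base) changes."""
    bases = list(codon)
    for offset, alt in snps:
        bases[offset] = alt
    return "".join(bases)


# Codon 3 of N: REF bases at 28280, 28281, 28282 are G, A, T.
ref_codon = "GAT"
# ALT bases at the same three positions are C, T, A.
snps = [(0, "C"), (1, "T"), (2, "A")]

# Each SNP considered in isolation, as in the ANN fields above.
separate = [CODON_TABLE[apply_snps(ref_codon, [snp])] for snp in snps]
# All three SNPs on the same haplotype.
combined = CODON_TABLE[apply_snps(ref_codon, snps)]

print("reference:", CODON_TABLE[ref_codon])
print("separate: ", separate)
print("combined: ", combined)
```

Taken one at a time, the changes give His, Val and Glu, which is what the three ANN fields report; only the combined codon CTA gives Leu.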

I'm using the latest version of SnpEff (v5), and according to the manual SnpEff should have this functionality. What am I doing wrong?

annotation snpEff • 1.3k views
<strike>It's not a problem with SnpEff; SnpEff does its job here. It's more a problem with the tool that generated the original VCF.</strike>

Are you sure? The example given in the tutorial here makes it look like SnpEff is the tool that identifies it. After all, the aligner and variant caller only have access to the reference sequence; it's SnpEff that has access to information on the ORFs, etc.


Ah yes, you're right! That is new to me.


I wonder if you need to somehow indicate to SnpEff that adjacent SNPs are in cis, in order to get it to call them as a single MNP.
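One way to test that idea without relying on any particular SnpEff option is to merge the adjacent calls into a single MNP record before annotation. The sketch below assumes that homozygous-alternate (1/1) SNPs at consecutive positions are necessarily on the same haplotype, and it works on simplified `(chrom, pos, ref, alt, gt)` tuples rather than full VCF lines. The function name `merge_adjacent_snps` is hypothetical, and whether SnpEff then reports p.Asp3Leu for the merged record is something to verify rather than a documented guarantee.

```python
def merge_adjacent_snps(records):
    """Merge runs of adjacent homozygous-alt substitutions into MNP records.

    records: list of (chrom, pos, ref, alt, gt) tuples sorted by position.
    Heterozygous calls and indels are left untouched, because their phase
    or coordinates cannot be combined this simply.
    """
    hom_alt = ("1/1", "1|1")
    merged = []
    for chrom, pos, ref, alt, gt in records:
        is_hom_snp = len(ref) == 1 and len(alt) == 1 and gt in hom_alt
        if merged and is_hom_snp:
            p_chrom, p_pos, p_ref, p_alt, p_gt = merged[-1]
            prev_is_hom_sub = p_gt in hom_alt and len(p_ref) == len(p_alt)
            # Adjacent means this SNP starts right where the previous ends.
            if prev_is_hom_sub and p_chrom == chrom and p_pos + len(p_ref) == pos:
                merged[-1] = (chrom, p_pos, p_ref + ref, p_alt + alt, p_gt)
                continue
        merged.append((chrom, pos, ref, alt, gt))
    return merged


# The three calls from the question, reduced to the fields needed here.
calls = [
    ("MN908947.3", 28280, "G", "C", "1/1"),
    ("MN908947.3", 28281, "A", "T", "1/1"),
    ("MN908947.3", 28282, "T", "A", "1/1"),
]

for record in merge_adjacent_snps(calls):
    print(record)
```

For the three calls in the question this yields one record at 28280 with REF GAT and ALT CTA. A real implementation would also have to decide what to do with QUAL and the INFO and FORMAT fields of the merged record, which this sketch ignores.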
