I have some SARS-CoV-2 sequencing data that I'm trying to annotate with SnpEff. However, SnpEff doesn't appear to recognize that several adjacent SNPs fall within the same codon, so it isn't calling them as a single MNP (multiple-nucleotide polymorphism).
I'm using the GenBank reference MN908947.3. Here are the relevant lines of the annotated VCF file:
MN908947.3 28280 . G C 2911.06 PASS AC=2;AF=1.00;AN=2;DP=65;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.56;SOR=8.380;ANN=C|missense_variant|MODERATE|N|Gene_28273_29532|transcript|QHD43423.2|protein_coding|1/1|c.7G>C|p.Asp3His|7/1260|7/1260|3/419||,C|upstream_gene_variant|MODIFIER|ORF10|Gene_29557_29673|transcript|QHI42199.1|protein_coding||c.-1278G>C|||||1278|,C|downstream_gene_variant|MODIFIER|S|Gene_21562_25383|transcript|QHD43416.1|protein_coding||c.*2896G>C|||||2896|,C|downstream_gene_variant|MODIFIER|ORF3a|Gene_25392_26219|transcript|QHD43417.1|protein_coding||c.*2060G>C|||||2060|,C|downstream_gene_variant|MODIFIER|E|Gene_26244_26471|transcript|QHD43418.1|protein_coding||c.*1808G>C|||||1808|,C|downstream_gene_variant|MODIFIER|M|Gene_26522_27190|transcript|QHD43419.1|protein_coding||c.*1089G>C|||||1089|,C|downstream_gene_variant|MODIFIER|ORF6|Gene_27201_27386|transcript|QHD43420.1|protein_coding||c.*893G>C|||||893|,C|downstream_gene_variant|MODIFIER|ORF7a|Gene_27393_27758|transcript|QHD43421.1|protein_coding||c.*521G>C|||||521|,C|downstream_gene_variant|MODIFIER|ORF8|Gene_27893_28258|transcript|QHD43422.1|protein_coding||c.*21G>C|||||21| GT:AD:DP:GQ:PL 1/1:0,65:65:99:2925,196,0
MN908947.3 28281 . A T 2912.06 PASS AC=2;AF=1.00;AN=2;DP=78;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=26.91;SOR=8.380;ANN=T|missense_variant|MODERATE|N|Gene_28273_29532|transcript|QHD43423.2|protein_coding|1/1|c.8A>T|p.Asp3Val|8/1260|8/1260|3/419||,T|upstream_gene_variant|MODIFIER|ORF10|Gene_29557_29673|transcript|QHI42199.1|protein_coding||c.-1277A>T|||||1277|,T|downstream_gene_variant|MODIFIER|S|Gene_21562_25383|transcript|QHD43416.1|protein_coding||c.*2897A>T|||||2897|,T|downstream_gene_variant|MODIFIER|ORF3a|Gene_25392_26219|transcript|QHD43417.1|protein_coding||c.*2061A>T|||||2061|,T|downstream_gene_variant|MODIFIER|E|Gene_26244_26471|transcript|QHD43418.1|protein_coding||c.*1809A>T|||||1809|,T|downstream_gene_variant|MODIFIER|M|Gene_26522_27190|transcript|QHD43419.1|protein_coding||c.*1090A>T|||||1090|,T|downstream_gene_variant|MODIFIER|ORF6|Gene_27201_27386|transcript|QHD43420.1|protein_coding||c.*894A>T|||||894|,T|downstream_gene_variant|MODIFIER|ORF7a|Gene_27393_27758|transcript|QHD43421.1|protein_coding||c.*522A>T|||||522|,T|downstream_gene_variant|MODIFIER|ORF8|Gene_27893_28258|transcript|QHD43422.1|protein_coding||c.*22A>T|||||22| GT:AD:DP:GQ:PL 1/1:0,65:65:99:2926,196,0
MN908947.3 28282 . T A 2912.06 PASS AC=2;AF=1.00;AN=2;DP=78;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=28.71;SOR=8.380;ANN=A|missense_variant|MODERATE|N|Gene_28273_29532|transcript|QHD43423.2|protein_coding|1/1|c.9T>A|p.Asp3Glu|9/1260|9/1260|3/419||,A|upstream_gene_variant|MODIFIER|ORF10|Gene_29557_29673|transcript|QHI42199.1|protein_coding||c.-1276T>A|||||1276|,A|downstream_gene_variant|MODIFIER|S|Gene_21562_25383|transcript|QHD43416.1|protein_coding||c.*2898T>A|||||2898|,A|downstream_gene_variant|MODIFIER|ORF3a|Gene_25392_26219|transcript|QHD43417.1|protein_coding||c.*2062T>A|||||2062|,A|downstream_gene_variant|MODIFIER|E|Gene_26244_26471|transcript|QHD43418.1|protein_coding||c.*1810T>A|||||1810|,A|downstream_gene_variant|MODIFIER|M|Gene_26522_27190|transcript|QHD43419.1|protein_coding||c.*1091T>A|||||1091|,A|downstream_gene_variant|MODIFIER|ORF6|Gene_27201_27386|transcript|QHD43420.1|protein_coding||c.*895T>A|||||895|,A|downstream_gene_variant|MODIFIER|ORF7a|Gene_27393_27758|transcript|QHD43421.1|protein_coding||c.*523T>A|||||523|,A|downstream_gene_variant|MODIFIER|ORF8|Gene_27893_28258|transcript|QHD43422.1|protein_coding||c.*23T>A|||||23| GT:AD:DP:GQ:PL 1/1:0,65:65:99:2926,196,0
These three positions make up codon 3 of the N gene. They should be annotated as a single MNP, p.Asp3Leu (GAT to CTA), but instead they've been annotated separately, as three individual SNPs (p.Asp3His, p.Asp3Val and p.Asp3Glu).
I'm using the latest version of SnpEff (v5), and according to the manual SnpEff should have this functionality. What am I doing wrong?
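To make the expected result concrete, here is a minimal Python sketch that translates codon 3 with the SNPs applied one at a time and then all together. The reference codon GAT and the start position 28280 are read off the REF and POS columns of the three records above; the helper names (`translate`, `apply_snps`) are mine and are not part of SnpEff, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table is needed for this gene).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codon 3 of N, taken from the REF/POS columns of the three VCF records.
REF_CODON = "GAT"
CODON_START = 28280
SNPS = [(28280, "G", "C"), (28281, "A", "T"), (28282, "T", "A")]


def translate(codon):
    """Return the one-letter amino acid for a DNA codon."""
    return CODON_TABLE[codon]


def apply_snps(codon, codon_start, snps):
    """Apply (pos, ref, alt) SNPs to a codon that starts at codon_start."""
    bases = list(codon)
    for pos, ref, alt in snps:
        offset = pos - codon_start
        if bases[offset] != ref:
            raise ValueError(f"REF mismatch at {pos}")
        bases[offset] = alt
    return "".join(bases)


# Each SNP on its own reproduces the three separate annotations.
for snp in SNPS:
    alt_codon = apply_snps(REF_CODON, CODON_START, [snp])
    print(snp[0], REF_CODON, "->", alt_codon, translate(alt_codon))

# All three together give the codon actually present in the sample.
combined = apply_snps(REF_CODON, CODON_START, SNPS)
print("combined", REF_CODON, "->", combined, translate(combined))
```

Taken individually the SNPs give His, Val and Glu, matching the three annotations above, while the combined codon CTA encodes Leu, which is the p.Asp3Leu I would expect to see reported.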
Are you sure? The example given in the tutorial here makes it look like SnpEff is the tool that identifies it. After all, the aligner and variant caller only have access to the reference sequence; it's SnpEff that has access to information on the ORFs, etc.
Ah yes, you're right! That is new to me.
I wonder if you need to somehow indicate to SnpEff that adjacent SNPs are in cis, in order to get it to call them as a single MNP.
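One way to test that idea without relying on any particular SnpEff option is to collapse the adjacent SNPs into a single MNP record before annotation. The sketch below is a hand-rolled illustration, not a SnpEff feature: the function name `merge_adjacent_snps` and the tuple layout are hypothetical, and it only merges homozygous-alternate calls (GT 1/1, as in the three records above), since those are necessarily in cis. Whether SnpEff then reports p.Asp3Leu for the merged record is something to verify, not something this code guarantees.

```python
from typing import List, Tuple

# Hypothetical minimal record: (chrom, pos, ref, alt, genotype)
Record = Tuple[str, int, str, str, str]


def merge_adjacent_snps(records: List[Record]) -> List[Tuple[str, int, str, str]]:
    """Collapse runs of adjacent homozygous-alt SNPs into single MNP records."""
    merged = []
    run = None             # current run as [chrom, start_pos, ref, alt]
    run_mergeable = False  # whether the current run may be extended
    for chrom, pos, ref, alt, gt in sorted(records, key=lambda r: (r[0], r[1])):
        # Only single-base substitutions called 1/1 (or 1|1) are merged.
        mergeable = len(ref) == 1 and len(alt) == 1 and gt in ("1/1", "1|1")
        adjacent = (
            run is not None
            and chrom == run[0]
            and pos == run[1] + len(run[2])
        )
        if adjacent and mergeable and run_mergeable:
            run[2] += ref
            run[3] += alt
        else:
            if run is not None:
                merged.append(tuple(run))
            run = [chrom, pos, ref, alt]
            run_mergeable = mergeable
    if run is not None:
        merged.append(tuple(run))
    return merged


snps = [
    ("MN908947.3", 28280, "G", "C", "1/1"),
    ("MN908947.3", 28281, "A", "T", "1/1"),
    ("MN908947.3", 28282, "T", "A", "1/1"),
]
print(merge_adjacent_snps(snps))
# [('MN908947.3', 28280, 'GAT', 'CTA')]
```

Note that this merges any run of adjacent SNPs, not only those within one codon, and it ignores the INFO and FORMAT fields, which would need to be carried over by hand when writing a real VCF record.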