Freebayes and SnpEff give different numbers of SNPs
6.8 years ago
merodev ▴ 140

Hello friends,

I called SNPs using Freebayes and then ran SnpEff on the Freebayes output file to annotate the effects. The two programs report different numbers of SNPs for the same .vcf file: the number SnpEff reports is higher than the number counted in Freebayes. Is there an explanation for this?

Thanks for your help

SNP snpeff freebayes • 3.0k views
6.7 years ago
Laura ★ 1.7k

Is SnpEff reporting multiple consequences for the same site on different lines, so that what you are seeing is duplication rather than new sites?


Almost certainly the issue here. The same happens with VEP and any other effect-prediction tool that outputs multiple transcript effects for a single variant.

6.7 years ago
Pablo ★ 1.9k

If a VCF entry has two (or more) alts, SnpEff counts it as two (or more) SNPs. 

For instance

    CHR:1, POS:1234, REF:A, ALT:C

is 1 SNP (A>C), whereas this entry

    CHR:1, POS:1234, REF:A, ALT:C,G

is counted as 2 SNPs (A>C and A>G). 

I don't know how freebayes counts, but I'm assuming it might only be counting the number of VCF entries (which SnpEff also reports in the HTML summary).
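
To make the difference between the two ways of counting concrete, here is a minimal Python sketch. It is not SnpEff's or freebayes' actual code: the function name `count_snp_entries_and_alleles` is made up for illustration, and it assumes a SNP is simply a single-base REF paired with a single-base ALT. The two example records follow the entries above, placed at different positions.

```python
def count_snp_entries_and_alleles(vcf_lines):
    """Return (entries_with_a_snp, snp_alt_alleles) for VCF data lines.

    Assumption: a SNP is a single-base REF with a single-base ALT.
    """
    bases = {"A", "C", "G", "T"}
    entries = 0   # one per VCF line that has at least one SNP allele
    alleles = 0   # one per SNP ALT allele, so "C,G" counts twice
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")
        snp_alts = [alt for alt in alts if ref in bases and alt in bases]
        if snp_alts:
            entries += 1
            alleles += len(snp_alts)
    return entries, alleles


# Two illustrative records: one biallelic, one with two ALT alleles.
example = [
    "1\t1234\t.\tA\tC\t.\t.\t.",
    "1\t5678\t.\tA\tC,G\t.\t.\t.",
]
entries, alleles = count_snp_entries_and_alleles(example)
print(entries, alleles)  # 2 3
```

Two VCF entries, three SNPs when each ALT is counted separately. If your file has many multi-allelic sites, the gap between the two numbers grows accordingly.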

 

 

 


Thank you Pablo. This was exactly what I was looking for!

6.8 years ago
Dan D 7.2k

Variance amongst variant caller results is a well-known issue (notice I didn't use the word "problem") in analysis of high-throughput sequencing data.

In simple terms, the reason for the discrepancy is that the various variant callers take different approaches to filtering and preparing the files for SNP calling. There is a selection of algorithms to choose from for each step in the variant-calling process, in addition to a spectrum of thresholds for quality, depth, and other metrics.

Given this, you should first ask yourself what you think defines an interesting or valid variant in the context of your experimental setup. Make sure you can explain, in basic language, why the variant caller you're using selects the variants it does. If you do this, you can be more confident in the variants you do see, while also being less paranoid about false negatives.


SnpEff is a tool to annotate an existing VCF file, not a different variant caller.


Great answer, but for the wrong question :)


Thank you for the comments. What I am trying to figure out is how SnpEff summarizes SNP counts in its output HTML. Grepping the VCF for "TYPE=snp", or for "snp" alone, gives a lower count than the summary results of SnpEff.
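
One possible reason the grep count comes out lower: `grep -c` counts matching lines, not alleles, and a line whose TYPE list does not start with `snp` (for example `TYPE=del,snp`) does not contain the string "TYPE=snp" at all. The Python sketch below compares the two counts. It assumes freebayes-style records where the INFO `TYPE` field holds one value per ALT allele; the function name and the example records are invented for illustration, and whether SnpEff's summary classifies every allele the same way freebayes' TYPE does is not confirmed here.

```python
def count_type_snp(vcf_lines):
    """Return (lines_matching_grep, snp_alleles) for VCF data lines.

    Assumption: INFO contains TYPE=<one value per ALT allele>.
    """
    lines_matching = 0  # roughly what `grep -c "TYPE=snp"` sees on data lines
    snp_alleles = 0     # every "snp" value in the TYPE list
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # no INFO column
        if "TYPE=snp" in line:
            lines_matching += 1
        for item in fields[7].split(";"):
            if item.startswith("TYPE="):
                snp_alleles += item[len("TYPE="):].split(",").count("snp")
    return lines_matching, snp_alleles


# Made-up records: one plain SNP, one with two SNP alleles,
# and one where the SNP allele is listed after a deletion.
records = [
    "1\t100\t.\tA\tC\t50\t.\tDP=10;TYPE=snp",
    "1\t200\t.\tA\tC,G\t50\t.\tDP=12;TYPE=snp,snp",
    "1\t300\t.\tAT\tA,AC\t50\t.\tDP=9;TYPE=del,snp",
]
lines_matching, snp_alleles = count_type_snp(records)
print(lines_matching, snp_alleles)  # 2 4
```

Here the grep-style count is 2 while the per-allele count is 4, which is the direction of the discrepancy described above.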
