Question: Freebayes and snpeff give different number of SNPs
6.2 years ago by
United States
merodev140 wrote:

Hello friends,

I called SNPs using Freebayes and then ran SnpEff on the Freebayes output file to see the variant effects. The two programs report different numbers of SNPs for the same .vcf file: the number reported by SnpEff is higher than the number counted in Freebayes. Is there an explanation for this?

Thanks for your help

snp freebayes snpeff • 2.7k views
modified 6.2 years ago by Pablo1.9k • written 6.2 years ago by merodev140
6.2 years ago by
Cambridge UK
Laura1.7k wrote:

Is SNPeff reporting multiple consequences for the same site on different lines so what you are seeing is duplication rather than new sites?

written 6.2 years ago by Laura1.7k

This is almost certainly the issue here. The same happens with VEP and any other effect prediction tool that outputs multiple transcript effects for a single variant.

written 6.2 years ago by User 5913k
6.2 years ago by
Pablo1.9k wrote:

If a VCF entry has two (or more) alts, SnpEff counts it as two (or more) SNPs. 

For instance

    CHR:1, POS:1234, REF:A, ALT:C

is 1 SNP (A>C), whereas this entry

    CHR:1, POS:1234, REF:A, ALT:C,G

is counted as 2 SNPs (A>C and A>G). 

I don't know how freebayes counts, but I'm assuming it might only be counting the number of VCF entries (which SnpEff also reports in the HTML summary).
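To make the difference between the two ways of counting concrete, here is a minimal Python sketch. It is not what SnpEff or freebayes actually run internally; `count_snps` and the example lines are made up for illustration, and it assumes a plain tab-separated VCF where a SNP is a single-base REF with a single-base ALT.

```python
def count_snps(vcf_lines):
    """Return (records_with_a_snp, snp_alleles) for an iterable of VCF lines.

    Hypothetical helper for illustration only: a SNP allele is taken to be
    a single-base REF paired with a single-base ALT.
    """
    bases = set("ACGT")
    records = 0
    alleles = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")
        snp_alts = [a for a in alts if ref in bases and a in bases]
        if snp_alts:
            records += 1              # one VCF entry
            alleles += len(snp_alts)  # one count per alternate allele
    return records, alleles


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1234\t.\tA\tC\t50\t.\t.",
    "1\t5678\t.\tA\tC,G\t50\t.\t.",
]
print(count_snps(example))  # (2, 3)
```

Two VCF entries give three SNPs when each alternate allele is counted separately, which is the kind of gap described above.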




written 6.2 years ago by Pablo1.9k

Thank you Pablo. This was exactly what I was looking for!

written 6.1 years ago by merodev140
6.2 years ago by
Dan D7.2k wrote:

Variance amongst variant caller results is a well-known issue (notice I didn't use the word "problem") in analysis of high-throughput sequencing data.

In simple terms, the reason for the discrepancy is the different approaches that the various variant callers take when filtering and preparing the files for SNP calling. There is a selection of algorithms to choose from for each step in the variant-calling process, in addition to a spectrum of thresholds for quality, depth, and other metrics.

Given this, you should first ask yourself what you think defines an interesting or valid variant in the context of your experimental setup. Make sure you can explain, in basic language, why the variant caller you're using selects the variants it does. If you do this, you can be more confident in the variants you do see, while also being less paranoid about false negatives.

modified 16 months ago by Ram32k • written 6.2 years ago by Dan D7.2k

SnpEff is a tool to annotate an existing VCF file, not a different variant caller.

written 6.2 years ago by Fedor Gusev210

Great answer, but for the wrong question :)

written 6.2 years ago by User 5913k

Thank you for the comments. What I am trying to figure out is how SnpEff summarizes SNP counts in its output HTML. Grepping for "TYPE=snp" or "snp" alone gives a lower count than the SnpEff summary results.
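One possible reason a line-based grep comes out lower: in freebayes output the INFO `TYPE` field holds one value per alternate allele, so a line with `TYPE=snp,snp` is a single grep hit but carries two SNP alleles. The sketch below compares the two counts. `count_type_snp` and the example lines are hypothetical, and whether the per-allele number matches the SnpEff summary exactly is an assumption, not something confirmed here.

```python
import re


def count_type_snp(vcf_lines):
    """Return (lines_with_snp_type, snp_alleles) using the INFO TYPE field.

    Hypothetical helper: the first number is roughly what a line-based
    grep count would give, the second counts each 'snp' value in TYPE.
    """
    lines_matched = 0
    snp_alleles = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        info = line.rstrip("\n").split("\t")[7]
        match = re.search(r"(?:^|;)TYPE=([^;]+)", info)
        if not match:
            continue
        n_snp = match.group(1).split(",").count("snp")
        if n_snp:
            lines_matched += 1
            snp_alleles += n_snp
    return lines_matched, snp_alleles


type_example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tC\t50\t.\tDP=10;TYPE=snp",
    "1\t200\t.\tA\tC,G\t50\t.\tDP=12;TYPE=snp,snp",
    "1\t300\t.\tA\tAT\t50\t.\tDP=9;TYPE=ins",
]
print(count_type_snp(type_example))  # (2, 3)
```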

written 6.2 years ago by merodev140