Freebayes and SnpEff give different numbers of SNPs
3
1
7.8 years ago
merodev ▴ 140

Hello friends,

I called SNPs using Freebayes and then ran SnpEff on the Freebayes output file to see the effects. The two programs give different numbers of SNPs for the same .vcf file: the number reported by SnpEff is higher than the number counted in Freebayes. Is there an explanation for this?

SNP snpeff freebayes • 3.6k views
3
7.8 years ago
Laura ★ 1.8k

Is SnpEff reporting multiple consequences for the same site on different lines, so that what you are seeing is duplication rather than new sites?

0
This is almost certainly the issue here. The same happens with VEP and any other effect prediction tool that outputs multiple transcript effects for a single variant.

2
7.8 years ago
Pablo ★ 1.9k

If a VCF entry has two (or more) alts, SnpEff counts it as two (or more) SNPs.

For instance

CHR:1, POS:1234, REF:A, ALT:C


is 1 SNP (A>C), whereas this entry

CHR:1, POS:1234, REF:A, ALT:C,G


is counted as 2 SNPs (A>C and A>G).

I don't know how Freebayes counts, but I'm assuming it might only be counting the number of VCF entries (which SnpEff also reports in the HTML summary).
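To make the two ways of counting concrete, here is a minimal Python sketch. It assumes a plain tab-separated VCF and treats an ALT as a SNP when both REF and ALT are single bases. The function name `count_snps` and the example records are hypothetical, and this is not SnpEff's or Freebayes' actual counting code.

```python
def count_snps(vcf_lines):
    """Return (entries, snp_alleles) for an iterable of VCF text lines.

    entries     = number of VCF records with at least one SNP ALT
    snp_alleles = number of SNP ALT alleles, counted one per ALT
    """
    bases = set("ACGT")
    entries = 0
    snp_alleles = 0
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")
        # Assumption: a SNP is a single-base REF with a different single-base ALT.
        snp_alts = [a for a in alts if len(ref) == 1 and a in bases and a != ref]
        if snp_alts:
            entries += 1
            snp_alleles += len(snp_alts)
    return entries, snp_alleles


# Hypothetical records mirroring the two entries above.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1234\t.\tA\tC\t50\t.\t.",
    "1\t5678\t.\tA\tC,G\t50\t.\t.",
]
entries, snp_alleles = count_snps(example)
print(entries, snp_alleles)
```

If the per-allele count from a script like this is close to SnpEff's number and the per-entry count is close to yours, multi-allelic sites are the likely explanation.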

0

Thank you Pablo. This was exactly what I was looking for!

0
7.8 years ago
Dan D 7.3k

Variance amongst variant caller results is a well-known issue (notice I didn't use the word "problem") in analysis of high-throughput sequencing data.

In simple terms, the reason for the discrepancy is that the various variant callers take different overall approaches to filtering and preparing the files for SNP calling. There is a selection of algorithms to choose from for each step in the variant-calling process, in addition to the spectrum of thresholds for quality, depth, and other metrics.

Given this, you should first ask yourself what you think defines an interesting or valid variant in the context of your experimental setup. Make sure you can explain, in basic language, why the variant caller you're using selects the variants it does. If you do this, you can be more confident in the variants you do see, while also being less paranoid about false negatives.

2

SnpEff is a tool to annotate an existing VCF file, not a different variant caller.

0

Great answer, but for the wrong question :)

0

Thank you for the comments. What I am trying to figure out is how SnpEff summarizes SNP counts in its output HTML. Grepping for "TYPE=snp" or "snp" alone gives a lower count than the SnpEff summary results.
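One possible reason a grep count comes out lower: a line-based count sees each matching line once, while a multi-allelic record can carry several SNP alleles. The sketch below assumes the INFO column holds a `TYPE` key with one comma-separated value per ALT allele, as Freebayes writes it. The function name `count_type_snp` and the records are hypothetical.

```python
def count_type_snp(vcf_lines):
    """Return (lines_with_snp, snp_alleles) based on the INFO TYPE field."""
    lines_with_snp = 0  # roughly what a line-based grep count reports
    snp_alleles = 0     # one count per ALT allele whose type is "snp"
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]
        types = []
        for item in info.split(";"):
            if item.startswith("TYPE="):
                # Assumption: one TYPE value per ALT, comma-separated.
                types = item[len("TYPE="):].split(",")
        n = types.count("snp")
        if n:
            lines_with_snp += 1
        snp_alleles += n
    return lines_with_snp, snp_alleles


# Hypothetical records: one bi-allelic SNP and one site with two SNP alleles.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1234\t.\tA\tC\t50\t.\tDP=10;TYPE=snp",
    "1\t5678\t.\tA\tC,G\t50\t.\tDP=12;TYPE=snp,snp",
]
lines_with_snp, snp_alleles = count_type_snp(records)
print(lines_with_snp, snp_alleles)
```

Here the line-based count is 2 while the per-allele count is 3, which is the direction of the difference described above.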