I called SNPs using Freebayes and then ran SnpEff on the Freebayes output file to see the variant effects. The number of SNPs for the same .vcf file differs between the two programs. Is there a suitable explanation for this? The number of SNPs reported by SnpEff on the Freebayes output file is higher than the number of SNPs counted in Freebayes.
Variance amongst variant caller results is a well-known issue (notice I didn't use the word "problem") in analysis of high-throughput sequencing data.
In simple terms, the discrepancy comes from the different approaches the various variant callers take when filtering and preparing the files for SNP calling. There is a selection of algorithms to choose from for each step in the variant-calling process, in addition to a spectrum of thresholds for quality, depth, and other metrics.
Given this, you should first ask yourself what you think defines an interesting or valid variant in the context of your experimental setup. Make sure you can explain, in basic language, why the variant caller you're using selects the variants it does. If you do this, you can be more confident in the variants you do see, while also being less paranoid about false negatives.
Thank you for the comments. What I am trying to figure out is how SnpEff summarizes SNP counts in its output HTML. Grepping for "TYPE=snp" or "snp" alone gives a lower count than the SnpEff summary results.
written 10.0 years ago by merodev (150) • updated 2.8 years ago by Ram (44k)
That is almost certainly the issue here. The same happens with VEP and any other effect prediction tool that outputs multiple transcript effects for a single variant.
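To check whether this explains the gap, you can count the same file three ways. Below is a minimal Python sketch, not SnpEff's actual counting logic. It assumes the file carries Freebayes' per-allele `TYPE` INFO tag and that SnpEff wrote its annotations to the `ANN` INFO field (older SnpEff versions used `EFF`, which this sketch does not handle). The function names `parse_info` and `count_snps` are made up for illustration.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def count_snps(vcf_lines):
    """Return (records, alleles, effects) for SNPs in an annotated VCF.

    records: VCF lines with at least one ALT allele of TYPE snp
    alleles: individual snp ALT alleles (multi-allelic lines count more than once)
    effects: ANN entries attached to those snp alleles (assumed ANN layout:
             comma-separated entries, allele in the first '|' field)
    """
    records = alleles = effects = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        info = parse_info(cols[7])
        alts = cols[4].split(",")
        types = str(info.get("TYPE", "")).split(",")
        # Freebayes lists one TYPE value per ALT allele, in the same order.
        snp_alts = {alt for alt, kind in zip(alts, types) if kind == "snp"}
        if not snp_alts:
            continue
        records += 1
        alleles += len(snp_alts)
        ann = info.get("ANN")
        if isinstance(ann, str):
            effects += sum(
                1 for entry in ann.split(",") if entry.split("|")[0] in snp_alts
            )
    return records, alleles, effects
```

You could call it as `with open("annotated.vcf") as fh: print(count_snps(fh))`, where `annotated.vcf` is a placeholder for your SnpEff output. If `effects` is well above `records`, the extra counts in the summary are probably per-transcript effects or multi-allelic sites rather than additional variants.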