How many INDELs and SNPs should I expect in a VCF file, and what are the best parameters to filter false positives?
I am analyzing human samples (WGS and WES), and the total number of variants I get after variant calling is about 6 million for WGS and over 1 million for WES. I think these numbers are too high and that many of the calls are false positives. I am using the Illumina DRAGEN Bio-IT Platform (which has GATK best practices implemented) for variant calling. I tested it on the Coriell NA12878 sample, and the accuracy is 99.97% for SNPs and 97% for INDELs. Primary QC metrics look fine, and samples are normalized to the same coverage: 35x for WGS and 100x for WES.

Sounds about right for WGS. A diploid human genome is about 6 Gb, and on average there is roughly one variant every 1 kb, so on the order of 6 million variants per sample is to be expected.
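As a quick sanity check, the back-of-the-envelope estimate can be written out in a few lines of Python. This is only a sketch of the arithmetic: the function name `expected_variants` is made up for illustration, and the two inputs are the rough figures quoted in this answer, not measured values.

```python
def expected_variants(genome_size_bp: int, bp_per_variant: int) -> int:
    """Rough expected variant count: genome size divided by variant spacing."""
    return genome_size_bp // bp_per_variant


# Rough figures from the estimate above (assumptions, not measurements).
DIPLOID_GENOME_BP = 6_000_000_000  # about 6 Gb diploid human genome
BP_PER_VARIANT = 1_000             # about one variant every 1 kb

estimate = expected_variants(DIPLOID_GENOME_BP, BP_PER_VARIANT)
print(f"Expected variants per sample: {estimate:,}")
```

Changing either input shows how sensitive the estimate is to the assumed variant density, so treat the result as an order of magnitude rather than a target.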


