How many INDELs and SNPs should I expect in a VCF file, and what are the best parameters to filter false positives?
2.1 years ago ▴ 20

I am analyzing human samples (WGS and WES), and the total number of variants I get after variant calling is about 6 million for WGS and over 1 million for WES. I think these numbers are too high and that there are many false positives. I am using the Illumina DRAGEN Bio-IT Platform (which has GATK best practices implemented) for variant calling. I tested it on the Coriell NA12878 sample, where the accuracy is 99.97% for SNPs and 97% for INDELs. Primary QC metrics look fine, and the samples are normalized to the same coverage: 35x for WGS and 100x for WES.

variants WES WGS
2.1 years ago
Emily 23k

Sounds about right. A diploid human genome is about 6 Gb and, on average, we have one variant every 1 kb, so around 6 million variants per sample is to be expected.
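The back-of-the-envelope estimate above can be written out, together with a rough way to tally SNPs and INDELs in your own call set for comparison. This is only a minimal sketch: the function names (`expected_variants`, `classify_allele`, `count_variants`) are made up for illustration, the classification rule (single-base REF and ALT is a SNP, differing lengths is an INDEL) is a simplification, and it assumes an uncompressed, tab-delimited VCF.

```python
from collections import Counter


def expected_variants(genome_bp: float, bp_per_variant: float) -> int:
    """Rough expectation: genome size divided by the average spacing between variants."""
    return round(genome_bp / bp_per_variant)


def classify_allele(ref: str, alt: str) -> str:
    """Simplified classification of one ALT allele (an assumption, not a standard)."""
    # Missing, spanning-deletion, symbolic (<DEL>) and breakend alleles are not counted
    # as SNPs or INDELs here.
    if alt in (".", "*") or any(ch in alt for ch in "<>[]"):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_variants(lines) -> Counter:
    """Count ALT alleles by type from an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # VCF columns 4 and 5 are REF and ALT
        for alt in alts.split(","):
            counts[classify_allele(ref, alt)] += 1
    return counts


# The estimate from the answer: 6 Gb diploid genome, one variant per 1 kb.
print(expected_variants(6e9, 1e3))  # 6000000
```

To run it on a real file, something like `with open("sample.vcf") as fh: print(count_variants(fh))` should work, where `sample.vcf` is a placeholder path. Note that multi-allelic records contribute one count per ALT allele, so the totals can be slightly higher than the number of VCF records.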

