How many INDELs and SNPs should I expect in a VCF file, and what are the best parameters to filter false positives?
11 weeks ago
esimonova.me ▴ 10

I am analyzing human samples (WGS and WES), and the total number of variants I get after variant calling is 6 million for WGS and over 1 million for WES. I think these numbers are too high and that there are many false positives. I am using the Illumina Dragen Bio-IT platform (which has GATK best practices implemented) for variant calling. I tested it on the Coriell NA12878 sample, where the accuracy is 99.97% for SNPs and 97% for INDELs. Primary QC metrics look fine, and samples are normalized to the same coverage: 35x for WGS and 100x for WES.

variants WES WGS • 286 views
11 weeks ago

Sounds about right. A diploid human genome is 6 Gb and, on average, there is one variant every 1 kb, so 6 million variants per sample is to be expected for WGS.
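The arithmetic behind that estimate can be written out as a rough sketch. The genome size and variant spacing are the round figures quoted above, not measured values, and the function name `expected_variants` is only illustrative.

```python
# Back-of-the-envelope estimate of variants per sample.
# Both constants are the round figures from the answer, not measurements.
DIPLOID_GENOME_BP = 6_000_000_000  # ~6 Gb diploid human genome
BP_PER_VARIANT = 1_000             # ~1 variant per 1 kb on average


def expected_variants(genome_bp: int, bp_per_variant: int) -> int:
    """Return the rough number of variants expected for a genome of this size."""
    return genome_bp // bp_per_variant


wgs_estimate = expected_variants(DIPLOID_GENOME_BP, BP_PER_VARIANT)
print(f"Expected WGS variants per sample: {wgs_estimate:,}")
```

Changing either constant shows how sensitive the estimate is to the assumed variant density.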
