GATK ApplyVQSR filtering doesn't work
6 months ago
nhaus ▴ 130

Hello,

I am using GATK 4.2 to perform germline calling. To reduce the number of false positives, I use the VariantRecalibrator (VQSR) workflow with the recommended resources.

After running ApplyVQSR in SNP mode, I notice that many SNPs have "PASS" in the VCF FILTER column, although according to their AS_FilterStatus they should have been filtered.

Here is one such variant:

1 2424417 . T C 2265.06 PASS AC=2;AF=1.00;AN=2;AS_BaseQRankSum=.;AS_FS=0.000;AS_FilterStatus=VQSRTrancheSNP97.00to98.00;AS_MQ=60.00;AS_MQRankSum=.;AS_QD=33.31;AS_ReadPosRankSum=.;AS_SOR=1.352;AS_VQSLOD=6.0834;AS_culprit=MQ;DP=77;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.72;POSITIVE_TRAIN_SITE;QD=33.31;SOR=1.385 GT:AD:DP:GQ:PL 1/1:0,68:69:99:2279,204,0
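In case it helps anyone check their own output for records like this one, here is a minimal sketch of a shell helper that lists sites where FILTER is PASS but no allele in AS_FilterStatus is PASS. The function name `find_pass_mismatches` is made up for illustration, and the script assumes that per-allele statuses in AS_FilterStatus are separated by `|`; please verify that against your own file before relying on it.

```shell
# Hypothetical helper (not part of GATK): reads an uncompressed VCF on stdin
# and prints CHROM, POS, FILTER and AS_FilterStatus for every record whose
# FILTER is PASS although none of its per-allele statuses is PASS.
# Assumption: per-allele values in AS_FilterStatus are separated by "|".
find_pass_mismatches() {
  awk -F'\t' '
    /^#/ { next }                              # skip header lines
    $7 == "PASS" {
      n = split($8, info, ";")                 # split INFO into key=value fields
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^AS_FilterStatus=/) continue
        status = info[i]
        sub(/^AS_FilterStatus=/, "", status)   # keep only the value
        m = split(status, alleles, "[|]")      # one entry per ALT allele
        any_pass = 0
        for (j = 1; j <= m; j++) if (alleles[j] == "PASS") any_pass = 1
        if (!any_pass) print $1 "\t" $2 "\t" $7 "\t" status
      }
    }'
}
```

Usage would look something like `gzip -dc cohort.recalibrated.vcf.gz | find_pass_mismatches | head`. An empty result means the site-level FILTER and the allele-specific status agree for every PASS record.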


For all steps I am using the "Allele specific" calling pipeline.

I read in the ApplyVQSR documentation that if one allele passes, the whole site will be PASS. However, as you can see in my example, there is only a single allele, and it fails the quality control.

The specific command I am using:

gatk ApplyVQSR -V cohort.indel.recalibrated.vcf.gz --recal-file cohort_snp.recal --tranches-file cohort_snp.tranches --truth-sensitivity-filter-level 97 -mode SNP -AS -O cohort.recalibrated.vcf.gz


If somebody could point out why I am seeing this or what I am doing wrong, I would be very grateful!

Cheers!

PS: I also posted this question in the GATK forum, but it feels like you rarely get an answer there.

6 months ago
nhaus ▴ 130

I figured out what the problem was: I ran ApplyVQSR twice on the data. The first time I used a sensitivity threshold of 99%, and then I ran ApplyVQSR again on the resulting data with a sensitivity threshold of 97%. If I use 97% right away, everything works!

I hope this helps if someone is struggling with a similar problem.

Cheers!

6 months ago
rbagnall ★ 1.7k

I think you are overwriting your cohort_snp.recal:

--recal-file cohort_snp.recal --tranches-file cohort_snp.recal


Regenerate cohort_snp.recal and then write the tranches to --tranches-file cohort_snp.tranches

I'm sorry, this was a typo in the command. I fixed the actual command in my original post. Thank you for noticing!