I am using GATK 4.2 to perform germline calling. To reduce the number of false positives, I use the VariantRecalibration workflow with the recommended resources.
After running ApplyVQSR in SNP mode, I notice that many SNPs have "PASS" in the VCF FILTER column, although they should have been filtered according to the allele-specific filter status in their INFO field.
Here are two such variants:
1 2424417 . T C 2265.06 PASS AC=2;AF=1.00;AN=2;AS_BaseQRankSum=.;AS_FS=0.000;AS_FilterStatus=VQSRTrancheSNP97.00to98.00;AS_MQ=60.00;AS_MQRankSum=.;AS_QD=33.31;AS_ReadPosRankSum=.;AS_SOR=1.352;AS_VQSLOD=6.0834;AS_culprit=MQ;DP=77;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.72;POSITIVE_TRAIN_SITE;QD=33.31;SOR=1.385 GT:AD:DP:GQ:PL 1/1:0,68:69:99:2279,204,0

1 4246829 . T G 965.64 PASS AC=1;AF=0.500;AN=2;AS_BaseQRankSum=-5.000;AS_FS=19.511;AS_FilterStatus=VQSRTrancheSNP99.00to99.30;AS_MQ=59.74;AS_MQRankSum=-1.700;AS_QD=10.50;AS_ReadPosRankSum=1.700;AS_SOR=1.277;AS_VQSLOD=0.0526;AS_culprit=FS;BaseQRankSum=-4.925e+00;DP=96;ExcessHet=3.0103;FS=19.511;MLEAC=1;MLEAF=0.500;MQ=59.89;MQRankSum=-1.606e+00;NEGATIVE_TRAIN_SITE;POSITIVE_TRAIN_SITE;QD=10.50;ReadPosRankSum=1.80;SOR=1.277 GT:AD:DP:GQ:PL 0/1:52,40:92:99:973,0,1575
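To list sites like these, something along the following lines could be used. This is only a minimal sketch and not part of GATK: the function name `discordant_pass_sites` is hypothetical, it expects plain-text VCF lines, and it assumes that for multi-allelic sites the per-allele `AS_FilterStatus` values are separated by commas or pipes. A site is reported when FILTER is PASS but none of its alleles has a PASS status.

```python
import re


def discordant_pass_sites(vcf_lines):
    """Yield (chrom, pos, statuses) for records whose FILTER column is PASS
    although no allele has AS_FilterStatus=PASS.

    Hypothetical helper for illustration; not a GATK function.
    """
    for line in vcf_lines:
        line = line.strip()
        # Skip blank lines and VCF header lines.
        if not line or line.startswith("#"):
            continue
        # Real VCFs are tab-separated; split() also copes with pasted,
        # space-separated records like the ones shown above.
        fields = line.split()
        if len(fields) < 8:
            continue
        chrom, pos, site_filter, info = fields[0], fields[1], fields[6], fields[7]
        if site_filter != "PASS":
            continue
        # Look up the AS_FilterStatus key in the INFO column.
        statuses = None
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "AS_FilterStatus":
                # Assumption: per-allele values are delimited by "," or "|".
                statuses = re.split(r"[,|]", value)
                break
        if statuses is None:
            continue
        # Discordant: the site passes, but no single allele does.
        if "PASS" not in statuses:
            yield chrom, pos, statuses
```

For a bgzipped VCF, the lines could be supplied with `gzip.open(path, "rt")` from the Python standard library.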
For all steps I am using the "Allele specific" calling pipeline.
I read in the `ApplyVQSR` documentation that if one allele passes, the whole site will be PASS. However, as you can see from my examples, each site has only a single alternate allele, and that allele fails the quality control.
The specific command I am using:
gatk ApplyVQSR -V cohort.indel.recalibrated.vcf.gz --recal-file cohort_snp.recal --tranches-file cohort_snp.tranches --truth-sensitivity-filter-level 97 -mode SNP -AS -O cohort.recalibrated.vcf.gz
If somebody could point out why I am seeing this or what I am doing wrong, I would be very grateful!
PS: I also posted this question in the GATK forum, but it feels like you rarely get an answer there.