GATK ApplyVQSR filtering doesn't work
2.9 years ago
nhaus ▴ 300

Hello, 

I am using GATK 4.2 for germline variant calling. To reduce the number of false positives, I run the variant quality score recalibration (VQSR) workflow (VariantRecalibrator followed by ApplyVQSR) with the recommended resources.

After running ApplyVQSR in SNP mode, I noticed that many SNPs have PASS in the FILTER column even though, according to their AS_FilterStatus annotation, they should have been filtered.

Here are two such variants:

```
1 2424417 . T C 2265.06 PASS AC=2;AF=1.00;AN=2;AS_BaseQRankSum=.;AS_FS=0.000;AS_FilterStatus=VQSRTrancheSNP97.00to98.00;AS_MQ=60.00;AS_MQRankSum=.;AS_QD=33.31;AS_ReadPosRankSum=.;AS_SOR=1.352;AS_VQSLOD=6.0834;AS_culprit=MQ;DP=77;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=59.72;POSITIVE_TRAIN_SITE;QD=33.31;SOR=1.385 GT:AD:DP:GQ:PL 1/1:0,68:69:99:2279,204,0
1 4246829 . T G 965.64 PASS AC=1;AF=0.500;AN=2;AS_BaseQRankSum=-5.000;AS_FS=19.511;AS_FilterStatus=VQSRTrancheSNP99.00to99.30;AS_MQ=59.74;AS_MQRankSum=-1.700;AS_QD=10.50;AS_ReadPosRankSum=1.700;AS_SOR=1.277;AS_VQSLOD=0.0526;AS_culprit=FS;BaseQRankSum=-4.925e+00;DP=96;ExcessHet=3.0103;FS=19.511;MLEAC=1;MLEAF=0.500;MQ=59.89;MQRankSum=-1.606e+00;NEGATIVE_TRAIN_SITE;POSITIVE_TRAIN_SITE;QD=10.50;ReadPosRankSum=1.80;SOR=1.277 GT:AD:DP:GQ:PL 0/1:52,40:92:99:973,0,1575
```

For all steps I am using the allele-specific (`-AS`) version of the pipeline.

I read in the `ApplyVQSR` documentation that if at least one allele passes, the whole site is marked PASS. However, as my examples show, each of these sites has only one alternate allele, and that allele fails the filter.
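To list every affected record instead of spotting them by eye, the FILTER column can be compared with AS_FilterStatus. The following is a minimal Python sketch, not a GATK tool: it assumes a plain or gzip-compressed VCF and assumes that per-allele AS_FilterStatus values are comma-separated (it also accepts `|`; check the header of your own VCF). It applies the rule quoted above, that a site should be PASS only when at least one allele is PASS, and reports records where FILTER is PASS but no allele status is. The function names and the script name `check_vqsr.py` are hypothetical.

```python
import gzip
import sys
from typing import Dict, Iterable, Iterator, List, Tuple, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Turn a VCF INFO string into a dict; flags (e.g. POSITIVE_TRAIN_SITE) map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def inconsistent_pass_sites(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (chrom, pos, allele_statuses) for records whose FILTER is PASS
    although none of their AS_FilterStatus entries is PASS."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        cols = line.rstrip("\n").split()  # VCF columns contain no internal whitespace
        if len(cols) < 8:
            continue
        chrom, pos, filt, info = cols[0], cols[1], cols[6], cols[7]
        status = parse_info(info).get("AS_FilterStatus")
        if filt != "PASS" or not isinstance(status, str):
            continue
        # Assumption: per-allele statuses are comma-separated (Number=A);
        # '|' is also accepted in case your GATK version writes that instead.
        allele_statuses = status.replace("|", ",").split(",")
        if "PASS" not in allele_statuses:
            yield chrom, pos, allele_statuses


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_vqsr.py cohort.recalibrated.vcf.gz
    opener = gzip.open if sys.argv[1].endswith(".gz") else open
    with opener(sys.argv[1], "rt") as handle:
        for chrom, pos, statuses in inconsistent_pass_sites(handle):
            print(chrom, pos, ",".join(statuses), sep="\t")
```

Run on the ApplyVQSR output, e.g. `python check_vqsr.py cohort.recalibrated.vcf.gz`; both records above would be reported, since each has FILTER=PASS and a single non-PASS allele status.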

The specific command I am using:

```
gatk ApplyVQSR \
    -V cohort.indel.recalibrated.vcf.gz \
    --recal-file cohort_snp.recal \
    --tranches-file cohort_snp.tranches \
    --truth-sensitivity-filter-level 97 \
    -mode SNP \
    -AS \
    -O cohort.recalibrated.vcf.gz
```

If somebody could point out why I am seeing this or what I am doing wrong, I would be very grateful! 

Cheers!

PS: I also posted this question in the GATK forum, but it feels like you rarely get an answer there.

2.9 years ago
nhaus ▴ 300

I figured out what the problem was: I had run ApplyVQSR twice on the data. The first run used a truth-sensitivity filter level of 99%, and I then ran ApplyVQSR again on the resulting VCF with a level of 97%. If I apply the 97% level directly in a single run, everything works!
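Following this fix, here is a minimal Python sketch of a pipeline that applies each ApplyVQSR mode exactly once, directly at its final truth-sensitivity level. It only assembles argument lists from the flags used in the question (it does not run gatk), assumes an INDEL pass precedes the SNP pass as the `cohort.indel.recalibrated.vcf.gz` input appears to imply, and treats the raw input name, the INDEL file names and the INDEL level as hypothetical placeholders. It also refuses identical recal and tranches paths, the mix-up raised in the reply below.

```python
from typing import Any, Dict, List, Union

Level = Union[int, float]


def apply_vqsr_args(vcf_in: str, recal: str, tranches: str, level: Level,
                    mode: str, vcf_out: str) -> List[str]:
    """Build one allele-specific ApplyVQSR command as an argv list."""
    if recal == tranches:
        # Guard against passing the same file to --recal-file and --tranches-file.
        raise ValueError(f"recal and tranches paths must differ: {recal}")
    return [
        "gatk", "ApplyVQSR",
        "-V", vcf_in,
        "--recal-file", recal,
        "--tranches-file", tranches,
        "--truth-sensitivity-filter-level", str(level),
        "-mode", mode,
        "-AS",
        "-O", vcf_out,
    ]


def vqsr_plan(raw_vcf: str, indel: Dict[str, Any], snp: Dict[str, Any],
              out_vcf: str,
              indel_out: str = "cohort.indel.recalibrated.vcf.gz") -> List[List[str]]:
    """One INDEL pass on the raw VCF, then exactly one SNP pass on its output.

    The SNP pass uses its final filter level directly; it is never re-applied
    to output that ApplyVQSR has already filtered in SNP mode.
    """
    return [
        apply_vqsr_args(raw_vcf, indel["recal"], indel["tranches"], indel["level"],
                        "INDEL", indel_out),
        apply_vqsr_args(indel_out, snp["recal"], snp["tranches"], snp["level"],
                        "SNP", out_vcf),
    ]


if __name__ == "__main__":
    plan = vqsr_plan(
        "cohort.vcf.gz",  # hypothetical raw (unfiltered) input
        # hypothetical INDEL file names and placeholder level
        {"recal": "cohort_indel.recal", "tranches": "cohort_indel.tranches", "level": 99.0},
        {"recal": "cohort_snp.recal", "tranches": "cohort_snp.tranches", "level": 97},
        "cohort.recalibrated.vcf.gz",
    )
    for argv in plan:
        print(" ".join(argv))
```

Each printed line can be run as-is, or each argv list passed to `subprocess.run(argv, check=True)`. Keeping both commands in one place makes it harder to accidentally re-run the SNP step on already-filtered output.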

I hope this helps if someone is struggling with a similar problem.

Cheers!

2.9 years ago
rbagnall ★ 1.8k

I think you are overwriting your `cohort_snp.recal` file:

`--recal-file cohort_snp.recal --tranches-file cohort_snp.recal`

Regenerate `cohort_snp.recal`, and then write the tranches to a separate file with `--tranches-file cohort_snp.tranches`.


I'm sorry, that was a typo in the command; I have corrected it in my original post. Thank you for noticing!
