Question: GATK variantFiltration unexpected behaviour?
0
gravatar for jtwalker
20 months ago by
jtwalker20
jtwalker20 wrote:

Hello,

I'm trying to use the GATK to hard filter variants (I'm using a non-model organism, so variant re calibration isn't possible), but, while I'm following the GATK website's tutorial (https://gatkforums.broadinstitute.org/gatk/discussion/2806/howto-apply-hard-filters-to-a-call-set), I haven't actually been able to filter out any variants.

As an example, after subsetting out the SNP's in my GenotypeGVCFs produced VCF file, I used

gatk -T VariantFiltration -R /PATH/reference_genome -V myfile.vcf  --filterExpression "MQ>20" --filterName "mq20_filter"  -o my_filtered_file.vcf

which should have flagged any variants with mapping quality below 20 with FILTER rather than PASS. When I use grep to check if this worked in the way that I expected, I found many instances where MQ=10.

There seems to be something missing with my JEXL expression that I'm simply not seeing in the tutorial, or other GATK documentation. Can anyone help me out?

Thanks!

edit: fixed a minor mistake

snp variant filtration gatk • 911 views
ADD COMMENTlink modified 20 months ago • written 20 months ago by jtwalker20

show us a line in the VCF failing the expression, and, in the header, the line for ##INFO=<ID=MQ... please.

ADD REPLYlink written 20 months ago by Pierre Lindenbaum118k

A line from the VCF:

NC_024218.1 10250670    .   C   T   61.23   PASS    AC=2;AF=0.333;AN=6;DP=75;ExcessHet=1.0474;FS=0.000;MLEAC=2;MLEAF=0.333;MQ=10.95;QD=30.62;SOR=0.693  GT:AD:DP:GQ:PGT:PID:PL  0/0:6,0:6:18:.:.:0,18,230   ./.:1,0:1:.:.:.:0,0,0   1/1:0,2:2:6:1|1:10250670_C_T:90,6,0 ./.:2,0:2:.:.:.:0,0,0   ./.:10,0:10:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0   ./.:1,0:1:.:.:.:0,0,0   ./.:17,0:17:.:.:.:0,0,0 ./.:9,0:9:.:.:.:0,0,0   0/0:9,0:9:0:.:.:0,0,258 ./.:0,0:0:.:.:.:0,0,0   ./.:2,0:2:.:.:.:0,0,0   ./.:2,0:2:.:.:.:0,0,0   ./.:1,0:1:.:.:.:0,0,0   ./.:0,0:0:.:.:.:0,0,0   ./.:13,0:13:.:.:.:0,0,0

and I'm not quite sure what you mean by a line from the header, but I grepped your example and what returned was this:

##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">

Thanks!

ADD REPLYlink modified 20 months ago • written 20 months ago by jtwalker20

I've encountered the same issue. Could you please provide your solution if you were able to fix this problem?

ADD REPLYlink written 3 months ago by serpalma.v20
2
gravatar for andrew.j.skelton73
20 months ago by
London
andrew.j.skelton735.5k wrote:

Isn't there a syntax problem here? I thought that GATK's filter expressions couldn't take integers, and they needed doubles, otherwise it'd throw an error.

It should be:

--filterExpression "MQ > 20.0"
ADD COMMENTlink written 20 months ago by andrew.j.skelton735.5k

It didn't give me an error, but I'll try that now. Thanks!

ADD REPLYlink written 20 months ago by jtwalker20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1622 users visited in the last hour