DATA_TYPE_SPECIFIC_FILTERS for GATK VariantFiltration?
12.1 years ago
interfect ▴ 10

The GATK documentation for best-practices genotype calling recommends filtering called variants. Usually, you should use an adaptive filter that learns from a set of known good variants. If you do not have enough data for that (or, presumably, if you do not have a database of known variants for your organism), it is recommended to use explicit filtering on certain attributes with fixed thresholds, using the VariantFiltration operation.

In these circumstances, the best practices document says:

For SNPs * DATA_TYPE_SPECIFIC_FILTERS should be "QD < 2.0", "MQ < 40.0", "FS > 60.0", "HaplotypeScore > 13.0", "MQRankSum < -12.5", "ReadPosRankSum < -8.0".

For Indels * DATA_TYPE_SPECIFIC_FILTERS should be "QD < 2.0", "ReadPosRankSum < -20.0", "InbreedingCoeff < -0.8", "FS > 200.0".

How is DATA_TYPE_SPECIFIC_FILTERS specified on the command line? The VariantFiltration documentation does not list it as an option. Could you give an example command showing how these recommended filters can be passed to VariantFiltration?

gatk • 3.9k views
12.1 years ago
vdauwera ★ 1.2k

You will need to compose filter expressions to filter against those annotations with the indicated values. See these links for details:

http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_filters_VariantFiltration.html

http://www.broadinstitute.org/gatk/guide/article?id=51

http://www.broadinstitute.org/gatk/guide/article?id=1255

We (@gatk_dev) have updated the Best Practices document to clarify this point.
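As a rough sketch of what "composing a filter expression" can look like in practice, the snippet below joins the SNP thresholds quoted in the question into one JEXL expression and builds a VariantFiltration command line around it. It assumes the GATK 2.x/3.x-style invocation (`-T VariantFiltration`, `--filterExpression`, `--filterName`); the jar location, the file names and the filter name `snp_hard_filter` are placeholders, and newer GATK releases spell these options differently, so check the documentation for your version.

```python
import shlex

# SNP thresholds as quoted from the Best Practices text in the question.
SNP_FILTERS = [
    "QD < 2.0",
    "MQ < 40.0",
    "FS > 60.0",
    "HaplotypeScore > 13.0",
    "MQRankSum < -12.5",
    "ReadPosRankSum < -8.0",
]


def build_filter_command(vcf_in, vcf_out, reference, filters, filter_name):
    """Return an argument list for a GATK 2.x/3.x-style VariantFiltration run.

    The individual conditions are joined with JEXL's logical OR, so a
    variant is flagged as soon as any single condition is true.
    """
    expression = " || ".join(filters)
    return [
        "java", "-jar", "GenomeAnalysisTK.jar",  # placeholder jar path
        "-T", "VariantFiltration",
        "-R", reference,
        "-V", vcf_in,
        "--filterExpression", expression,
        "--filterName", filter_name,
        "-o", vcf_out,
    ]


# Hypothetical file names, for illustration only.
cmd = build_filter_command(
    "raw_snps.vcf", "filtered_snps.vcf", "reference.fasta",
    SNP_FILTERS, "snp_hard_filter",
)

# Print a copy-pasteable shell command with correct quoting.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)`. Passing the expression as a single list element avoids shell-quoting problems with the spaces and `<` / `>` characters.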


Hello,

This clears things up significantly. However, I still can't find how to apply a filter only to SNPs or only to indels. Do I need to use the JEXL expressions vc.getType() == Type.INDEL and vc.getType() == Type.SNP || vc.getType() == Type.MNP to select the appropriate kinds of variants? Is the Type enum even accessible from JEXL? Is there a better way?

EDIT: Also, the overall semantics are still unspecified. Do I want to remove variants matching any of the criteria? Remove those matching all the criteria? Keep only those that match at least one criterion? Or keep only those that match all the criteria? I would guess I'm supposed to remove variants that match any of the criteria for their particular variant type, but I'm not sure.
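For what it's worth, one commonly used route sidesteps JEXL type checks entirely: split the callset by variant type with SelectVariants (`-selectType`), then run VariantFiltration on each subset with its own expression. On the semantics, VariantFiltration does not delete records; a record for which the expression evaluates to true gets the filter name written into its FILTER column, so joining the thresholds with `||` flags any variant that fails at least one criterion. The sketch below shows the indel half, again assuming GATK 2.x/3.x-style options; the jar path, file names and the filter name `indel_hard_filter` are placeholders.

```python
import shlex

# Indel thresholds as quoted from the Best Practices text in the question.
INDEL_FILTERS = [
    "QD < 2.0",
    "ReadPosRankSum < -20.0",
    "InbreedingCoeff < -0.8",
    "FS > 200.0",
]

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]  # placeholder jar path


def select_command(vcf_in, vcf_out, reference, variant_type):
    """Argument list that extracts one variant type (e.g. SNP or INDEL)."""
    return GATK + [
        "-T", "SelectVariants",
        "-R", reference,
        "-V", vcf_in,
        "-selectType", variant_type,
        "-o", vcf_out,
    ]


def filter_command(vcf_in, vcf_out, reference, filters, filter_name):
    """Argument list that flags records failing ANY of the given conditions."""
    return GATK + [
        "-T", "VariantFiltration",
        "-R", reference,
        "-V", vcf_in,
        "--filterExpression", " || ".join(filters),
        "--filterName", filter_name,
        "-o", vcf_out,
    ]


# Hypothetical file names, for illustration only.
select_cmd = select_command(
    "raw_variants.vcf", "raw_indels.vcf", "reference.fasta", "INDEL")
filter_cmd = filter_command(
    "raw_indels.vcf", "filtered_indels.vcf", "reference.fasta",
    INDEL_FILTERS, "indel_hard_filter")

# Step 1 pulls out the indels, step 2 applies the indel-specific thresholds.
for step in (select_cmd, filter_cmd):
    print(" ".join(shlex.quote(arg) for arg in step))
```

The SNP half would look the same with `-selectType SNP` and the SNP thresholds. Records that end up with anything other than PASS in the FILTER column can then be excluded downstream.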


Sorry to reply so late -- for best support, I recommend posting your question directly on the GATK forum: http://gatkforums.broadinstitute.org/
