Question: Filtering out the variants annotated as "FILTER" from vcf file using GATK
2
gravatar for Vijay Lakhujani
22 months ago by
Vijay Lakhujani3.1k
India
Vijay Lakhujani3.1k wrote:

Hi,

I am using GATK to filter out variants (specifically SNP's) based on mapping quality

For tagging the variants which failed the MQ(mapping quality) filter, I ran the following commands from GATK

java -Xmx10g -jar /state/partition1/results/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar  -T VariantFiltration -R reference.fasta -V snp.vcf --filterExpression "MQ < 30.0" --filterName "MQ_snp_filter" -o MQ_filtered_snps.vcf

Later, I verified that it tagged the variants where MQ is less than 30; here is an excerpt from the output vcf file (used AWK to grep few relevant columns):

MQ_snp_filter AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=26.88;QD=17.87;SOR=2.303
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.57;QD=18.76;SOR=1.179
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.18;QD=27.02;SOR=1.609
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.18;QD=35.22;SOR=1.609

Now, I want to remove the tagged variants (remove variants where MQ < 30.0) using GATK. While I went through the help, but could not get to the specific command. Any help would be appreciated.

PS: While I know that there are several ways to filters variants outside GATK, I want to stick to it!

filter snp gatk vcf • 1.4k views
ADD COMMENTlink modified 22 months ago by Pierre Lindenbaum114k • written 22 months ago by Vijay Lakhujani3.1k
3
gravatar for Pierre Lindenbaum
22 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum114k wrote:

SelectVariants with a JXL expression

java -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T SelectVariants \
   --variant input.vcf \
   -o output.vcf \
  -select 'FILTER != "MQ_snp_filter"'
ADD COMMENTlink written 22 months ago by Pierre Lindenbaum114k
1
gravatar for WouterDeCoster
22 months ago by
Belgium
WouterDeCoster33k wrote:

You could do this using SelectVariants:

java -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T SelectVariants \
   --variant input.vcf \
   -o output.vcf \
   -IDs fileKeep \
   -excludeIDs fileExclude

You use either the -IDs or -excludeIDs which is a file containing the variant IDs to respectively include or exclude. You can easily construct that file using some grep/cut/awk manipulations.

ADD COMMENTlink written 22 months ago by WouterDeCoster33k

Thanks WouterDeCoster for the prompt response.

Just to be sure, here ID would be the 3rd column in the vcf file, right?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sesamum_Indicum.L
ADD REPLYlink written 22 months ago by Vijay Lakhujani3.1k

Yes, indeed. You probably want to make sure those are uniquely chosen/assigned.

ADD REPLYlink written 22 months ago by WouterDeCoster33k

Uh oh. I just checked the vcf and found that I don't have any ID's in the vcf i.e. the 3rd column just contains a dot for each and every row. Apparently, I missed a step which assigns ID's to the variants? Any clues?

ADD REPLYlink modified 22 months ago • written 22 months ago by Vijay Lakhujani3.1k

A solution is bcftools annotate, SnpSift Annotate or GATK variantannotator. I use the first and the second of those tools for filling in the ID column.

java -Xmx10g -jar SnpSift.jar annotate -id dbsnp.vcf variants.vcf | bcftools annotate --set-id +'%CHROM\_%POS' -  > variantsWithIds.vcf

Using this command the dbsnp identifier is filled in for each position, and for positions which don't have such identifier a combination of the CHROM and POS field is made to form a unique identifier. Note that you can also just use bcftools annotate to set the IDs by position without bothering about dbsnp identifiers.

ADD REPLYlink written 22 months ago by WouterDeCoster33k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1373 users visited in the last hour