Filtering out the variants annotated as "FILTER" from vcf file using GATK
2
2
Entering edit mode
7.3 years ago

Hi,

I am using GATK to filter out variants (specifically SNP's) based on mapping quality

For tagging the variants which failed the MQ(mapping quality) filter, I ran the following commands from GATK

java -Xmx10g -jar /state/partition1/results/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar  -T VariantFiltration -R reference.fasta -V snp.vcf --filterExpression "MQ < 30.0" --filterName "MQ_snp_filter" -o MQ_filtered_snps.vcf

Later, I verified that it tagged the variants where MQ is less than 30; here is an excerpt from the output vcf file (used AWK to grep few relevant columns):

MQ_snp_filter AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=26.88;QD=17.87;SOR=2.303
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.57;QD=18.76;SOR=1.179
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.18;QD=27.02;SOR=1.609
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.18;QD=35.22;SOR=1.609

Now, I want to remove the tagged variants (remove variants where MQ < 30.0) using GATK. While I went through the help, but could not get to the specific command. Any help would be appreciated.

PS: While I know that there are several ways to filters variants outside GATK, I want to stick to it!

GATK vcf filter SNP • 4.9k views
ADD COMMENT
4
Entering edit mode
7.3 years ago

SelectVariants with a JXL expression

java -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T SelectVariants \
   --variant input.vcf \
   -o output.vcf \
  -select 'FILTER != "MQ_snp_filter"'
ADD COMMENT
1
Entering edit mode
7.3 years ago

You could do this using SelectVariants:

java -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T SelectVariants \
   --variant input.vcf \
   -o output.vcf \
   -IDs fileKeep \
   -excludeIDs fileExclude

You use either the -IDs or -excludeIDs which is a file containing the variant IDs to respectively include or exclude. You can easily construct that file using some grep/cut/awk manipulations.

ADD COMMENT
0
Entering edit mode

Thanks WouterDeCoster for the prompt response.

Just to be sure, here ID would be the 3rd column in the vcf file, right?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sesamum_Indicum.L
ADD REPLY
0
Entering edit mode

Yes, indeed. You probably want to make sure those are uniquely chosen/assigned.

ADD REPLY
0
Entering edit mode

Uh oh. I just checked the vcf and found that I don't have any ID's in the vcf i.e. the 3rd column just contains a dot for each and every row. Apparently, I missed a step which assigns ID's to the variants? Any clues?

ADD REPLY
0
Entering edit mode

A solution is bcftools annotate, SnpSift Annotate or GATK variantannotator. I use the first and the second of those tools for filling in the ID column.

java -Xmx10g -jar SnpSift.jar annotate -id dbsnp.vcf variants.vcf | bcftools annotate --set-id +'%CHROM\_%POS' -  > variantsWithIds.vcf

Using this command the dbsnp identifier is filled in for each position, and for positions which don't have such identifier a combination of the CHROM and POS field is made to form a unique identifier. Note that you can also just use bcftools annotate to set the IDs by position without bothering about dbsnp identifiers.

ADD REPLY

Login before adding your answer.

Traffic: 1577 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6