Hi,
I am using GATK to filter out variants (specifically SNP's) based on mapping quality
For tagging the variants which failed the MQ(mapping quality) filter, I ran the following commands from GATK
java -Xmx10g -jar /state/partition1/results/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R reference.fasta -V snp.vcf --filterExpression "MQ < 30.0" --filterName "MQ_snp_filter" -o MQ_filtered_snps.vcf
Later, I verified that it tagged the variants where MQ is less than 30; here is an excerpt from the output vcf file (used AWK to grep few relevant columns):
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=26.88;QD=17.87;SOR=2.303
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=3;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.57;QD=18.76;SOR=1.179
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.18;QD=27.02;SOR=1.609
MQ_snp_filter AC=2;AF=1.00;AN=2;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=23.18;QD=35.22;SOR=1.609
Now, I want to remove the tagged variants (remove variants where MQ < 30.0) using GATK. While I went through the help, but could not get to the specific command. Any help would be appreciated.
PS: While I know that there are several ways to filters variants outside GATK, I want to stick to it!
Thanks WouterDeCoster for the prompt response.
Just to be sure, here
ID
would be the 3rd column in the vcf file, right?Yes, indeed. You probably want to make sure those are uniquely chosen/assigned.
Uh oh. I just checked the vcf and found that I don't have any ID's in the vcf i.e. the 3rd column just contains a dot for each and every row. Apparently, I missed a step which assigns ID's to the variants? Any clues?
A solution is bcftools annotate, SnpSift Annotate or GATK variantannotator. I use the first and the second of those tools for filling in the ID column.
Using this command the dbsnp identifier is filled in for each position, and for positions which don't have such identifier a combination of the CHROM and POS field is made to form a unique identifier. Note that you can also just use bcftools annotate to set the IDs by position without bothering about dbsnp identifiers.