Question: vcftools does not filter by GQ
3.0 years ago by
AP90
AP90 wrote:

Hello,

I am trying to filter based on GQ < 15. I do the following:

vcftools --vcf infile.vcf --minGQ 15 --recode --out filtered

However, this filtering does not work, nothing is being removed:

After filtering, kept 1287174 out of a possible 1287174 Sites

I have confirmed that the GQ tag is present in my VCF file. Other filters such as min/maxDP or minQ work just fine. I am using VCFtools v0.1.13.

Any thoughts on this would be greatly appreciated.

Thanks!

P.S. This is a cross-post from SEQanswers, where I did not receive any answers: http://seqanswers.com/forums/showthread.php?t=69468

vcftools gq filter • 1.6k views
modified 3.0 years ago • written 3.0 years ago by AP • 90

What's the definition of GQ in the VCF header? Show us a genotype and its FORMAT, please.

written 3.0 years ago by Pierre Lindenbaum • 120k

Thanks for your answer Pierre. In the VCF header, GQ stands for Genotype Quality. Here is a copy of the header containing the FORMAT fields:

##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">

Here is an example of a genotype, with its FORMAT string:

GT:PL:DP:SP:GQ  1/1:83,33,0:11:0:40

FYI, the VCF file was generated this way:

samtools mpileup -C 50 -E -t SP -t DP -u -I -f genome -b bam_list.txt > out.bcf
bcftools call -v -c -f gq out.bcf > out.vcf
modified 3.0 years ago • written 3.0 years ago by AP • 90
3.0 years ago by
AP90
AP90 wrote:

Here is an explanation:

GT is simply replaced by ./. when GQ is below the threshold. I thought the genotype would be removed completely. That is why the unfiltered and filtered files have the same number of lines, and why the GQ information can still be seen even after filtering.

This is hard to tell from the documentation, though. The current manual says of --minGQ: "Exclude all genotypes with a quality below the threshold specified. This option requires that the "GQ" FORMAT tag is specified for all sites." It doesn't really say whether any records are removed, as they are with most other filters.

An older manual version states: "These options are used to exclude genotypes from any analysis being performed by the program. If excluded, these values will be treated as missing. ... Exclude all genotypes with a quality below the threshold specified. This option requires that the "GQ" FORMAT tag is specified for all sites."

So for every genotype with GQ below the threshold, the genotype is changed to "./." without any lines actually being removed or filtered out.
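For anyone who wants to check this on their own data, here is a rough sketch that counts masked genotypes before and after filtering. The two helper functions are made up for illustration and are not part of vcftools. They assume an uncompressed, tab-delimited VCF with sample columns starting at column 10 and GT as the first FORMAT subfield.

```shell
# Hypothetical helpers (not part of vcftools) for checking what --minGQ did.
# Both assume an uncompressed, tab-delimited VCF with samples from column 10.

# Print the number of genotypes whose GT is missing.
count_missing_gt() {
    awk -F'\t' '
        /^#/ { next }                           # skip header lines
        {
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")               # GT is the first subfield
                if (f[1] == "./." || f[1] == ".|." || f[1] == ".") missing++
            }
        }
        END { print missing + 0 }
    ' "$1"
}

# Print the number of genotypes that are still called but have GQ below $2.
count_called_low_gq() {
    awk -F'\t' -v min="$2" '
        /^#/ { next }
        {
            gq = 0
            nkeys = split($9, keys, ":")        # FORMAT column, e.g. GT:PL:DP:SP:GQ
            for (k = 1; k <= nkeys; k++) if (keys[k] == "GQ") gq = k
            if (gq == 0) next                   # no GQ at this site
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")
                if (f[1] == "./." || f[1] == ".|." || f[1] == ".") continue
                if (f[gq] != "" && f[gq] != "." && f[gq] + 0 < min) low++
            }
        }
        END { print low + 0 }
    ' "$1"
}

# Example usage (file names taken from the command in the question;
# --recode --out filtered writes filtered.recode.vcf):
#   count_missing_gt infile.vcf
#   count_called_low_gq infile.vcf 15
#   count_missing_gt filtered.recode.vcf
#   count_called_low_gq filtered.recode.vcf 15
```

If the explanation above is right, the missing-genotype count should go up after filtering by the number of called low-GQ genotypes in the input, and the second count should drop to 0 in the filtered file, while the number of lines stays the same.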

written 3.0 years ago by AP • 90

Thank you for explaining this AP. I was troubled by the same situation.

written 2.9 years ago by swatipuraanik • 30