How to filter vcf file on minimum genotype depth and quality for each sample
4
8
Entering edit mode
8.4 years ago
William ★ 5.3k

How can I filter a vcf filter a VCF file on minimum genotype depth and genotype quality for each sample.

I am looking for a way to filter variants from a VCF file by checking that all samples for a site pass 2 critera

sample.DP > 10
sample.GQ > 15

I thought I could do this using bcftools

bcftools view  -i  'MIN(FMT/DP>10) && MIN(FMT/GQ>15)'   my.vcf.gz

Somehow this include expression does not seem to be applied.

In my output are still lots of variants with genotypes likes

GT:AO:DP:GQ:PL:QA:QR:RO
0|0:0:16:3:0,48,502:0:554:16   (=depth 16, quality 3)
1|1:1:1:3:21,3,0:37:0:0        (=depth 1, quality 3)

I tried putting the include commands in two different bcftools commands

bcftools view  -i  'MIN(FMT/DP>10)' | bcftools view -i 'MIN(FMT/GQ>15)'   my.vcf.gz

And I tried without the FMT prefix for the genotype quality

bcftools view -i  'MIN(FMT/DP>10)  &&  MIN(GQ>15)'  my.vcf.gz

Still I get variants back with genotypes that don't match the criteria.

Am I misunderstanding the bcftools include expression documentation?

https://samtools.github.io/bcftools/bcftools.html#expressions

Or is there another way that I can achieve this filtering step?

vcf bcftools • 46k views
ADD COMMENT
16
Entering edit mode
7.8 years ago
William ★ 5.3k

The problem was wrong place of the parentheses:

Does not work as expected:

bcftools view  -i  'MIN(FMT/DP>10) & MIN(FMT/GQ>15)'   my.vcf.gz

Works as expected:

bcftools view  -i  'MIN(FMT/DP)>10 & MIN(FMT/GQ)>15'   my.vcf.gz
ADD COMMENT
0
Entering edit mode

Don't forget to pipe the stdout into a new file. bcftools view --include 'MIN(FMT/DP)>=10 & MIN(FMT/GQ)>=15' my.vcf > my_output.vcf

ADD REPLY
0
Entering edit mode

There should also be a difference if you use "&" or "&&" according to https://samtools.github.io/bcftools/howtos/filtering.html and bcftools documentation.

ADD REPLY
4
Entering edit mode
8.4 years ago
artolkamp ▴ 50

I use vcftools for basic vcf filtering which I find to be very fast and simple to use.

An example would be

vcftools --vcf myfile.vcf --out output_prefix --minGQ 15 --minDP 10 --recode --recode-INFO-all

Manual can be found here https://vcftools.github.io/man_latest.html

ADD COMMENT
0
Entering edit mode

Cools seems to do the job. Thanks.

ADD REPLY
0
Entering edit mode

At closer look it's not exactly what I was looking for. This command sets the genotypes that don't match the criteria to ./. . I am looking to remove the whole variant if not all genotypes pass the criteria. .

ADD REPLY
0
Entering edit mode

Does anyone know if there is a way to do this (set genotypes that don't match the criteria to ./.) in bcftools? I'd like to set low-quality genotypes to missing in a multi-sample vcf but keep the sites that contain them, and can't seem to work out how. The vcftools solution works fine, but I'm curious to know if it can be done in bcftools too.

ADD REPLY
0
Entering edit mode

from the above script, --minGQ and --minDP take the value of 15 and 10 respectively for filtering genotype quality and depth. Is there any criteria for choosing 15 and 10.. Just to know if 15 and 10 are standard values to use for those flags.

ADD REPLY
0
Entering edit mode

I am wary of vcftools not erroring when it fails to parse: https://github.com/vcftools/vcftools/issues/134

ADD REPLY
3
Entering edit mode
8.4 years ago
William ★ 5.3k

Filtering not on the minimum but on the average GQ and sample DP seems to work for me and the result is close enough to what I was looking for.

bcftools view -i 'AVG(GQ)>20 & AVG(FMT/DP)>10'
ADD COMMENT
1
Entering edit mode

Hello, Excuse me could you please let me know that the FMT stands for what? Thanks, Omid

ADD REPLY
1
Entering edit mode

FMT is for Format, you can find more info here: Filtering of VCF, INFO DP or FORMAT DP

ADD REPLY
1
Entering edit mode
7.8 years ago
FatihSarigol ▴ 260

In case you have a vcfutils.pl in your pipeline, especially in the famous consensus sequence calling one with mpileup, you can use the -d flag to set the minimum coverage. It also should work alone when converting vcf to fq, for a minimum depth coverage of 10 for example:

  • vcfutils.pl vcf2fq -d 10

vcfutils.pl script exists in the bcftools/bin btw

ADD COMMENT
1
Entering edit mode

Thanks for your helpful answer.

ADD REPLY

Login before adding your answer.

Traffic: 2499 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6