How to filter vcf file on minimum genotype depth and quality for each sample
4
8
Entering edit mode
5.9 years ago
William ★ 5.0k

How can I filter a vcf filter a VCF file on minimum genotype depth and genotype quality for each sample.

I am looking for a way to filter variants from a VCF file by checking that all samples for a site pass 2 critera

sample.DP > 10
sample.GQ > 15


I thought I could do this using bcftools

bcftools view  -i  'MIN(FMT/DP>10) && MIN(FMT/GQ>15)'   my.vcf.gz


Somehow this include expression does not seem to be applied.

In my output are still lots of variants with genotypes likes

GT:AO:DP:GQ:PL:QA:QR:RO
0|0:0:16:3:0,48,502:0:554:16   (=depth 16, quality 3)
1|1:1:1:3:21,3,0:37:0:0        (=depth 1, quality 3)


I tried putting the include commands in two different bcftools commands

bcftools view  -i  'MIN(FMT/DP>10)' | bcftools view -i 'MIN(FMT/GQ>15)'   my.vcf.gz


And I tried without the FMT prefix for the genotype quality

bcftools view -i  'MIN(FMT/DP>10)  &&  MIN(GQ>15)'  my.vcf.gz


Still I get variants back with genotypes that don't match the criteria.

Am I misunderstanding the bcftools include expression documentation?

Or is there another way that I can achieve this filtering step?

vcf bcftools • 31k views
11
Entering edit mode
5.2 years ago
William ★ 5.0k

The problem was wrong place of the parentheses:

Does not work as expected:

bcftools view  -i  'MIN(FMT/DP>10) & MIN(FMT/GQ>15)'   my.vcf.gz


Works as expected:

bcftools view  -i  'MIN(FMT/DP)>10 & MIN(FMT/GQ)>15'   my.vcf.gz

3
Entering edit mode
5.9 years ago
artolkamp ▴ 40

I use vcftools for basic vcf filtering which I find to be very fast and simple to use.

An example would be

vcftools --vcf myfile.vcf --out output_prefix --minGQ 15 --minDP 10 --recode --recode-INFO-all

Manual can be found here https://vcftools.github.io/man_latest.html

0
Entering edit mode

Cools seems to do the job. Thanks.

0
Entering edit mode

At closer look it's not exactly what I was looking for. This command sets the genotypes that don't match the criteria to ./. . I am looking to remove the whole variant if not all genotypes pass the criteria. .

0
Entering edit mode

Does anyone know if there is a way to do this (set genotypes that don't match the criteria to ./.) in bcftools? I'd like to set low-quality genotypes to missing in a multi-sample vcf but keep the sites that contain them, and can't seem to work out how. The vcftools solution works fine, but I'm curious to know if it can be done in bcftools too.

0
Entering edit mode

from the above script, --minGQ and --minDP take the value of 15 and 10 respectively for filtering genotype quality and depth. Is there any criteria for choosing 15 and 10.. Just to know if 15 and 10 are standard values to use for those flags.

3
Entering edit mode
5.9 years ago
William ★ 5.0k

Filtering not on the minimum but on the average GQ and sample DP seems to work for me and the result is close enough to what I was looking for.

bcftools view -i 'AVG(GQ)>20 & AVG(FMT/DP)>10'

0
Entering edit mode

Hello, Excuse me could you please let me know that the FMT stands for what? Thanks, Omid

0
Entering edit mode

FMT is for Format, you can find more info here: Filtering of VCF, INFO DP or FORMAT DP

1
Entering edit mode
5.3 years ago
FatihSarigol ▴ 240

In case you have a vcfutils.pl in your pipeline, especially in the famous consensus sequence calling one with mpileup, you can use the -d flag to set the minimum coverage. It also should work alone when converting vcf to fq, for a minimum depth coverage of 10 for example:

• vcfutils.pl vcf2fq -d 10

vcfutils.pl script exists in the bcftools/bin btw

1
Entering edit mode