Question: Filtering Vcf File
8.5 years ago by
bioinfo790 wrote:

I was wondering how to filter a VCF file based on a few input arguments (DP>10, MQ>30 and QD>20, or GT = "1/1", etc.). I'm planning to use a simple command on the command line to extract the info and create a new, filtered VCF file. I want to keep the 20 lines of VCF header INFO in the new file as well. I can do it with Perl, but is there an easier way? Last time I extracted the info I needed from a VCF file using vcftools, but I couldn't get a filtered VCF file out of it.

My command

vcftools --vcf GMM_homo.vcf --depth --FILTER-summary --TsTv-by-count --site-mean-depth --SNPdensity 1000 --site-pi --minQ 30 --min-meanDP 5 --out homo_GMM

vcf indel vcftools snp • 38k views
modified 3.3 years ago by DataFanatic300 • written 8.5 years ago by bioinfo790

I just tried

egrep '^#|"GT =1/1" | "DP>10","MQ>30"' my.vcf > filtered.vcf

Didn't work though.

modified 8.5 years ago by Sukhi Singh10k • written 8.5 years ago by bioinfo790
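
The egrep attempt above cannot work as written, because grep matches text patterns and treats "DP>10" as a literal string rather than a numeric comparison. A minimal awk sketch of the numeric part, assuming DP and MQ appear as single-valued KEY=value entries in the INFO column (column 8); the function name filter_dp_mq and the file names are hypothetical:

```shell
# Hypothetical helper: keep all header lines, plus records with INFO DP > 10 and MQ > 30.
# Assumes DP and MQ are single-valued INFO keys; records missing either key are dropped.
filter_dp_mq() {
  awk -F'\t' '
    /^#/ { print; next }                  # header lines pass through unchanged
    {
      dp = ""; mq = ""
      n = split($8, kv, ";")              # INFO is column 8, entries separated by ";"
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^DP=/) dp = substr(kv[i], 4)
        if (kv[i] ~ /^MQ=/) mq = substr(kv[i], 4)
      }
      if (dp != "" && mq != "" && dp + 0 > 10 && mq + 0 > 30) print
    }
  ' "$1"
}

# Usage (placeholder file names):
#   filter_dp_mq my.vcf > filtered.vcf
```

Because the header lines are printed untouched, the result is still a VCF file. Genotype tests such as GT = 1/1 live in the per-sample columns and would need separate handling.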

I need to filter my VCF file to keep only variants with at least 30 individuals in each of the possible genotype groups: major-allele homozygotes, heterozygotes, and minor-allele homozygotes. I would be grateful for any input. Thanks!

written 3.3 years ago by DataFanatic300

ask this as a new question please.

written 3.3 years ago by Pierre Lindenbaum134k
8.2 years ago by
Erik Garrison2.3k
Napoli, IT / UCSC
Erik Garrison2.3k wrote:

You can do exactly this with vcffilter in vcflib!

Here's how to select all variants with depth greater than 10, mapping quality greater than 30, and QD greater than 20:

vcffilter -f "DP > 10 & MQ > 30 & QD > 20" file.vcf >filtered.vcf

Now, to select only variants with homozygotes, you can strip every genotype that's not homozygous, fix up the file's AC and AF fields using the genotypes with vcffixup, and then remove all the AC = 0 sites (again, using vcffilter).

cat filtered.vcf | vcffilter -g "GT = 1/1" | vcffixup - | vcffilter -f "AC > 0" >results.vcf

The expression language is clunky (you have to put spaces in between the tokens, and parenthetical expressions also have to have spaces). There is also no != symbol, but as a workaround you can do ! ( expression ).

For instance, to pick up non-homozygous genotypes, you'd use:

vcffilter -g "! ( GT = 1/1 )"

I'd like to fix some of these things (and also add regex matching for strings), but thus far it more than does the job for quick filtering operations. It lets me do virtually any kind of filtering from the command line without having to write a custom script.

These are the supported operations: > < = | & !, and symbols: ( ). Strings are interpreted literally. There is some type checking using the VCF header, so you have to have a valid VCF file. The output is a valid VCF file, so you can stream the filter results into another filtering operation.

written 8.2 years ago by Erik Garrison2.3k

Note that this will work for any values in the INFO field or per-sample fields.

written 8.2 years ago by Erik Garrison2.3k

Does vcffilter -f work with MuTect VCF output? I tried it, but it does not seem to work. The MuTect VCF output has a FILTER column, and I want to keep only the variants whose value in that column is PASS. Ideally it would look like this:

vcffilter -f "FILTER = PASS" file.vcf > filt_out.vcf

But this does not seem to work. Can anyone tell me where I am going wrong?

written 6.2 years ago by ivivek_ngs5.1k

Problem solved: it works well with the egrep command. Thanks.

written 6.2 years ago by ivivek_ngs5.1k

Hi, I have a MuTect VCF file with the same FILTER column and PASS value. I tried to run the vcffilter command but, as you said, it does not work. I saw that you solved the problem with grep. Could you please give me more information? Thanks.

written 4.9 years ago by marcoabbestia0
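
The thread does not record the exact grep command that was used. One common approach, sketched here as an assumption rather than the poster's actual solution, relies on FILTER being the seventh tab-separated column of a VCF record rather than an INFO key (which is probably why an INFO expression such as "FILTER = PASS" matches nothing). The helper name keep_pass is hypothetical:

```shell
# Hypothetical helper: keep header lines and records whose FILTER column is exactly PASS.
# FILTER is the 7th tab-separated column of a VCF record.
keep_pass() {
  awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Usage (file names taken from the question above):
#   keep_pass file.vcf > filt_out.vcf
```

Comparing the whole column, instead of grepping for the word PASS anywhere on the line, avoids accidentally keeping records that merely contain "PASS" in another field.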

I have a problem with vcffilter. When I use it, it removes the variant annotation info (Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type| ...). Here is my command:

vcffilter -k -f "( TYPE = ins | TYPE = del ) & FDP > 10 & HRUN < 6" -f "QUAL > 20" -g "FAO > 4 & GQ > 5" file.vcf | vcf-annotate --fill-AC-AN | vcffilter -f "AC > 0" > file.vcf.indelfilter.vcf

Any idea where the mistake is and how to fix it?

modified 4.4 years ago • written 4.4 years ago by siabadaba80
8.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum134k wrote:

I wrote some tools to extract the fields from INFO and FORMAT. See: and

$ cat data.vcf.gz |\
   extractformat -t GT |\
   awk -F '\t' '($11=="1/1")' |\
   extractinfo -t DP |\
   awk -F '\t' '(int($12)>10)'
written 8.5 years ago by Pierre Lindenbaum134k

3.5 years later: this is wrong. Just filter the VCF using or extract the fields using GATK VariantsToTable

written 5.1 years ago by Pierre Lindenbaum134k

Pierre Lindenbaum

Can the fpfilter output file, which filters VarScan output for false positives, be converted into VCF 4.0 format? I tried vcf-annotate, but to no avail. I also tried writing a script, but that did not work either. Do you know of any custom tool designed for this?

written 6.3 years ago by ivivek_ngs5.1k
8.2 years ago by
Sean Davis26k
National Institutes of Health, Bethesda, MD
Sean Davis26k wrote:

SnpSift, a utility associated with SnpEff, has several options for filtering VCF files and for transforming them to tab-delimited text.

written 8.2 years ago by Sean Davis26k
8.2 years ago by
United States
Adam1.0k wrote:

Don't you just need to add --recode to your command?

written 8.2 years ago by Adam1.0k
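
A sketch of what adding --recode to a vcftools call could look like, assuming the goal is just the quality and depth filters from the original command. --recode makes vcftools write the retained sites to a new VCF (named <prefix>.recode.vcf), and --recode-INFO-all keeps the INFO fields in that output. The wrapper name recode_filtered is hypothetical, and the summary options from the original command (--depth, --TsTv-by-count and so on) are left out here:

```shell
# Hypothetical wrapper: apply site filters with vcftools and write a filtered VCF.
# $1 = input VCF, $2 = output prefix; the result is written to "$2.recode.vcf".
recode_filtered() {
  vcftools --vcf "$1" \
           --minQ 30 \
           --min-meanDP 5 \
           --recode --recode-INFO-all \
           --out "$2"
}

# Usage (file names taken from the original question):
#   recode_filtered GMM_homo.vcf homo_GMM
```

The header lines are carried over into the recoded file, which covers the request to keep the VCF header in the filtered output.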