Question: Difficulty filtering VCF file with vcftools
2.9 years ago, rc16955 wrote:

Hi all,

I am trying to filter a VCF file by read depth (minimum 10) and mapping quality (minimum 30) using vcftools. I have found it quite difficult to find example command lines for anything like this, and since I'm very new to bioinformatics and to the command line in general, I'm fairly certain I'm doing something wrong. This is what I have tried so far:

vcftools --vcf myfile.vcf --minDP 10 --minQ 30 --out myfile_filtered

This prints:

VCFtools - v0.1.12b
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
        --vcf ED132_55.vcf
        --minDP 10
        --minQ 30
        --out ED132_55_filtered

After filtering, kept 1 out of 1 Individuals
After filtering, kept 87041 out of a possible 103802 Sites
Run Time = 0.00 seconds

It also creates an output file named "myfile_filtered.log", which contains exactly the information printed above. No new VCF file is created, so I can only assume something is going wrong.

I would be very grateful if anyone could help me fix this command or suggest alternatives that might work better. I should note that, as I am not using my own computer, I can't easily install new software packages and am restricted to bcftools and vcftools. Any Python or Perl scripts would be wonderful.

2.9 years ago, colindaven (Hannover Medical School) wrote:

From memory, you need the --recode option to write a new, filtered VCF file; just add --recode to your command line.
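For reference, with --recode vcftools writes the retained sites to a new file named from the --out prefix (here myfile_filtered.recode.vcf), and adding --recode-INFO-all keeps the original INFO fields, which are otherwise dropped from the output. Since a Python alternative was asked for, below is a minimal standard-library sketch that writes a filtered VCF to standard output. It assumes the two thresholds are meant as site-level values taken from the INFO column's DP (total read depth) and MQ (mapping quality) keys, which many callers emit but which may be absent or named differently in your file; records lacking either key are dropped. The function and script names are illustrative, not part of vcftools or bcftools.

```python
import gzip
import sys


def parse_info(info_field):
    """Turn an INFO string like 'DP=14;MQ=60;DB' into a dict.

    Flag keys without '=' (e.g. 'DB') map to True.
    """
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def passes(info, min_dp=10, min_mq=30.0):
    """Site-level test: INFO/DP >= min_dp and INFO/MQ >= min_mq.

    Records missing DP or MQ (or with '.' values) fail; this is an assumption.
    """
    try:
        return float(info["DP"]) >= min_dp and float(info["MQ"]) >= min_mq
    except (KeyError, ValueError):
        return False


def filter_vcf(lines, min_dp=10, min_mq=30.0):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # Column 8 (index 7) is INFO in the VCF layout.
        if len(columns) >= 8 and passes(parse_info(columns[7]), min_dp, min_mq):
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_vcf.py myfile.vcf > myfile_filtered.vcf
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        sys.stdout.writelines(filter_vcf(handle))
```

Because header lines pass through untouched, the output stays a valid VCF that bcftools or vcftools can read. With bcftools, which is already available, `bcftools view -i 'INFO/DP>=10 && INFO/MQ>=30' myfile.vcf > myfile_filtered.vcf` should make the same site-level selection.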


Hi Colin, thank you for your answer; I am now able to generate VCF files. However, I'm still not convinced that the command is filtering correctly. Comparing the files, I can see that some SNPs present in the unfiltered file are missing from the filtered file even though they satisfy the criteria, while some SNPs in the filtered file do not satisfy the criteria. The command has definitely removed some SNPs, but I'm not sure on what basis.

Once again, many thanks for taking the time to respond.

Reply written 2.9 years ago by rc16955
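A likely explanation, based on the vcftools manual: --minQ compares against the QUAL column (the quality of the variant call), not mapping quality, which callers usually report separately as INFO/MQ. --minDP works per genotype: it reads each sample's FORMAT/DP and sets genotypes below the threshold to missing ("./.") rather than removing the site; to drop such sites it has to be combined with a missingness filter such as --max-missing. Together these would produce both symptoms: a SNP with good depth and MQ but QUAL below 30 is removed, while a low-depth or low-MQ SNP is kept with only its genotype masked. The sketch below reports, for one VCF data line, the fields each option reads. describe_record is a hypothetical helper; it assumes a tab-separated record with a FORMAT column, treats the first sample as the one of interest, and the >= boundary handling is an assumption.

```python
def describe_record(line, sample_index=0, min_q=30.0, min_dp=10):
    """Report the fields that vcftools --minQ and --minDP compare.

    Illustrative helper (hypothetical name). Assumes a tab-separated VCF
    data line with a FORMAT column; sample_index 0 is the first sample.
    Boundary handling (>=) is an assumption.
    """
    cols = line.rstrip("\n").split("\t")

    # Column 6 (QUAL): the variant-call quality that --minQ compares.
    qual = None if cols[5] == "." else float(cols[5])

    # INFO/MQ: mapping quality reported by the caller; --minQ ignores it.
    info = {k: v for k, _, v in (e.partition("=") for e in cols[7].split(";"))}
    mq_raw = info.get("MQ")
    mq = float(mq_raw) if mq_raw not in (None, "", ".") else None

    # FORMAT/DP for the chosen sample: the per-genotype depth --minDP compares.
    fmt = dict(zip(cols[8].split(":"), cols[9 + sample_index].split(":")))
    dp_raw = fmt.get("DP")
    gt_dp = int(dp_raw) if dp_raw not in (None, ".") else None

    return {
        "QUAL": qual,
        "INFO/MQ": mq,
        "FORMAT/DP": gt_dp,
        # Site-level: a QUAL below the threshold removes the whole site
        # (a missing QUAL is treated as failing here, an assumption).
        "site_kept_by_minQ": qual is not None and qual >= min_q,
        # Genotype-level: a low or missing DP masks the genotype ("./."),
        # but the site itself stays in the output.
        "genotype_masked_by_minDP": gt_dp is None or gt_dp < min_dp,
    }


if __name__ == "__main__":
    # Two made-up records, one for each symptom in the reply.
    removed = "1\t100\t.\tA\tG\t20\t.\tDP=40;MQ=60\tGT:DP\t0/1:40"
    kept = "1\t200\t.\tC\tT\t45\t.\tDP=4;MQ=25\tGT:DP\t1/1:4"
    for record in (removed, kept):
        print(describe_record(record))
```

The first made-up record comes back with site_kept_by_minQ False despite MQ 60 and a depth of 40; the second passes --minQ even though its depth and MQ both fail, and only its genotype would be masked. Running the same check on a few of the surprising SNPs in your own files should show whether this is what happened.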