I used the Michigan Imputation Server to impute my data and got three output files per chromosome: .dose.vcf.gz, .dose.vcf.gz.tbi, and .info.gz. I only want to keep variants whose imputation R2 is greater than 0.3. I think the .dose.vcf.gz has everything I need to create a plink output file containing only the high-quality genotypes, but I am having a hard time getting vcftools to understand what I am asking it to do:
vcftools --vcf chr1.dose.vcf.gz --filter MinR2=.30 --plink --out plink_chr1
Error: Unknown option: --filter
Can someone please help me figure out how to filter on the value of an INFO field?
I have looked at the vcftools documentation and examples and haven't yet figured out how to filter on INFO. The INFO filtering mentioned here only filters on whether a field is present, not on its value. Is VCFtools not the right tool for this? Any suggestions would be appreciated.
Here is the header of the .dose.vcf.gz:
##fileformat=VCFv4.1
##filedate=2016.8.6
##source=Minimac3
##contig=<ID=1>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1 ">
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
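One possible way to filter on the value of R2 without relying on a vcftools option is to stream the file through awk and test the R2 entry in the INFO column (column 8). This is a minimal sketch that assumes the INFO layout shown in the header above. The function name filter_r2 and the file names are hypothetical. It keeps variants whose R2 is strictly greater than the threshold and drops any variant that has no R2 entry. If bcftools is available, its -i/--include expression (for example bcftools view -i 'INFO/R2>0.3') is a dedicated alternative for the same filter.

```shell
# Hypothetical helper: keep only variants whose INFO/R2 is strictly greater
# than a threshold. Header lines (starting with '#') are passed through.
# Assumes a gzip/bgzip-compressed VCF whose INFO column (field 8) contains
# an "R2=<float>" entry, as in the Minimac3 header. Variants without an R2
# entry are dropped. "ER2=" does not match because the test anchors on "^R2=".
filter_r2() {
    in_vcf=$1    # e.g. chr1.dose.vcf.gz
    out_vcf=$2   # e.g. chr1.filtered.vcf.gz (written with plain gzip, not bgzip)
    min_r2=$3    # e.g. 0.3
    gzip -dc "$in_vcf" |
        awk -F '\t' -v min="$min_r2" '
            /^#/ { print; next }                # keep all header lines
            {
                n = split($8, kv, ";")          # INFO entries are ";"-separated
                for (i = 1; i <= n; i++) {
                    if (kv[i] ~ /^R2=/) {
                        r2 = substr(kv[i], 4) + 0   # numeric value after "R2="
                        if (r2 > min + 0) print     # strict ">" threshold
                        break
                    }
                }
            }' |
        gzip > "$out_vcf"
}
```

For example, filter_r2 chr1.dose.vcf.gz chr1.filtered.vcf.gz 0.3 would write the filtered file, which could then be converted with vcftools --gzvcf chr1.filtered.vcf.gz --plink --out plink_chr1 (the vcftools documentation uses --gzvcf rather than --vcf for gzip-compressed input). Because the output is plain gzip, it would need to be recompressed with bgzip before it could be indexed with tabix.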