VCFtools not filtering VCF file by positions
5.5 years ago
devenvyas ▴ 680

So I downloaded a gigantic gzipped VCF from the Mota man genome (

I want to filter it down to just the sites where I have data from the Human Origins SNP array. I made a file called positions.txt, which has chromosome TAB basepair for all the SNPs I have data for. A lot of the SNPs don't have rs #s, so that route won't work. Luckily everything is hg19. Here are the first few lines of positions.txt

1    842013
1    891021
1    903426
1    949654
1    1018704

I run

vcftools --gzvcf ../GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire.vcf.gz --positions positions.txt --recode --out Mota_HuOrg

and then I get an error message as follows

VCFtools - v0.1.12b
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --gzvcf ../GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire.vcf.gz
    --out Mota_HuOrg
    --positions positions.txt

Using zlib version: 1.2.3
Versions of zlib >= 1.2.4 will be *much* faster when reading zipped VCF files.
After filtering, kept 1 out of 1 Individuals
Outputting VCF file...
After filtering, kept 0 out of a possible -1563604250 Sites
File does not contain any sites
Run Time = 11431.00 seconds

Does anyone know what I am doing wrong? Thanks!




vcftools vcf SNP
Still, thank you very much for the post. I already know how to filter by position.

5.5 years ago
devenvyas ▴ 680

I figured it out. My position list used plain chromosome numbers, but the VCF file has chr before each chromosome number. I added chr to the positions file and it started working, but it seems to jettison other information from the VCF file...
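
For anyone hitting the same thing, here is a minimal sketch of the fix described above. The file names positions_demo.txt and positions_chr_demo.txt are made up for illustration, and the demo rows are copied from the positions.txt excerpt in the question. The missing information is probably the INFO column: by default --recode drops INFO fields, and adding --recode-INFO-all keeps them. The vcftools step is guarded so it only runs if vcftools and the VCF are actually present.

```shell
# Hypothetical demo file in the same "chromosome TAB basepair" layout as positions.txt.
printf '1\t842013\n1\t891021\n' > positions_demo.txt

# Prefix the chromosome column with "chr" unless it is already there,
# so the names match the CHROM column of the VCF.
awk 'BEGIN { FS = OFS = "\t" } $1 !~ /^chr/ { $1 = "chr" $1 } { print }' \
    positions_demo.txt > positions_chr_demo.txt

cat positions_chr_demo.txt

# VCF path taken from the question; adjust to your own layout.
VCF=../GB20_sort_merge_dedup_l30_IR_q30_mapDamage_Entire.vcf.gz

if command -v vcftools >/dev/null 2>&1 && [ -f "$VCF" ]; then
    # Show how the first data line names its chromosome (e.g. "chr1" vs "1").
    gzip -dc "$VCF" | grep -v '^#' | head -n 1 | cut -f 1

    # Same filter as before, but keeping all INFO fields in the output.
    vcftools --gzvcf "$VCF" \
        --positions positions_chr_demo.txt \
        --recode --recode-INFO-all \
        --out Mota_HuOrg
fi
```

Checking the CHROM naming on the first data line before running the full filter saves a multi-hour run that ends with zero sites kept.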


