I need to run a permutation test that randomly samples millions of SNP sets from the 1000 Genomes Phase 3 genotype data. Each SNP set contains only ~300 SNPs, but the extraction step is quite slow. For example, extracting these 300 SNPs from ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf takes almost 20 minutes.
plink --vcf ~/hpc/db/hg19/1000Genome/chr6.uni.vcf --extract chr6.rs6457620.txt --r inter-chr dprime --out rs6457620
Is there a faster way to extract the subset?
real 17m52.822s
user 8m39.925s
sys 1m20.681s
tabix ~/hpc/db/hg19/1000Genome/chr6.uni.vcf.gz -R chr6.rs6457620.txt > output.vcf
With tabix, this is 100 times faster!
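One thing to watch for when switching from plink to tabix: plink --extract reads a list of variant IDs, while tabix -R reads a tab-delimited regions file (CHROM and POS, 1-based), and the VCF has to be bgzip-compressed and indexed first. Below is a minimal sketch of how an rsID list could be converted into such a regions file. The helper name ids_to_regions, the lookup table chr6.map.tsv, and the file names snpset.txt and snpset.regions.txt are assumptions made up for illustration, not part of the original post.

```shell
# Hypothetical helper: turn a list of rsIDs into a CHROM<TAB>POS regions file
# of the kind tabix -R reads. The lookup table is an assumed three-column,
# tab-delimited file (CHROM, POS, ID) built once from the VCF, e.g.:
#   grep -v '^#' chr6.uni.vcf | cut -f1-3 > chr6.map.tsv
ids_to_regions() {
    # $1 = file of rsIDs, one per line; $2 = CHROM/POS/ID lookup table
    awk -F'\t' 'FILENAME == ARGV[1] { want[$1] = 1; next }  # remember wanted IDs
                ($3 in want) { print $1 "\t" $2 }           # emit CHROM, POS
               ' "$1" "$2" | sort -k1,1 -k2,2n              # order by chrom, pos
}

# Usage (commented out, since it needs bgzip and tabix from htslib):
#   bgzip -c chr6.uni.vcf > chr6.uni.vcf.gz    # tabix needs BGZF compression
#   tabix -p vcf chr6.uni.vcf.gz               # build the .tbi index once
#   ids_to_regions snpset.txt chr6.map.tsv > snpset.regions.txt
#   tabix -h chr6.uni.vcf.gz -R snpset.regions.txt > output.vcf
```

The lookup table and the index only need to be built once per chromosome, so each of the millions of SNP sets then costs one small awk pass plus one indexed tabix query. The -h flag keeps the VCF header in the output, which downstream tools such as plink expect.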