I am trying to filter VCF files by sample using vcftools, testing on the 1000 Genomes datasets.
To keep only the CEU samples, for example, I run:
vcftools --gzvcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --recode --out CEU --keep CEU.tsv
Here CEU.tsv contains the IDs of the samples from the CEU population.
The problem is that the output appears to include variants that do not vary among the kept samples. I also tried setting --min-alleles, but that did not seem to fix it.
This operation is also pretty slow. Is there a faster way to do it?
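To put a number on the first problem, here is a rough sketch of how the recoded output could be checked. The function name count_monomorphic is made up for illustration, and the sketch assumes an uncompressed VCF on stdin with GT as the first FORMAT field. It only counts sites where no kept sample carries a non-reference allele; it does not flag sites where every kept sample is homozygous for the ALT allele.

```shell
# Hypothetical helper: count sites where none of the kept samples
# carries a non-reference allele.
# Assumptions: uncompressed VCF on stdin, GT is the first FORMAT field.
count_monomorphic() {
  awk -F'\t' '
    /^#/ { next }                      # skip meta and header lines
    {
      total++
      has_alt = 0
      for (i = 10; i <= NF; i++) {     # sample columns start at field 10
        split($i, fields, ":")         # the genotype is the first subfield
        n = split(fields[1], alleles, "[/|]")
        for (j = 1; j <= n; j++) {
          # anything other than 0 (reference) or . (missing) is an ALT call
          if (alleles[j] != "0" && alleles[j] != ".") has_alt = 1
        }
      }
      if (!has_alt) mono++
    }
    END { printf "%d of %d sites have no ALT allele in the kept samples\n", mono, total }
  '
}
```

With the command above, vcftools writes its output to CEU.recode.vcf, so the check would be run as: count_monomorphic < CEU.recode.vcf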