Question: Filter VCF by sample?
Asked 3.9 years ago by cmdcolin (United States):

I was trying to filter VCF files by sample using vcftools, testing on the 1000 Genomes datasets.

For example, to keep only the CEU samples, I can run:

vcftools --gzvcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --recode --out CEU --keep CEU.tsv

where CEU.tsv contains the IDs of the CEU samples, one per line.
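
If you start from the 1000 Genomes Phase 3 sample panel, a keep-file like CEU.tsv can be produced with a short awk filter. This is a sketch that assumes the panel is named integrated_call_samples_v3.20130502.ALL.panel, is tab-separated, has one header line, and has the population code in its second column; check those against your copy. The function name make_keep_file is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write the IDs of one population to a keep-file, one sample ID
# per line (the format vcftools --keep and bcftools --samples-file read).
# ASSUMPTION: the panel is tab-separated with a header line and columns
# sample, pop, super_pop, gender.
make_keep_file() {
  local panel="$1" pop="$2" out="$3"
  # NR > 1 skips the header line; $2 is the population column, $1 the sample ID.
  awk -F '\t' -v pop="$pop" 'NR > 1 && $2 == pop { print $1 }' "$panel" > "$out"
}
```

Usage: `make_keep_file integrated_call_samples_v3.20130502.ALL.panel CEU CEU.tsv`.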

The problem is that the output still includes variants at which none of the kept samples carry a non-reference allele. I also tried setting --min-alleles, but that did not seem to fix it.

This operation is also quite slow. Are there faster ways to do it?

Answer by trausch, 3.8 years ago:

bcftools should work:

bcftools view --force-samples -o ceu.vcf.gz -O z --samples-file CEU.tsv --min-ac 1 input.vcf.gz
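
For intuition about what --samples-file together with --min-ac 1 does, here is a rough shell/awk sketch for a small, uncompressed VCF: keep only the listed sample columns, then drop every site where those samples carry no non-reference allele. The function name subset_min_ac1 is made up for illustration. It assumes GT is the first FORMAT subfield and, unlike bcftools, does not recompute INFO/AC or INFO/AN, so for real 1000 Genomes files the bcftools command above is the tool to use.

```shell
#!/usr/bin/env bash
# Rough illustration, NOT a bcftools replacement: subset a plain-text VCF
# to the samples in KEEPFILE, then keep only sites where those samples
# carry at least one non-reference allele (the idea behind --min-ac 1).
# ASSUMPTIONS: uncompressed VCF, GT is the first FORMAT subfield,
# INFO fields such as AC/AN are left untouched.
subset_min_ac1() {
  local keepfile="$1" vcf="$2"
  awk -F '\t' -v keepfile="$keepfile" '
    BEGIN {
      # Load one sample ID per line into a lookup table.
      while ((getline id < keepfile) > 0) keep[id] = 1
      close(keepfile)
    }
    /^##/ { print; next }                # meta-information lines pass through
    {
      line = $1                          # always keep CHROM..FORMAT (columns 1-9)
      for (i = 2; i <= 9 && i <= NF; i++) line = line "\t" $i
    }
    /^#CHROM/ {
      n = 0                              # remember which sample columns to keep
      for (i = 10; i <= NF; i++)
        if ($i in keep) { cols[++n] = i; line = line "\t" $i }
      print line
      next
    }
    {
      nonref = 0                         # non-reference alleles among kept samples
      for (j = 1; j <= n; j++) {
        v = $(cols[j])
        line = line "\t" v
        split(v, fmt, ":")               # GT is the first subfield
        m = split(fmt[1], al, "[/|]")    # unphased (/) and phased (|) genotypes
        for (k = 1; k <= m; k++)
          if (al[k] != "." && al[k] != "0") nonref++
      }
      if (nonref >= 1) print line        # drop sites with no ALT allele in the subset
    }
  ' "$vcf"
}
```

Usage: `subset_min_ac1 CEU.tsv small.vcf > small.CEU.vcf`. Sites where all kept genotypes are missing are dropped as well, since they contribute no non-reference allele.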

Thanks again for this answer. I found my own question in a Google search now, two years later. Note that a relatively recent version of bcftools should be used, e.g. the one from htslib, because options like --min-ac don't exist in the old 0.1.19 from the samtools package. If you just want a single sample, you can use bcftools view -s HG00096 --min-ac 1 1000genomes.vcf.gz, where --min-ac makes sure that every site in the output has at least one non-reference allele in that sample.

(Reply by cmdcolin, 20 months ago.)