Question: Extract a subset of samples from a VCF file
15 months ago, mab658 wrote:

I have a very large compressed VCF file with over 8000 samples. I want to extract the data for the samples whose names start with the three characters "TMS", so that I get a new VCF file containing only those samples and their variant data. Could anyone help me with the vcftools or bcftools command to accomplish this? Thanks.

Tags: sequencing, rna-seq, snp
15 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
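# List the sample names that start with "TMS" (columns 10 onward of the #CHROM header line)
gunzip -c in.vcf.gz | grep -m 1 "^#CHROM" | cut -f 10- | tr "\t" "\n" | grep "^TMS" > samples.txt

# Keep only those samples (output goes to stdout; add -o to write it to a file)
bcftools view --samples-file samples.txt in.vcf.gz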

zgrep "#CHROM"... ;)

Reply written 15 months ago by finswimmer
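
The zgrep suggestion above could be tried out roughly as follows. This is only a sketch: zgrep reads the compressed file directly, so the separate gunzip -c step is not needed. The file toy.vcf.gz and its sample names are made up for illustration and are not from the original question.

```shell
# Build a tiny, made-up VCF so the pipeline can be tried without real data
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTMS001\tABC002\tTMS003\n' >> toy.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\n' >> toy.vcf
gzip -f toy.vcf

# zgrep decompresses on the fly; -m 1 stops after the first match (the header line)
zgrep -m 1 '^#CHROM' toy.vcf.gz | cut -f 10- | tr '\t' '\n' | grep '^TMS' > samples.txt

# Prints TMS001 and TMS003, one per line
cat samples.txt
```

The resulting samples.txt can then be passed to bcftools view --samples-file exactly as in the answer.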

Thanks, Pierre. I was able to extract the samples, and everything works fine.

Reply written 15 months ago by mab658

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Reply written 15 months ago by Pierre Lindenbaum