Question: extract a subset of samples from a VCF file
12 weeks ago, mab658 wrote:

I have a very large compressed VCF file with over 8000 samples. I want to extract the data for the samples whose names start with the three characters "TMS", so that I end up with a new VCF file containing only those samples and their variant data. Could anyone help me with the vcftools or bcftools command to accomplish this? Thanks

Tags: sequencing, rna-seq, snp
12 weeks ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
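# Collect the sample names (columns 10 onward of the #CHROM header line) that start with "TMS"
gunzip -c in.vcf.gz | grep -m 1 "^#CHROM" | cut -f 10- | tr "\t" "\n" | grep "^TMS" > samples.txt

# Keep only those samples (the result is written to standard output)
bcftools view --samples-file samples.txt in.vcf.gz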
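
Since the question asks for a new VCF file, here is a minimal sketch of writing the subset to a compressed file rather than to standard output. The helper name `subset_samples` and the output file name used in the usage note are hypothetical; `-O z` and `-o` are the bcftools short options for compressed VCF output and for the output path.

```shell
#!/bin/sh
set -eu

# subset_samples IN SAMPLES OUT
# Hypothetical helper: keeps only the samples listed in SAMPLES (one name per
# line) and writes a compressed VCF to OUT instead of standard output.
subset_samples() {
    bcftools view --samples-file "$2" -O z -o "$3" "$1"
}
```

It could then be called as `subset_samples in.vcf.gz samples.txt tms.vcf.gz`, optionally followed by `bcftools index --tbi tms.vcf.gz` if an index is needed.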

zgrep "#CHROM"... ;)

written 12 weeks ago by finswimmer
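
A sketch of what the zgrep suggestion might look like in full: zgrep reads the compressed file directly, so the separate gunzip step is not needed. The toy file `toy.vcf.gz` and its sample names are made up here so that the pipeline can be tried without real data.

```shell
#!/bin/sh
set -eu

# Build a tiny gzipped VCF with three samples (illustrative names only).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTMS001\tABC002\tTMS003\n1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\t1/1\n' | gzip -c > toy.vcf.gz

# Take the first #CHROM line, keep columns 10 onward (the sample names),
# put one name per line, and keep the names that start with "TMS".
zgrep -m 1 '^#CHROM' toy.vcf.gz | cut -f 10- | tr '\t' '\n' | grep '^TMS' > samples.txt

cat samples.txt
```

On the toy file this should leave TMS001 and TMS003 in samples.txt, which can then be passed to `bcftools view --samples-file` as in the answer.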

Thanks, Pierre. I was able to extract the samples. Everything works fine.

written 12 weeks ago by mab658

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.


written 12 weeks ago by Pierre Lindenbaum