Question: filter out subset of samples from vcf file
mab658 wrote (8 months ago):

I have a very large compressed VCF file with over 8000 samples. I want to extract the data for only those samples whose names start with the three characters "TMS", so that I end up with a new VCF file containing just those samples and their variant data. Could anyone help me with a vcftools or bcftools command to accomplish this? Thanks.

Tags: sequencing, rna-seq, snp

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (8 months ago):
gunzip -c in.vcf.gz | grep -m 1 "^#CHROM" | cut -f 10- | tr "\t" "\n" | grep "^TMS" > samples.txt

bcftools view --samples-file samples.txt in.vcf.gz
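
To see what the first command does, here is a minimal sketch on a tiny made-up VCF. The file name demo.vcf and the sample names TMS001, ABC002 and TMS003 are assumptions for illustration only. It uses nothing but standard shell tools, so it can be tried without bcftools installed.

```shell
# Build a tiny uncompressed VCF with three hypothetical samples.
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTMS001\tABC002\tTMS003\n' >> demo.vcf
printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\n' >> demo.vcf

# Take the #CHROM header line, drop the 9 fixed columns,
# put one sample name per line, and keep names starting with TMS.
grep -m 1 '^#CHROM' demo.vcf | cut -f 10- | tr '\t' '\n' | grep '^TMS' > samples.txt

# Show the resulting sample list (TMS001 and TMS003, one per line).
cat samples.txt
```

On the real file, bcftools query -l in.vcf.gz | grep '^TMS' > samples.txt should produce the same kind of list directly from the compressed VCF. The bcftools view command prints to stdout, so to get a new compressed file you can add -O z -o tms_only.vcf.gz (the output name is just an example).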

zgrep "#CHROM"... ;)

written 8 months ago by finswimmer

Thanks, Pierre. I was able to extract the samples. Everything works fine.

written 8 months ago by mab658

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

written 8 months ago by Pierre Lindenbaum