Filter out a subset of samples from a VCF file
5.4 years ago
mab658

I have a very large compressed VCF file with over 8000 samples. I want to extract the data for the samples whose names start with the three characters "TMS", so that I get a new VCF file containing only those samples and their variant data. Could anyone help me with the vcftools or bcftools command to do this? Thanks

5.4 years ago
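# list the sample names from the #CHROM header line and keep those starting with TMS
gunzip -c in.vcf.gz | grep -m 1 "^#CHROM" | cut -f 10- | tr "\t" "\n" | grep "^TMS" > samples.txt
# subset the VCF to those samples and write a bgzip-compressed output file
bcftools view --samples-file samples.txt -Oz -o TMS.vcf.gz in.vcf.gz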
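The bcftools command above is the robust route. As a hedged alternative for a quick check on a small file, or on a machine without bcftools, the same column selection can be sketched in plain awk. This is a swapped-in technique, not the answer's method. It assumes a tab-delimited VCF whose sample names start in column 10, and the file names toy.vcf.gz and toy.TMS.vcf.gz are hypothetical. Unlike bcftools view, which by default recalculates INFO/AC and INFO/AN for the kept samples (see its -I/--no-update option), this sketch copies the INFO column unchanged, so those tags may be stale.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy input (hypothetical data): a gzip-compressed VCF with four samples.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTMS001\tABC002\tTMS003\tXTMS04\n'
  printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\t0/1\n'
} | gzip -c > toy.vcf.gz

# Keep the 9 fixed VCF columns plus every sample column whose name starts with TMS.
# Note: INFO is copied as-is, so INFO/AC and INFO/AN are NOT recalculated here.
gzip -dc toy.vcf.gz | awk 'BEGIN { FS = OFS = "\t" }
  /^##/ { print; next }          # meta-information lines pass through unchanged
  /^#CHROM/ {                     # decide once which column indices to keep
    n = 0
    for (i = 1; i <= NF; i++)
      if (i <= 9 || $i ~ /^TMS/) keep[++n] = i
  }
  {                               # print only the kept columns for the header and every record
    line = $(keep[1])
    for (j = 2; j <= n; j++) line = line OFS $(keep[j])
    print line
  }' | gzip -c > toy.TMS.vcf.gz

gzip -dc toy.TMS.vcf.gz
```

The output is plain gzip rather than BGZF; piping through bgzip instead of gzip -c would make it indexable with tabix or bcftools index.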

zgrep "#CHROM"... ;) (zgrep can replace the gunzip -c | grep pair)
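The zgrep shortcut suggested above can be checked on a small toy file. This sketch builds a hypothetical four-sample VCF (toy.vcf.gz) and runs the same header-parsing pipeline through zgrep, assuming zgrep from GNU gzip is on the PATH. The anchored pattern '^TMS' keeps TMS001 and TMS003 but not XTMS04.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy input (hypothetical data): a gzip-compressed VCF with four samples.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTMS001\tABC002\tTMS003\tXTMS04\n'
  printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\t0/1\n'
} | gzip -c > toy.vcf.gz

# zgrep searches the compressed file directly, so no separate gunzip -c step is needed.
# -m 1 stops at the first #CHROM line; cut drops the 9 fixed columns; tr puts one name per line.
zgrep -m 1 '^#CHROM' toy.vcf.gz | cut -f 10- | tr '\t' '\n' | grep '^TMS' > samples.txt

cat samples.txt   # prints TMS001 and TMS003, one per line
```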

Thanks Pierre. I was able to extract the samples. Everything works fine.

If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.
