How to extract information for samples of interest from a VCF file?
20 months ago
Qingyang Xiao ▴ 160

Hi,

I am starting a GWAS project, and the VCF files have already been generated. However, I only need the data for a subset of the genotyped individuals (around 10%). The original VCF files contain too many samples and are too large.

How can I extract the information of the subset I am interested in?

Thanks.

Unix VCF subset command GWAS • 769 views
20 months ago

Hi,

You need to use: bcftools view --samples

For example, the following command would subset the VCF, myvariants.vcf.gz, for samples with IDs POH12 and POH13:

bcftools view --samples POH12,POH13 myvariants.vcf.gz

For more information, run bcftools view with no arguments on the command line to print its usage.
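
Note that the command above writes the subset to standard output. A minimal sketch of saving it to a compressed, indexed file instead, assuming a reasonably recent bcftools; the output name subset.vcf.gz and the shell variables are hypothetical and only there for illustration:

```shell
# Input file and sample IDs from the example above; OUT is a hypothetical name.
IN=myvariants.vcf.gz
OUT=subset.vcf.gz
SAMPLES=POH12,POH13

# Keep only the listed samples and write bgzip-compressed VCF (-Oz) to $OUT.
bcftools view --samples "$SAMPLES" -Oz -o "$OUT" "$IN"

# Index the result so that region queries work on it.
bcftools index --tbi "$OUT"

# List the sample IDs in the output to confirm the subset.
bcftools query -l "$OUT"
```

The last command should print only the requested sample IDs, one per line.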

Kevin


Thanks Kevin. This is very nice!

What if I have a list of a thousand samples? Can I still do it with one command?


In that case, I would keep the sample IDs in a file, one ID per line, and use the following flag:

-S, --samples-file [^]<file>  file of samples to include (or exclude with "^" prefix)
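
A minimal sketch of that approach, assuming a plain-text file with one sample ID per line. The file names samples.txt, subset.vcf.gz and without_subset.vcf.gz, and the third ID POH14, are hypothetical:

```shell
# Write the sample IDs of interest to a file, one ID per line.
printf '%s\n' POH12 POH13 POH14 > samples.txt

# Keep only the samples listed in samples.txt.
bcftools view --samples-file samples.txt -Oz -o subset.vcf.gz myvariants.vcf.gz

# Prefix the file name with ^ to exclude the listed samples instead.
# The quotes stop the shell from interpreting the ^ character.
bcftools view --samples-file '^samples.txt' -Oz -o without_subset.vcf.gz myvariants.vcf.gz
```

With a thousand IDs, samples.txt would normally be exported from your sample sheet rather than typed by hand. If any listed ID is missing from the VCF header, bcftools view stops with an error unless you add --force-samples.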
