Question: split vcf by individual
7 months ago by
sonia.olaechea90 wrote:

Hi all!

I have a vcf file which contains 2 regions (in different chromosomes) for a group of individuals. I'm trying to split this vcf just by individual. The file IDlist.txt contains the IDs of all the individuals. The command I'm using is the following:

for i in $(cat IDlist.txt); do
    i="${i%\\n}"
    vcf-subset -c "$i" file1.vcf > "${i}_file1.vcf"
done

The command splits the file correctly by individual, but the output contains only the first region (and not the second). Could anyone help me figure out what's going on, or suggest another command that works better? Thank you very much in advance!! :) :)

7 months ago by
finswimmer11k wrote:

A combination of parallel and bcftools can do this:

$ cat IDlist.txt | parallel 'bcftools view -s {} file1.vcf > {}_file1.vcf'
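
For a setup without GNU parallel, the same per-sample extraction can be sketched as a plain shell loop. This is only a minimal sketch: it assumes bcftools is on the PATH, that IDlist.txt holds one sample ID per line, and that the input is file1.vcf as in the question. The function name split_by_sample is made up for illustration.

```shell
# Split a multi-sample VCF into one VCF per sample listed in an ID file.
# Assumption: bcftools is installed; the ID file has one sample ID per line.
split_by_sample() {
    vcf="$1"
    ids="$2"
    # Drop carriage returns in case the ID list was saved with Windows line endings
    tr -d '\r' < "$ids" | while IFS= read -r sample || [ -n "$sample" ]; do
        # Skip empty lines
        [ -z "$sample" ] && continue
        # Keep only this sample's column; name the output <sample>_<input file name>
        bcftools view -s "$sample" "$vcf" > "${sample}_${vcf##*/}"
    done
}

# Usage (hypothetical file names taken from the question):
# split_by_sample file1.vcf IDlist.txt
```

Reading the list line by line avoids word-splitting surprises from $(cat ...), and the output files land in the current directory.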

fin swimmer

7 months ago by
Paul1.3k wrote:

Hi, another way is to use bcftools query together with GATK SelectVariants. The script looks like this:



# $genome must be set to the reference FASTA before running
for sample in `bcftools query -l INPUT.vcf`; do
    for i in *.vcf; do
        gatk SelectVariants -R "$genome" -V "$i" -O "${sample}_${i}_test" -sn "$sample"
    done
done



The first for loop just generates the list of samples present in INPUT.vcf (put the path to your VCF here). The second for loop takes every VCF in the current directory and splits it by each sample from that list (the $sample variable).
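
To confirm that a per-sample file still contains both regions, the records per chromosome can be tallied. The following is a minimal sketch that assumes an uncompressed VCF; the function name count_by_chrom is made up for illustration.

```shell
# Count variant records per chromosome in an uncompressed VCF.
count_by_chrom() {
    # Drop header lines (starting with '#'), keep column 1 (CHROM), then tally
    grep -v '^#' "$1" | cut -f1 | sort | uniq -c
}

# Usage (hypothetical file name):
# count_by_chrom sample1_file1.vcf
```

If only one chromosome shows up in the output, the second region was lost during the split rather than at a later step.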

You can run it as:

make script executable: chmod +x

and run it:
