Question: split vcf by individual
3 months ago
sonia.olaechea wrote:

Hi all!

I have a VCF file that contains 2 regions (on different chromosomes) for a group of individuals. I'm trying to split this VCF by individual. The file IDlist.txt contains the IDs of all the individuals. The command I'm using is the following:

for i in $(cat IDlist.txt); do
    i="${i%$'\r'}"    # strip a trailing carriage return (Windows line endings)
    vcf-subset -c "$i" file1.vcf > "${i}_file1.vcf"
done

The command correctly splits the file by individual, but each output file contains only the records from the first region (not the second). Could anyone help me figure out what's going on, or suggest another command that works better? Thank you very much in advance! :)
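To see whether the second region is really missing from the output, one quick check is to count data records per chromosome in file1.vcf and in one of the per-individual files. The sketch below does this with awk; count_per_chrom is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF.

```shell
#!/usr/bin/env bash
# Sketch: count data records per chromosome (column 1, CHROM) in a VCF so an
# input file and a per-sample output can be compared side by side.
# count_per_chrom is a hypothetical name; works on uncompressed VCF only.
count_per_chrom() {
    # Skip header lines (starting with '#'), tally CHROM, print "chrom<TAB>count".
    awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}
```

For example, run count_per_chrom file1.vcf and then the same on one output file; if the second chromosome appears only in the first listing, the records are being dropped during subsetting rather than being absent from the input.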

snp vcftools vcf-subset vcf
finswimmer wrote:

A combination of parallel and bcftools can do this:

$ cat IDlist.txt | parallel 'bcftools view -s {} file1.vcf > {}_file1.vcf'

fin swimmer
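If GNU parallel is not installed, the same per-sample extraction can be written as a plain bash loop. This sketch also strips Windows carriage returns and skips blank lines in the ID list, since a stray \r would make an ID fail to match the sample name in the VCF header. The function name split_by_sample and the "<ID>_<input name>" output naming are assumptions for illustration, not part of the answer above; bcftools is assumed to be on PATH.

```shell
#!/usr/bin/env bash
# Sketch: write one single-sample VCF per ID in a list using bcftools view.
# split_by_sample and the output naming are hypothetical choices.
split_by_sample() {
    local vcf=$1 id_list=$2 id stem
    stem=$(basename "$vcf")
    # One ID per line; "|| [ -n "$id" ]" also handles a last line with no newline.
    while IFS= read -r id || [ -n "$id" ]; do
        id=${id%$'\r'}              # drop a Windows carriage return, if any
        [ -z "$id" ] && continue    # skip blank lines
        # -s: sample to keep, -Ov: uncompressed VCF, -o: output file.
        # stdin is redirected so bcftools cannot consume the ID list.
        bcftools view -s "$id" -Ov -o "${id}_${stem}" "$vcf" < /dev/null || return 1
    done < "$id_list"
}
```

Usage: split_by_sample file1.vcf IDlist.txt writes one file per ID, named like the outputs of the parallel command.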

Paul wrote:

Hi, another way is to use bcftools query together with GATK SelectVariants. The script looks like this:

#!/bin/bash
# set genome to the path of your reference FASTA before running
for sample in $(bcftools query -l INPUT.vcf); do
    for i in *.vcf; do
        gatk SelectVariants -R "$genome" -V "$i" -O "${sample}_${i}_test" -sn "$sample"
    done
done

The outer for loop generates the list of samples in INPUT.vcf (put the path to your VCF here). The inner for loop takes every VCF in the current directory and splits it according to that list (the $sample variable).
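A variant of the same nested loop, sketched here as a bash function, reads the sample list once, collects the input VCFs before anything is written, and puts the results in a separate directory, so that outputs can never be picked up by the *.vcf pattern on a later pass. The function name split_all_by_sample, the split_out default directory and the output naming are hypothetical; bcftools and gatk are assumed to be on PATH, and the GATK flags are the same ones used in the script above.

```shell
#!/usr/bin/env bash
# Sketch: split every VCF in the current directory by every sample listed in
# a reference VCF. Names and output layout are illustrative assumptions.
split_all_by_sample() {
    local samples_vcf=$1 genome=$2 outdir=${3:-split_out}
    local samples=() vcfs=() sample vcf base
    mkdir -p "$outdir" || return 1
    # Outer list: sample names from the VCF header, read once up front.
    while IFS= read -r sample; do
        [ -n "$sample" ] && samples+=("$sample")
    done < <(bcftools query -l "$samples_vcf")
    # Inner list: snapshot the input VCFs before any output is created.
    for vcf in ./*.vcf; do
        [ -e "$vcf" ] && vcfs+=("$vcf")
    done
    for sample in "${samples[@]}"; do
        for vcf in "${vcfs[@]}"; do
            base=$(basename "$vcf" .vcf)
            gatk SelectVariants -R "$genome" -V "$vcf" \
                -O "$outdir/${sample}_${base}.vcf" -sn "$sample" || return 1
        done
    done
}
```

For example, split_all_by_sample INPUT.vcf "$genome" would write files such as split_out/<sample>_INPUT.vcf for each sample and each VCF in the directory.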

You can run it as follows.

Make the script executable: chmod +x

and then run it:

