I have a VCF file that contains two regions (on different chromosomes) for a group of individuals. I'm trying to split this VCF by individual only. The file IDlist.txt contains the IDs of all the individuals. The command I'm using is the following:
while IFS= read -r i; do
    i="${i%$'\r'}"   # strip a trailing carriage return (Windows line endings)
    vcf-subset -c "$i" file1.vcf > "${i}_file1.vcf"
done < IDlist.txt
The command correctly splits the file by individual, but each output file contains only the first region (not the second). Could anyone help me figure out what's going on, or suggest another command that works better?
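One way to narrow this down is to count the data lines per chromosome in the input and in one of the per-individual outputs. If the second chromosome is already missing from the input's count, the problem is upstream of vcf-subset; if it is present in the input but not in the output, the subsetting step is dropping it. A minimal sketch using only standard Unix tools, assuming an uncompressed VCF (the function name `count_per_chrom` is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: print "<CHROM> <number of records>" for an uncompressed VCF.
# Header lines start with '#', so they are skipped; column 1 of each data line is CHROM.
count_per_chrom() {
    awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | sort
}

# Example usage (SAMPLE1 is a placeholder ID from IDlist.txt):
#   count_per_chrom file1.vcf
#   count_per_chrom SAMPLE1_file1.vcf
```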
Thank you very much in advance!! :) :)
Hi, another way is to use bcftools query together with GATK SelectVariants. The script (split_vcf.sh) looks like this:
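#!/bin/bash
# Reference FASTA for GATK (it needs a .fai index and a .dict file next to it)
genome=~/path/to/reference/genome.fa
# Outer loop: every sample name in the header of INPUT.vcf
for sample in $(bcftools query -l INPUT.vcf)
do
    # Inner loop: every VCF in the current directory
    for i in *.vcf
    do
        # Keep only $sample; the _test suffix keeps outputs out of the *.vcf glob
        gatk SelectVariants -R "$genome" -V "$i" -O "${sample}_${i}_test" -sn "$sample"
    done
done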
The first for loop just generates the list of samples in INPUT.vcf (put the path to your VCF here). The second for loop takes every VCF in the current directory and splits it according to that list (the $sample variable).
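If GATK or a reference FASTA is not at hand, the same per-sample split can be sketched in plain awk, assuming an uncompressed VCF. It copies every header line, keeps the nine fixed columns (CHROM through FORMAT) plus only the chosen sample's column, and does this for every chromosome in the file. `split_vcf_awk` is a hypothetical helper, not part of any tool; unlike bcftools or GATK it does not recompute INFO fields such as AC/AN, and it keeps sites where the sample carries no variant.

```shell
#!/bin/bash
# Hypothetical helper: write one VCF per sample from an uncompressed VCF.
# Usage: split_vcf_awk INPUT.vcf OUTDIR   -> creates OUTDIR/<sample>.vcf
# Caveats: INFO fields (e.g. AC, AN) are copied unchanged, not recomputed;
# with very many samples some awk implementations may hit the open-file limit.
split_vcf_awk() {
    local input=$1 outdir=$2
    mkdir -p "$outdir"
    awk -F'\t' -v OFS='\t' -v outdir="$outdir" '
        # Meta-information lines (##...) go to every output file,
        # so buffer them until the sample names are known.
        /^##/ { meta = meta $0 "\n"; next }
        # The #CHROM line: columns 10..NF are the sample names.
        /^#CHROM/ {
            for (s = 10; s <= NF; s++) {
                out[s] = outdir "/" $s ".vcf"
                printf "%s", meta > out[s]
                print $1, $2, $3, $4, $5, $6, $7, $8, $9, $s > out[s]
            }
            nsamp = NF
            next
        }
        # Data lines (all chromosomes): fixed columns 1..9 plus the sample column.
        {
            for (s = 10; s <= nsamp; s++)
                print $1, $2, $3, $4, $5, $6, $7, $8, $9, $s > out[s]
        }
    ' "$input"
}
```

For example, split_vcf_awk INPUT.vcf split would write split/<sample>.vcf for each sample listed in the header.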
To run it, first make the script executable:
chmod +x split_vcf.sh
and then run it:
./split_vcf.sh
Written 5.9 years ago by Paul; updated 3.2 years ago by Ram.