split vcf by individual
5.9 years ago
biosol ▴ 170

Hi all!

I have a vcf file which contains 2 regions (in different chromosomes) for a group of individuals. I'm trying to split this vcf just by individual. The file IDlist.txt contains the IDs of all the individuals. The command I'm using is the following:

for i in $(cat IDlist.txt);
do i="${i%\\n}";
    vcf-subset -c $i file1.vcf > ${i}_file1.vcf
done

The command correctly splits the file by individual, but each output contains only the first region and not the second. Could anyone help me figure out what's going on, or suggest another command that works better? Thank you very much in advance!

vcftools vcf-subset SNP vcf • 3.3k views
5.9 years ago

A combination of parallel and bcftools can do this:

$ cat IDlist.txt | parallel 'bcftools view -s {} file1.vcf > {}_file1.vcf'
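
If GNU parallel is not available, the same idea can be sketched as a plain shell loop. This is only a sketch under assumptions: `split_by_sample` is a hypothetical helper name (not part of bcftools), the input is taken to be `file1.vcf` with the IDs in `IDlist.txt` as in the question, and the carriage-return stripping is a guess at what the `${i%\\n}` line in the question was meant to handle.

```shell
# Sketch: split a multi-sample VCF into one file per sample, no GNU parallel.
# split_by_sample is a hypothetical helper name, not a bcftools command.
split_by_sample() {
    vcf=$1    # multi-sample input, e.g. file1.vcf
    ids=$2    # one sample ID per line, e.g. IDlist.txt
    # Read line by line; the || clause keeps a last line without a newline.
    while IFS= read -r id || [ -n "$id" ]; do
        id=$(printf '%s' "$id" | tr -d '\r')   # drop Windows carriage returns
        [ -z "$id" ] && continue                # skip blank lines
        # -s selects a single sample; the input VCF is a separate argument.
        bcftools view -s "$id" "$vcf" > "${id}_$(basename "$vcf")"
    done < "$ids"
}
```

It would be called as `split_by_sample file1.vcf IDlist.txt`, writing one `<ID>_file1.vcf` per line of the ID list into the current directory.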

fin swimmer

5.9 years ago
Paul ★ 1.5k

Hi, another way is to use bcftools query and GATK SelectVariants. Code is here:

cat split_vcf.sh

The script looks like this:

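#!/bin/bash

# Reference genome used by GATK
genome=~/path/to/reference/genome.fa

# Outer loop: every sample name listed in INPUT.vcf
for sample in $(bcftools query -l INPUT.vcf)
do
  # Inner loop: every VCF in the current directory
  for i in *.vcf
  do
    gatk SelectVariants -R "$genome" -V "$i" -O "${sample}_${i}_test" -sn "$sample"
  done
done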

The first for loop generates the list of samples present in INPUT.vcf (put the path to your VCF here). The second for loop takes every VCF in the current directory and splits it by each sample in that list (the $sample variable).

To run it, first make the script executable:

chmod +x split_vcf.sh

and then run it:

./split_vcf.sh
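
Whichever splitting approach is used, one way to check for the problem described in the question (the second region going missing) is to count records per chromosome in each output file. The following is a minimal sketch that assumes uncompressed, tab-separated VCF files; `count_by_chrom` is a hypothetical helper name.

```shell
# Sketch: count data records per chromosome in an uncompressed VCF.
# count_by_chrom is a hypothetical helper name.
count_by_chrom() {
    # Skip header lines (starting with '#'), then tally column 1 (CHROM).
    awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}
```

For example, `for f in *_file1.vcf; do echo "$f"; count_by_chrom "$f"; done` prints one line per chromosome for each per-sample file. If a chromosome shows up for file1.vcf but not for the split files, the records are probably being lost in the splitting step rather than being absent from the input.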