Question: Split a VCF file into individual sample files
win wrote:

I have a 1000 Genomes VCF file, for example ALL.chrX.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz, which holds the data for all samples in many, many columns. I want to split this file into separate VCF files, one per sample. I tried the following code with the new bcftools and kept getting an error that -h is not defined. Is there some other way to split it into individual VCF files?

for file in *.vcf*; do
  for sample in `bcftools view -h $file | grep "^#CHROM" | cut -f10-`; do
    bcftools view -c1 -Oz -s $sample -o ${file/.vcf*/.$sample.vcf.gz} $file
  done
done

Thanks in advance.

written 4.0 years ago by win

A: How To Split Multiple Samples In Vcf File Generated By Gatk?

Splitting A Vcf File

A: Splitting A Vcf File

written 4.0 years ago by Ashutosh Pandey

The code I posted comes from the suggested post and it does not work. Also, my files were not generated using GATK.

written 4.0 years ago by win

Brice Sarver wrote:

view -h is not defined in (at least) bcftools v0.1.19; it is defined in v1.0+. I would check your binaries.
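
One way to act on this is to look at the version string of whichever bcftools comes first on your PATH. The snippet below is only a minimal sketch and not part of the original answer: the helper name is_bcftools_1_or_later is made up for illustration, and it encodes nothing more than the rule stated above (view -h is available from v1.0 on).

```bash
# Hypothetical helper: succeeds if a bcftools version string such as
# "1.9" or "0.1.19-44428cd" has a major version of 1 or higher.
is_bcftools_1_or_later() {
    major=${1%%.*}                  # text before the first dot
    case $major in
        ''|*[!0-9]*) return 2 ;;    # not a plain number: cannot tell
    esac
    [ "$major" -ge 1 ]
}

# Usage (not run here). bcftools 1.x prints "bcftools <version>" as the
# first line of --version; if --version itself is rejected, that already
# suggests a pre-1.0 binary.
#   command -v bcftools
#   ver=$(bcftools --version | head -n 1 | cut -d' ' -f2)
#   is_bcftools_1_or_later "$ver" && echo "view -h should be available"
```

If several bcftools installs are present, command -v shows which one the loop is actually calling.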

 

written 4.0 years ago by Brice Sarver

Len Trigg wrote:

I would just use zgrep rather than your bcftools command to get the header:

  for file in *.vcf*; do
    for sample in $(zgrep -m 1 "^#CHROM" $file | cut -f10-); do
      bcftools view -c1 -Oz -s $sample -o ${file/.vcf*/.$sample.vcf.gz} $file
    done
  done

 

written 4.0 years ago by Len Trigg
  • zgrep parses the entire file
  • bcftools view -h parses only the header, so it's faster
  • bcftools query -l lists all samples, so it's the fastest
written 2.1 years ago by Jorge Amigo

Incorrect, zgrep -m 1 does not parse the entire file. (bcftools query -l is still better though :-))

written 2.1 years ago by Len Trigg

You're right. I hadn't noticed the -m 1 option, which stops reading the file after the first match.

written 2.1 years ago by Jorge Amigo
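
To make the -m 1 point concrete, here is a small self-contained sketch. It builds a made-up three-sample VCF (the names S1 to S3 and the single record are invented) and pulls the sample names out of the #CHROM line. It spells zgrep out as gzip -cd piped into grep, which is assumed to be effectively what zgrep does, so each stage is visible.

```bash
# Build a tiny, invented gzipped VCF in a temporary directory.
tmpdir=$(mktemp -d)
vcf="$tmpdir/toy.vcf.gz"
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
    printf 'X\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|0\t1|1\n'
} | gzip -c > "$vcf"

# grep -m 1 quits after the first matching line, so the records that
# follow the header are never scanned. Columns 10 onward are the samples.
samples=$(gzip -cd "$vcf" | grep -m 1 '^#CHROM' | cut -f10- | tr '\t' '\n')
printf '%s\n' "$samples"

rm -rf "$tmpdir"
```

With bcftools 1.x installed, bcftools query -l on the same file should give the same list without any text parsing.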

Jorge Amigo wrote:

As already stated here and here:

for file in *.vcf*; do
  for sample in `bcftools query -l $file`; do
    bcftools view -c1 -Oz -s $sample -o ${file/.vcf*/.$sample.vcf.gz} $file
  done
done
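
The loop above can also be wrapped in functions with quoted variables. The following is a sketch under some assumptions rather than a tested pipeline: out_name and split_vcf are hypothetical names, HG00096 is used only as an example sample ID, and the POSIX expansion ${1%%.vcf*} is used because it produces the same output name as ${file/.vcf*/.$sample.vcf.gz} for any file name containing ".vcf".

```bash
# Output name for one sample: everything before the first ".vcf",
# followed by ".<sample>.vcf.gz".
out_name() {
    printf '%s.%s.vcf.gz\n' "${1%%.vcf*}" "$2"
}

# Split one multi-sample VCF into per-sample files (needs bcftools 1.x).
split_vcf() {
    file=$1
    bcftools query -l "$file" | while IFS= read -r sample; do
        # -c1 keeps only sites where this sample carries a non-reference allele
        bcftools view -c1 -Oz -s "$sample" \
            -o "$(out_name "$file" "$sample")" "$file"
    done
}

# Naming check only; this line does not call bcftools.
out_name ALL.chrX.vcf.gz HG00096

# Usage (not run here):
#   for f in *.vcf.gz; do split_vcf "$f"; done
```

Globbing on *.vcf.gz instead of *.vcf* keeps index files such as .vcf.gz.tbi out of the loop. The per-sample outputs also end in .vcf.gz, so writing them to a separate directory is worth considering if the loop may be run twice.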
written 2.1 years ago by Jorge Amigo