Question: bcftools merge 1021 individual vcf files
Shicheng Guo wrote (8 months ago):

Hi All,

I have 7200 individual VCF files and I want to merge them into a single VCF with the following command:

bcftools merge -l merge.txt -Oz -o merge.vcf.gz

I found that with fewer than 1021 samples everything is okay. However, with 1021 or more, bcftools merge reports:

[E::hts_idx_load3] Could not load local index file '229209.fstl1.vcf.gz.tbi'
Failed to open 229209.fstl1.vcf.gz: could not load index

Does anyone know what's wrong with it?

Update: problem solved. Plink does not have this maximum limit.

Thanks.

Here are the versions of bcftools and the related tools I downloaded:

wget https://github.com/samtools/bcftools/releases/download/1.10.2/bcftools-1.10.2.tar.bz2
wget https://github.com/samtools/samtools/releases/download/1.10/samtools-1.10.tar.bz2
wget https://github.com/samtools/htslib/releases/download/1.10.2/htslib-1.10.2.tar.bz2
wget http://s3.amazonaws.com/plink1-assets/dev/plink_linux_x86_64.zip

Please post this as a solution and accept it as the answer so we have closure.

(written 8 months ago by zx8754)
WouterDeCoster (Belgium) wrote (8 months ago):

Not the same bcftools command, but you'll find the answer in this (my) blog post: "bcftools concat: Failed to open variants.vcf.gz: could not load index". (It is also one of the results if you google for "Failed to open vcf.gz: could not load index".)

TLDR: you are opening too many files simultaneously.
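
The failure threshold is consistent with a per-process limit of 1024 open file descriptors, a common default soft limit on Linux: every input VCF stays open during the merge, and a few descriptors are already taken by the standard streams and the output file. A minimal sketch for checking this, assuming a POSIX shell. `max_inputs` is a hypothetical helper, and the reserve of 4 descriptors is an assumption chosen to match the 1020-file threshold reported in the question, not a documented bcftools figure:

```shell
# Soft and hard limits on open file descriptors for this shell.
echo "soft limit: $(ulimit -S -n)"
echo "hard limit: $(ulimit -H -n)"

# Hypothetical helper: rough number of input files that fit under a limit.
# The default reserve of 4 (stdin, stdout, stderr, output file) is an assumption.
max_inputs() {
    limit=$1
    reserved=${2:-4}
    echo $(( limit - reserved ))
}

max_inputs 1024   # prints 1020
```

If the soft limit printed here is 1024, the input list has to be kept below that size or the limit has to be raised.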


Is there any solution? Should I merge at most 1020 files per round and then merge the results?

(written 8 months ago by Shicheng Guo)

What about using plink to merge them?

(written 8 months ago by Shicheng Guo)

Okay, done. Merging with plink works.

(written 8 months ago by Shicheng Guo)
Shicheng Guo wrote (8 months ago):

Here is my final solution, based on WouterDeCoster's post. I hope it is helpful. One of my friends told me his computer could merge 7000 VCFs at once. I am not sure whether that is due to a system-specific setting for open files.

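# Split the list of input VCFs into chunks of 500 files each,
# which stays below the open-file limit.
ls *.vcf.gz | split -l 500 - subset_vcfs

# First round: merge each chunk and index the result.
# -0 assumes genotypes at missing sites are 0/0.
for i in subset_vcfs*
do
    bcftools merge -0 -l "$i" -Oz -o "merge.$i.vcf.gz"
    tabix -p vcf "merge.$i.vcf.gz"
done

# Second round: merge the intermediate files.
ls merge.*.vcf.gz > merge.txt
bcftools merge -l merge.txt -0 -Oz -o all_merged.vcf.gz

# Keep only the GT field: drop INFO and every other FORMAT tag.
bcftools annotate -x INFO,^FORMAT/GT all_merged.vcf.gz -Oz -o Final.vcf.gz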
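
The arithmetic behind this two-round scheme, as a small sketch: 7200 inputs in chunks of 500 give 15 intermediate files, so neither round opens more than 500 inputs at once. `n_batches` is a hypothetical helper for illustration, not part of bcftools:

```shell
# Hypothetical helper: number of chunks needed, rounding up.
n_batches() {
    total=$1
    per_batch=$2
    echo $(( (total + per_batch - 1) / per_batch ))
}

n_batches 7200 500   # prints 15
```

One way to sanity-check the result is `bcftools query -l all_merged.vcf.gz | wc -l`, which should report the total number of samples across all inputs, assuming sample names are unique across the input files.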

What was the output of ulimit -u in your case?

(written 8 months ago by genomax)

ulimit -u = 4096

(written 8 months ago by Shicheng Guo)

Thanks for genomax's tip. I found that the relevant setting is nofile, the maximum number of open files, which ulimit reports with -n (ulimit -u is the maximum number of user processes).

(written 8 months ago by Shicheng Guo)
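
As an alternative to merging in chunks, the soft open-file limit can be raised for the current shell session before running bcftools merge, as long as the hard limit allows it. A minimal sketch, assuming a POSIX shell; `target_limit` is a hypothetical helper and 8192 is an illustrative value chosen to exceed the 7200 input files:

```shell
# Hypothetical helper: pick a new soft limit that does not exceed the hard
# limit and does not lower a soft limit that is already high enough.
target_limit() {
    wanted=$1
    soft=$2
    hard=$3
    if [ "$soft" = "unlimited" ]; then echo unlimited; return; fi
    if [ "$soft" -ge "$wanted" ]; then echo "$soft"; return; fi
    if [ "$hard" != "unlimited" ] && [ "$wanted" -gt "$hard" ]; then
        echo "$hard"; return
    fi
    echo "$wanted"
}

# Apply it to this shell; child processes inherit the new limit.
new_limit=$(target_limit 8192 "$(ulimit -S -n)" "$(ulimit -H -n)")
ulimit -S -n "$new_limit"
echo "open-file soft limit is now $(ulimit -S -n)"
```

The change lasts only for the current shell and the commands started from it. If the hard limit itself is below the number of input files, it has to be raised by an administrator (on many Linux systems through the nofile entry in /etc/security/limits.conf), or the chunked merge remains the practical option.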