bcftools merge 1021 individual vcf files
4.1 years ago
Shicheng Guo ★ 9.4k

Hi All,

I have 7200 individual VCF files and I want to merge them into a single VCF with the following command:

bcftools merge -l merge.txt -Oz -o merge.vcf.gz

With fewer than 1021 samples, everything works. With 1021 or more, bcftools merge reports:

[E::hts_idx_load3] Could not load local index file '229209.fstl1.vcf.gz.tbi'
Failed to open 229209.fstl1.vcf.gz: could not load index

Does anyone know what's wrong with it?

*Problem solved: Plink does not have this maximum limit.*

Thanks.

Here are the versions I installed:

wget https://github.com/samtools/bcftools/releases/download/1.10.2/bcftools-1.10.2.tar.bz2
wget https://github.com/samtools/samtools/releases/download/1.10/samtools-1.10.tar.bz2
wget https://github.com/samtools/htslib/releases/download/1.10.2/htslib-1.10.2.tar.bz2
wget http://s3.amazonaws.com/plink1-assets/dev/plink_linux_x86_64.zip

Please post this as a solution and accept it as the answer so that we have closure.

4.1 years ago

Not the same bcftools command, but you'll find the answer in this (my) blog post: bcftools concat: Failed to open variants.vcf.gz: could not load index. (Also one of the results if you google for "Failed to open vcf.gz: could not load index").

TLDR: you are opening too many files simultaneously.
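A quick way to check this on your own machine is to compare the per-process limit on open files with the number of inputs. The sketch below is only an illustration: `fits_under_limit` is a hypothetical helper, and the allowance of 4 extra descriptors (stdin, stdout, stderr and the output file) is an assumption. It does happen to match the threshold reported in the question, since 1020 + 4 = 1024, a common default soft limit, but nothing in this thread confirms that this is exactly how bcftools counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: would n_files inputs fit under the open-file limit?
# Assumption: one descriptor per input plus 4 reserved
# (stdin, stdout, stderr, output file).
fits_under_limit() {
    local n_files=$1
    # Use the second argument if given, otherwise the current soft limit
    local limit=${2:-$(ulimit -S -n)}
    # "unlimited" means there is no cap to worry about
    [ "$limit" = "unlimited" ] && return 0
    [ $(( n_files + 4 )) -le "$limit" ]
}

echo "soft limit on open files: $(ulimit -S -n)"

# merge.txt is the file list from the question; skip the check if it is absent
if [ -f merge.txt ]; then
    n=$(wc -l < merge.txt)
    if fits_under_limit "$n"; then
        echo "$n files should fit under the limit"
    else
        echo "$n files likely exceed the limit"
    fi
fi
```

If the second message appears, either raise the limit or merge in smaller batches.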


Is there any solution? Merge 1020 files per round and then merge the results again?


What about using plink to merge them?


Okay. Done, plink is okay.

4.1 years ago
Shicheng Guo ★ 9.4k

Okay, here is my final solution, based on WouterDeCoste's post. I hope it is helpful. A friend told me his computer could merge 7000 VCFs in one go. I am not sure whether that difference comes from a system-specific setting for file operations.

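# Split the list of per-sample VCFs into chunks of 500 file names
ls *.vcf.gz | split -l 500 - subset_vcfs

# First round: merge each chunk and index the result
for i in subset_vcfs*
do
    bcftools merge -0 -l "$i" -Oz -o "merge.$i.vcf.gz"
    tabix -p vcf "merge.$i.vcf.gz"
done

# Second round: merge the intermediate files
ls merge.*.vcf.gz > merge.txt
bcftools merge -l merge.txt -0 -Oz -o all_merged.vcf.gz
# Drop all INFO fields and every FORMAT field except GT
bcftools annotate -x 'INFO,^FORMAT/GT' all_merged.vcf.gz -Oz -o Final.vcf.gz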
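
To see why two rounds are enough here: 7200 files in chunks of 500 give 15 intermediate files, so neither round opens more than 500 inputs at once. A minimal sketch of that arithmetic follows; `n_chunks` is a hypothetical helper for illustration, not part of bcftools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many chunks result when `total` files
# are split `per_chunk` at a time (ceiling division).
n_chunks() {
    local total=$1 per_chunk=$2
    echo $(( (total + per_chunk - 1) / per_chunk ))
}

# 7200 input VCFs, 500 per chunk, as in the commands above
n_chunks 7200 500    # prints 15
```

If the number of intermediate files ever approached the open-file limit itself, a third round (or a larger chunk size) would be needed.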

What was the output of ulimit -u in your case?


ulimit -u = 4096


Thanks for genomax's tip. I found nofile, the maximum number of open files, in the ulimit settings. It is reported by ulimit -n, whereas ulimit -u is the limit on user processes.
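
For anyone who would rather raise the limit than merge in batches, here is a minimal sketch that lifts the soft nofile limit for the current shell session, capped at the hard limit. The target of 8192 is an arbitrary assumption chosen to exceed the 7200 files in the question. Going beyond the hard limit usually needs administrator changes, which are not covered here.

```shell
#!/usr/bin/env bash
# Sketch: raise the soft open-file limit for this shell, up to the hard limit.
wanted=8192                 # assumption: enough for 7200 inputs plus overhead
hard=$(ulimit -H -n)        # the ceiling an unprivileged user can reach

if [ "$hard" = "unlimited" ] || [ "$wanted" -le "$hard" ]; then
    ulimit -S -n "$wanted"
else
    # Cannot go above the hard limit without elevated privileges
    ulimit -S -n "$hard"
fi

echo "soft limit on open files is now $(ulimit -S -n)"

# The merge must then run from this same shell, for example:
# bcftools merge -l merge.txt -Oz -o merge.vcf.gz
```

The new limit applies only to this shell and the processes it starts, so the merge has to be launched from the same session.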
