Speed of merging multiple bcfs with bcftools compared to PLINK?
11 months ago
curious ▴ 530

I am trying to merge 3 sets of BCFs that contain hundreds of millions of sites for tens of thousands of samples. The BCFs contain exactly the same sites, just different samples. Running bcftools merge 1.bcf 2.bcf 3.bcf -Ob > merged.bcf looks like it is going to take days, maybe even weeks, at the rate it is going. Even though both are binary formats, would it be faster to first convert each BCF to PLINK binary format and then run:

plink --make-bed --merge-list merge_list.txt --out merged

where merge_list.txt lists the binary PLINK fileset prefix for each BCF:

1
2
3
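
For concreteness, here is a rough sketch of how the conversion step and merge_list.txt could be produced. It assumes PLINK 1.9 (whose --bcf flag reads BCF input) and the file names 1.bcf, 2.bcf and 3.bcf used above; the function names are made up for illustration, and the conversion commands are only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch only: assumes PLINK 1.9 and the file names used in this post.
PREFIXES="1 2 3"

# Print one conversion command per BCF (BCF -> .bed/.bim/.fam).
# Nothing is executed here; the lines can be inspected and then run by hand.
convert_cmds() {
    for p in $PREFIXES; do
        echo "plink --bcf ${p}.bcf --make-bed --out ${p}"
    done
}

# Write the merge list: one fileset prefix per line.
write_merge_list() {
    printf '%s\n' $PREFIXES > "$1"
}

convert_cmds
write_merge_list merge_list.txt
```

Whether this route ends up faster than bcftools merge is exactly what is being asked; the sketch only shows the extra conversion work it would involve.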
bcftools plink • 404 views
11 months ago
GenoMax 101k

bcftools merge supports multiple threads. Looks like you are using just one.

--threads <int>                use multithreading with <int> worker threads [0]

As long as your storage subsystem is up to the task, use multiple threads to speed things up.

Same with plink.
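
A minimal sketch of what the threaded call could look like, using the file names from the question. The thread count of 8 is an arbitrary assumption, and merge_cmd is a made-up helper that only prints the command so it can be checked before running. As far as I understand the bcftools documentation, the extra threads are used for compressing the output stream, so the speedup depends on how much of the runtime is spent there; bcftools merge also expects the input files to be indexed.

```shell
#!/usr/bin/env bash
# Sketch only: file names come from the question; THREADS=8 is an assumption.
THREADS=8

# Print the merge command instead of running it.
# -Ob writes compressed BCF, -o names the output file,
# --threads sets the number of extra worker threads.
merge_cmd() {
    echo "bcftools merge --threads ${THREADS} -Ob -o merged.bcf 1.bcf 2.bcf 3.bcf"
}

merge_cmd
```

Paste the printed line into the shell to run it, adjusting the thread count to what the machine and storage can sustain.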

Oh darn it. Thank you!
