Question: Speed of merging multiple bcfs with bcftools compared to PLINK?
10 weeks ago by
curious340
curious340 wrote:

I am trying to merge 3 sets of BCFs that contain hundreds of millions of sites for tens of thousands of samples. The BCFs have exactly the same sites, just different samples. Running bcftools merge 1.bcf 2.bcf 3.bcf -Ob > merged.bcf looks like it is going to take days, maybe even weeks, at the rate it is going. Even though both are binary formats, would it be faster to first convert each BCF to PLINK and then run:

plink --make-bed --merge-list merge_list.txt --out merged

where merge_list.txt lists the PLINK binary fileset prefix for each BCF:

1
2
3
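
For reference, the conversion step described above could look roughly like the sketch below. It assumes PLINK 1.9 (which reads BCF through --bcf) and that the output prefixes 1, 2 and 3 match the entries in merge_list.txt. The guard that skips the conversion when plink or an input file is missing is only there so the sketch can be run as a dry run; it is not part of the original workflow.

```shell
#!/bin/sh
# Sketch: build one PLINK binary fileset per BCF and write merge_list.txt.
# Prefixes 1, 2, 3 mirror the file names used in the question.
: > merge_list.txt
for prefix in 1 2 3; do
    # Convert only when plink and the input BCF are available (dry run otherwise).
    if command -v plink >/dev/null 2>&1 && [ -f "$prefix.bcf" ]; then
        plink --bcf "$prefix.bcf" --make-bed --out "$prefix"
    fi
    # One fileset prefix per line, as expected by --merge-list.
    echo "$prefix" >> merge_list.txt
done
```

Whether this route ends up faster than bcftools merge depends on the data and the storage, so it is worth timing on a single chromosome first.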
10 weeks ago by
genomax87k
United States
genomax87k wrote:

bcftools merge supports multiple threads. Looks like you are using just one.

--threads <int>                use multithreading with <int> worker threads [0]

As long as your storage subsystem is up to the task, use multiple threads to speed things up.

The same goes for PLINK.
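
A minimal sketch of the threaded version of the command from the question. The thread count of 8 is an assumed value, and -o is used in place of the shell redirect. Two caveats: bcftools merge needs indexed inputs, and the bcftools documentation describes --threads as applying to compression of the output stream (with -Ob or -Oz), so the speedup may be smaller than the thread count suggests. The guard that prints the command when bcftools or the inputs are missing is only there so the sketch can be tried as a dry run.

```shell
#!/bin/sh
# Sketch: the merge from the question with worker threads enabled.
# THREADS=8 is an assumption; choose a value that suits your cores and storage.
THREADS=8

# -Ob writes compressed BCF; -o names the output file directly.
MERGE_CMD="bcftools merge --threads $THREADS -Ob -o merged.bcf 1.bcf 2.bcf 3.bcf"

# Run only when bcftools and all inputs are present; otherwise print the command.
if command -v bcftools >/dev/null 2>&1 && [ -f 1.bcf ] && [ -f 2.bcf ] && [ -f 3.bcf ]; then
    $MERGE_CMD
else
    echo "$MERGE_CMD"
fi
```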


Oh darn it. Thank you!

written 10 weeks ago by curious340