Question: Performing fast bootstrap in R using ape package
Asked 20 months ago by User000 (270):

Dear all,

I would like to build a neighbor-joining (NJ) tree with 1000 bootstrap replicates from my SNP data. I have around 5K SNPs and I am using the R package ape:

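library(ape)
snp <- as.matrix(objt)  # dist.gene() expects individuals in rows, loci in columns
stree <- nj(dist.gene(snp))  # NJ tree from pairwise genetic distances
# 1000 bootstrap replicates, spread over 6 cores
myBoots <- boot.phylo(stree, snp, function(xx) nj(dist.gene(xx)), B = 1000, mc.cores = 6)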

It has been running for 3 days so far and has still not finished. Any suggestions on how to make it faster, if that is possible at all?

ape bootstrap R • 2.0k views
modified 4 months ago by mgdias.jose (10) • written 20 months ago by User000 (270)

Are you sure that it is actually using the 6 cores that you specify? Is your parallel package loaded correctly?

Also, aren't 1000 bootstrap replicates too many? 250 would be fine.

written 20 months ago by Kevin Blighe (53k)

Yes, it says "Running parallel bootstraps..." and it is also using 6 cores. Do you think that should be enough for 5000 SNPs, and that something is going wrong?

modified 20 months ago • written 20 months ago by User000 (270)

Clustering is a data-intensive technique and doing it 1000 times for 5000 SNPs is going to take a long time, even with 6 cores.

Why not try it first with 6 bootstrap replicates on 6 cores (1 replicate per core) and see how long that takes? Then you will get an idea of the timing.

I still believe that 1000x bootstrap is way too much.

written 20 months ago by Kevin Blighe (53k)
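
The timing test suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: the SNP matrix is simulated (20 samples, 500 SNPs coded 0/1/2) because the real `objt` is not available, `build_tree`, `n_test`, `secs_per_rep` and `est_hours` are names made up for this example, and the extrapolation assumes run time grows linearly with the number of replicates and divides evenly across cores.

```r
library(ape)

# Simulated stand-in for the real data (an assumption): 20 samples x 500 SNPs,
# genotypes coded 0/1/2. Replace with the real matrix, e.g. as.matrix(objt).
set.seed(1)
snp <- matrix(sample(0:2, 20 * 500, replace = TRUE), nrow = 20,
              dimnames = list(paste0("s", 1:20), NULL))

# Tree-building function shared by the original tree and every replicate.
build_tree <- function(xx) nj(dist.gene(xx))
stree <- build_tree(snp)

# Time a handful of replicates on one core before committing to B = 1000.
n_test <- 6
timing <- system.time(
  test_boots <- boot.phylo(stree, snp, build_tree, B = n_test, quiet = TRUE)
)

# Rough extrapolation to 1000 replicates on 6 cores; assumes linear scaling
# in B and no parallel overhead, so treat it as a lower bound.
secs_per_rep <- timing[["elapsed"]] / n_test
est_hours <- secs_per_rep * 1000 / 6 / 3600
est_hours
```

On the real data, the same pattern with `mc.cores = 6` and `B = 6` would give the "1 bootstrap per core" measurement described above.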

I am also running it on a cluster with 10 cores (I don't know exactly how many cores I am allowed to use), and it has likewise been running for 3 days. Without bootstrap it takes around 1-2 hours. Thank you very much for the advice. I am running it now, let's see.

written 20 months ago by User000 (270)

Okay, I think that you may have just answered your own question. If it takes even 1 hour to just run it once (on a single core), then 1000 bootstrap across 10 cores will take ~100 hours, or just over 4 days. Time is precious! Make the most of it.

written 20 months ago by Kevin Blighe (53k)
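
The estimate above is simple arithmetic, and a small helper makes it easy to try other settings. This is a back-of-the-envelope sketch, not a benchmark: `estimate_hours` is a hypothetical function written for this example, and it assumes each replicate costs as much as one tree build and that the work splits perfectly across cores.

```r
# Back-of-the-envelope wall-clock estimate for a bootstrap run.
# Assumes every replicate costs the same as one tree build and that
# the work divides evenly across cores (no parallel overhead).
estimate_hours <- function(hours_per_tree, n_boot, n_cores) {
  hours_per_tree * n_boot / n_cores
}

# Figures from the thread: 1 hour per tree, 1000 replicates, 10 cores.
total_hours <- estimate_hours(hours_per_tree = 1, n_boot = 1000, n_cores = 10)
total_days <- total_hours / 24   # 100 hours is a little over 4 days

# With 250 replicates instead, as suggested earlier in the thread:
fewer_hours <- estimate_hours(1, 250, 10)   # 25 hours
```

If a single tree really takes closer to 2 hours, the same formula gives about 200 hours for 1000 replicates on 10 cores.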

How many samples do you have? I do almost the same thing with the phangorn package, with 13 samples and 7.5K SNPs, and it took just 1-2 minutes for 2000 bootstrap replicates.

written 7 months ago by Denis (140)

Hello, as a follow-up on the bootstrapping: how can I then draw my actual tree?

Does the boot.phylo function update the NJ saved in stree? In other words, will boot.phylo generate bootstrap trees and update the consensus tree in the variable stree? If so, then I could apply ggtree on it. Is this correct?

Or does the boot.phylo function allow me only to label the previously generated NJ tree? If this is the case, is there any alternative to generate a bootstrap consensus tree to be plotted later?

Thanks

modified 4 months ago • written 4 months ago by mgdias.jose (10)
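
For reference, a minimal sketch of how this tends to work in ape: `boot.phylo()` does not modify `stree`. It returns the support counts for the nodes of the tree it was given, and with `trees = TRUE` it returns a list that also holds the replicate trees, from which `consensus()` can build a majority-rule tree. The data below are simulated (12 samples, 300 SNPs coded 0/1/2) and `build_tree`, `B`, `bt` and `cons` are names chosen for this example only.

```r
library(ape)

# Simulated stand-in data (assumption): 12 samples x 300 SNPs coded 0/1/2.
set.seed(42)
snp <- matrix(sample(0:2, 12 * 300, replace = TRUE), nrow = 12,
              dimnames = list(paste0("s", 1:12), NULL))

build_tree <- function(xx) nj(dist.gene(xx))
stree <- build_tree(snp)

# trees = TRUE returns a list: $BP (support counts) and $trees (replicates).
B <- 100
bt <- boot.phylo(stree, snp, build_tree, B = B, trees = TRUE, quiet = TRUE)

# stree itself is unchanged; attach support (as percentages) explicitly.
stree$node.label <- round(100 * bt$BP / B)
plot(stree, show.node.label = TRUE)

# Alternatively, build a majority-rule consensus from the replicate trees.
cons <- consensus(bt$trees, p = 0.5)
plot(cons)
```

Both `stree` (with its node labels) and `cons` are ordinary `phylo` objects, so either should also be usable with ggtree.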

Hey, you may consider opening a new question for this. User000 has not logged in for > 11 months.

written 4 months ago by Kevin Blighe (53k)