I would like to build an NJ tree with 1000 bootstrap replicates on my SNP data. I have around 5K SNPs and I am using the R package ape:

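library(ape)
snp <- as.matrix(objt)  # rows = samples, columns = SNPs
stree <- nj(dist.gene(snp))
# boot.phylo resamples the columns (SNPs) and rebuilds the tree B times
myBoots <- boot.phylo(stree, snp, function(xx) nj(dist.gene(xx)), B = 1000, mc.cores = 6)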

It has been running for 3 days and still has not finished. Any suggestions on how to make it faster, if that is possible at all?

Are you sure that it is actually using the 6 cores that you specify? Is your parallel package loaded correctly?

Also, aren't 1000 bootstrap replicates too many? 250 would be fine.

Yes, it prints "Running parallel bootstraps..." and it is using 6 cores. Do you think that should be enough for 5000 SNPs, and that something is going wrong?

Clustering is a data-intensive technique and doing it 1000 times for 5000 SNPs is going to take a long time, even with 6 cores.

Why not try it first with 6 bootstrap replicates on 6 cores (1 replicate per core) and see how long that takes? That will give you an idea of the timing.
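A pilot run like the one suggested here could be sketched roughly as follows. This is only an illustration under assumptions: the `snp` matrix is a small random stand-in for the real data, `build_tree` and `n_pilot` are names made up for the example, and the run uses the default single core so that it also works on systems without forking. On the real data you would keep your own `snp` and add `mc.cores` as in the original call.

```r
library(ape)

set.seed(1)
# Hypothetical stand-in for the real data: 20 samples x 200 SNPs coded 0/1/2.
snp <- matrix(sample(0:2, 20 * 200, replace = TRUE), nrow = 20,
              dimnames = list(paste0("ind", 1:20), NULL))

# Same tree-building step as in the question, wrapped in a named function.
build_tree <- function(xx) nj(dist.gene(xx))
stree <- build_tree(snp)

# Time a handful of replicates instead of the full 1000.
n_pilot <- 6
elapsed <- system.time(
  pilot <- boot.phylo(stree, snp, build_tree, B = n_pilot, quiet = TRUE)
)["elapsed"]

# Average wall-clock seconds per replicate in this pilot run.
per_replicate <- elapsed / n_pilot
print(per_replicate)
```

Multiplying `per_replicate` by the planned number of replicates, and dividing by the number of cores actually in use, gives a first rough estimate of the full run time.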

I still believe that 1000x bootstrap is way too much.

I am also running it on a cluster with 10 cores (I don't know exactly how many cores I am allowed to use), and it has also been running for 3 days. Without bootstrapping it takes around 1-2 hours. Thanks a lot for the advice; I am running the test now, let's see.

Okay, I think that you may have just answered your own question. If it takes even 1 hour to run it just once (on a single core), then 1000 bootstrap replicates across 10 cores will take ~100 hours, or just over 4 days. If a single run takes closer to 2 hours, that becomes ~200 hours, or more than 8 days. Time is precious! Make the most of it.
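The arithmetic behind that estimate can be written down as a small helper. This is a back-of-the-envelope sketch, not a benchmark: `estimate_hours` is a hypothetical function name, and it assumes every replicate costs about as much as one plain tree build and that the work splits perfectly across cores, which real runs rarely achieve.

```r
# Rough wall-clock estimate for a parallel bootstrap.
# Assumes perfect scaling: replicates run in "waves" of n_cores at a time,
# and each wave takes as long as one single-core tree build.
estimate_hours <- function(hours_per_run, n_boot, n_cores) {
  waves <- ceiling(n_boot / n_cores)
  waves * hours_per_run
}

estimate_hours(1, 1000, 10)  # 100 hours, just over 4 days
estimate_hours(2, 1000, 10)  # 200 hours
estimate_hours(1, 250, 10)   # 25 hours
estimate_hours(2, 1000, 6)   # 334 hours
```

Under these assumptions, dropping to 250 replicates brings the 10-core run down to roughly a day.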

