MrBayes - big difference in average standard deviation of split frequencies between CPU and GPU versions
10 weeks ago
wstfljs ▴ 90

I've got a working GPU build of MrBayes that uses the BEAGLE library. For the same input, the GPU run is much faster: it completes in 89 hours, while the CPU version takes 214 hours. However, there is also a big difference in the average standard deviation of split frequencies: the CPU version gives 0.005221, while the GPU run ends at 0.389552. Naturally, MrBayes prints a warning for the GPU run:

run has not converged because the tree samples are very different (average standard deviation of split frequencies larger than 0.10)

Any idea what can be the reason for such a big difference?

Below is the set of parameters I use for both MrBayes runs:

mcmc ngen=20000000 printfreq=1000 samplefreq=1000 nruns=2 nchains=4 temp=0.02;
sump burnin=0 nruns=2 printtofile=Yes outputname=sumpoutput.out plot=Yes marglike=Yes table=Yes;
sumt burnin=0 nruns=2 ntrees=1 displaygeq=0.05 contype=Halfcompat;

I execute both runs using mpirun to enable multiple cores.
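For example (the executable name and core count here are placeholders for whatever your build uses; with nruns=2 and nchains=4, MrBayes can use up to nruns × nchains = 8 MPI processes):

```
mpirun -np 8 mb input.nex
```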

10 weeks ago
Mensur Dlakic ★ 22k

I think the biggest reason is that you have no burnin in these analyses. My typical sump and sumt lines look like this:

sump relburnin=yes burnin=0.25 nruns=2;
sumt relburnin=yes burnin=0.25 nruns=2;

This means that the first quarter of the samples is thrown out, which is how it should be. With your settings (ngen=20000000, samplefreq=1000), burnin=0.25 discards the first 5 million generations, i.e. roughly the first 5,000 sampled trees. The exact fraction matters less (anywhere in the 0.1-0.25 range is fine), but it is important to discard the early part: the chains are very far from convergence at the beginning, and with burnin=0 that propagates all the way into the summaries even when you sample for 20 million generations as you did.

Beyond that, there are some other things I would try, although they are not as obviously wrong as your burnin settings. The temperature may be too low, allowing the sampling to get stuck in a local optimum; I would try values in the 0.05-0.1 range. The same goes for sampling frequency: sampling once every 1000 generations is not necessarily wrong, but you may want to try smaller intervals, in the 100-500 range. That way you may not need 20 million generations to reach convergence; 2-10 million may be enough. If you cut down the CPU run by half and still get the same result, that should be sufficient, as 4 days doesn't strike me as too long. And even if none of these suggestions works better, you could still go with your CPU result, since you already have it.
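Putting these suggestions together, a revised batch block might look like this (the exact values are illustrative starting points, not prescriptions):

```
mcmc ngen=5000000 printfreq=1000 samplefreq=500 nruns=2 nchains=4 temp=0.05;
sump relburnin=yes burnin=0.25 nruns=2;
sumt relburnin=yes burnin=0.25 nruns=2;
```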


> I think the biggest reason is that you have no burnin in these analyses.

In fact I do apply a burnin, but in a separate downstream step using TreeAnnotator:

treeannotator -burnin 5000000 input.nex.run1.t input.nex.burnin5000000.out

> If you cut down the CPU run by half and still get the same result, that should be sufficient as 4 days doesn't strike me as too long.

This is just the first example I gave. I have many more files to process, which take much, much longer than this one; hence my decision to try out the GPU version.
