I have a working GPU build of MrBayes that uses the BEAGLE library. For the same input there is a big difference in computational time: the GPU run completes in 89 hours, while the CPU version takes 214 hours. However, there is also a big difference in the average standard deviation of split frequencies: the CPU version ends at **0.005221**, whereas the GPU version finishes at **0.389552**. As expected, MrBayes prints a warning for the GPU run:

> run has not converged because the tree samples are very different (average standard deviation of split frequencies larger than 0.10)

Any idea what can be the reason for such a big difference?

Below is the set of parameters I use for both MrBayes runs:

```
mcmc ngen=20000000 printfreq=1000 samplefreq=1000 nruns=2 nchains=4 temp=0.02;
sump burnin=0 nruns=2 printtofile=Yes outputname=sumpoutput.out plot=Yes marglike=Yes table=Yes;
sumt burnin=0 nruns=2 ntrees=1 displaygeq=0.05 contype=Halfcompat;
```
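For completeness, the GPU run is switched on through MrBayes' BEAGLE settings before the `mcmc` command. The exact values below are an illustration of my setup, not a prescription (device and precision names follow the MrBayes 3.2 `set` command):

```
set usebeagle=Yes beagledevice=GPU beagleprecision=Double;
```

One thing worth checking when CPU and GPU runs diverge this much is the `beagleprecision` setting, since single-precision likelihoods on the GPU can behave differently from the double-precision CPU path.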

I execute both runs using mpirun to enable multiple cores.
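Concretely, I request one MPI process per chain, i.e. `nruns * nchains` processes. A minimal sketch of the invocation (the binary name `mb` and input file name are assumptions, not from my actual setup):

```shell
# MrBayes (MPI build) can run one Metropolis-coupled chain per process,
# so with nruns=2 and nchains=4 we ask for 2 * 4 = 8 processes.
NRUNS=2
NCHAINS=4
NP=$((NRUNS * NCHAINS))
echo "mpirun -np $NP mb analysis.nex"   # prints the command actually executed
```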

In practice I also run the tree summarization as a subsequent, separate step using TreeAnnotator.

This is just a first example. I have many more files to process, which take much longer than this one, hence my decision to try out the GPU version.