
Question

mrbayes segmentation fault


19 months ago

dgrace999 • 0

Hello, I have tried running mrbayes-v3.2.6 on my university cluster with a text block in my nexus file, as below:

begin mrbayes;
  set autoclose=yes nowarn=yes;
  CHARSET mtgenome = 1-16701;
  partition favored = 1: mtgenome;
  set partition = favored;
  unlink shape=(all) pinvar=(all) statefreq=(all) revmat=(all) tratio=(all);
  prset applyto=(all) ratepr=variable;
  lset applyto=(1) nst=2 rates=invgamma;
  mcmc nruns=2 ngen=10000000 samplefreq=1000 printfreq=1000 nchains=4 savebrlens=yes;
  mcmc;
  sump burnin=2500;
  sumt burnin=2500;
END;

I call the operation using a .sh script here:

#!/bin/bash
#PBS -V
#PBS -N mrbayes_mito
#PBS -q batch
#PBS -S /bin/bash
#PBS -l select=1:ncpus=16
#PBS -l walltime=720:00:00

cd $PBS_O_WORKDIR
module load beagle-2.1.2 mrbayes-3.2.6
mb /nas1/dlema/pardus_africa_outgroup_mt_alignment.nexus

The operation aborts with this error log:

var/spool/pbs/mom_priv/jobs/1065581.huxley-head.SC: line 11: 18480 Segmentation fault mb ./pardus_africa_outgroup_mt_alignment.nexus

And the output log says this (after successfully printing several dozen "Average standard deviation of split frequencies" lines):

Average standard deviation of split frequencies: 0.227872

46000 -- (-42285.050) [-42185.491] (-42276.483) (-42301.598) (-42279.110) (-42208.973) (-42276.280) [-42212.730] (...0 remote chains...) -- 640:24:45
47000 -- (-42280.792) [-42176.176] (-42271.630) (-42289.592) (-42299.672) (-42210.341) (-42284.872) [-42206.358] (...0 remote chains...) -- 640:21:17
48000 -- (-42293.219) [-42189.734] (-42261.749) (-42283.816) (-42299.638) (-42191.706) (-42284.769) [-42202.821] (...0 remote chains...) -- 640:17:56
49000 -- (-42293.278) [-42195.849] (-42280.603) (-42272.094) (-42293.363) (-42205.396) (-42294.548) [-42209.644] (...0 remote chains...) -- 640:31:41
50000 -- (-42285.433) [-42176.144] (-42282.690) (-42276.438) * (-42287.526) (-42208.512) (-42289.159) [-42200.776] (...0 remote chains...) -- 640:11:33

Could not remove partition 87 in RemoveTreeFromPartitionCounters
......................................................
[huxley-n0001:18480] *** Process received signal ***
[huxley-n0001:18480] Signal: Segmentation fault (11)
[huxley-n0001:18480] Signal code: Address not mapped (1)
[huxley-n0001:18480] Failing at address: 0x3f15a000
[huxley-n0001:18480] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x2b923ee576d0]
[huxley-n0001:18480] [ 1] /lib64/libc.so.6(_IO_vfprintf+0x4a79)[0x2b923f0b0f19]
[huxley-n0001:18480] [ 2] mb[0x554541]
[huxley-n0001:18480] [ 3] mb[0x4d26b8]
[huxley-n0001:18480] [ 4] mb[0x4a3a1b]
[huxley-n0001:18480] [ 5] mb[0x42ce4f]
[huxley-n0001:18480] [ 6] mb[0x40d80b]
[huxley-n0001:18480] [ 7] mb[0x42ce4f]
[huxley-n0001:18480] [ 8] mb[0x402bdb]
[huxley-n0001:18480] [ 9] mb[0x402883]
[huxley-n0001:18480] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b923f086445]
[huxley-n0001:18480] [11] mb[0x4026a9]
[huxley-n0001:18480] *** End of error message ***

Forgive me, but I don't know what this means, or what the workaround might be. I am fairly inexperienced. Please help!

Regards



I don't know what exactly the error message means. What I can tell you is that a burn-in of 2500 generations is absolutely inadequate. That number is typically 10-25% of the total number of generations. The purpose of a burn-in is to allow the chains to converge to similar trajectories, and that can't happen in 2500 generations except maybe for alignments with a handful of very short sequences. Given that you are sampling for 10 million generations, a burn-in of 2500 is nowhere near the prescribed ballpark of 10-25% of total generations.

In most of my scripts the two lines that define burn-in are something like this:

sump relburnin=yes burninfrac=0.25;
sumt relburnin=yes burninfrac=0.25;

This means the burn-in is defined as a relative number on a [0, 1] scale, and that first 25% of sampling generations will be thrown out.
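To make the relative burn-in concrete, here is a minimal Python sketch of how a fraction of 0.25 translates into discarded samples for the run settings quoted in the question (ngen=10000000, samplefreq=1000). The function name `burnin_samples` is hypothetical, and the calculation ignores the initial sample at generation 0, so treat the numbers as approximate rather than as what MrBayes reports.

```python
def burnin_samples(ngen: int, samplefreq: int, fraction: float) -> int:
    """Approximate number of samples discarded for a relative burn-in.

    Hypothetical helper for illustration; it ignores the sample taken
    at generation 0.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    total_samples = ngen // samplefreq  # samples written per run
    return int(total_samples * fraction)


# Settings taken from the mcmc line in the question.
ngen = 10_000_000
samplefreq = 1000

discarded = burnin_samples(ngen, samplefreq, 0.25)
kept = ngen // samplefreq - discarded
print(f"discarded: {discarded} samples, kept: {kept} samples")
```

A relative setting like this scales automatically if ngen or samplefreq is changed later, which an absolute count does not.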

19 months ago by Mensur Dlakic ★ 28k

Hello,

Thank you. I arrived at 2500 by dividing 10 million by 1000 (because the sampling frequency is 1000) and then by 4, because there are 4 chains running. This is what had been recommended to me. But I think I will use relburnin in the future. Thank you for the tip!
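The arithmetic can be checked with a short Python sketch. It rests on one assumption that should be verified against the MrBayes manual for the installed version: that an absolute burnin value counts samples rather than generations. Under that reading, 2500 corresponds to 25% of the samples from each run, and the division by 4 acts as "keep three quarters" rather than as a correction for the number of chains. The variable names are illustrative only.

```python
# Settings from the question; burnin is the absolute value passed to sump/sumt.
ngen = 10_000_000
samplefreq = 1000
burnin = 2500

# Samples written per run (ignoring the initial sample at generation 0).
samples_per_run = ngen // samplefreq

# Assumption: burnin counts samples, so this is the fraction discarded.
fraction_if_samples = burnin / samples_per_run

# For comparison: the fraction if burnin were counted in generations.
fraction_if_generations = burnin / ngen

print(f"samples per run: {samples_per_run}")
print(f"burn-in as a fraction of samples: {fraction_if_samples:.2%}")
print(f"burn-in as a fraction of generations: {fraction_if_generations:.4%}")
```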

19 months ago by dgrace999 • 0