mrbayes segmentation fault
dgrace999, 13 months ago

Hello, I have tried running MrBayes v3.2.6 on my university cluster with the following MrBayes block in my NEXUS file:

begin mrbayes;
    set autoclose=yes nowarn=yes;
    CHARSET mtgenome = 1-16701;
    partition favored = 1: mtgenome;
    set partition = favored;
    unlink shape=(all) pinvar=(all) statefreq=(all) revmat=(all) tratio=(all);
    prset applyto=(all) ratepr=variable;
    lset applyto=(1) nst=2 rates=invgamma;
    [mcmcp only sets the chain parameters; the mcmc command below then starts a single run]
    mcmcp nruns=2 ngen=10000000 samplefreq=1000 printfreq=1000 nchains=4 savebrlens=yes;
    mcmc;
    sump burnin=2500;
    sumt burnin=2500;
END;

I submit the job with this PBS script:

#!/bin/bash
#PBS -V
#PBS -N mrbayes_mito
#PBS -q batch
#PBS -S /bin/bash
#PBS -l select=1:ncpus=16
#PBS -l walltime=720:00:00

cd "$PBS_O_WORKDIR"
module load beagle-2.1.2 mrbayes-3.2.6

mb /nas1/dlema/pardus_africa_outgroup_mt_alignment.nexus

The job aborts, and the error log shows:

var/spool/pbs/mom_priv/jobs/1065581.huxley-head.SC: line 11: 18480 Segmentation fault mb ./pardus_africa_outgroup_mt_alignment.nexus

The output log shows this (after several dozen successful "Average standard deviation of split frequencies" reports):

Average standard deviation of split frequencies: 0.227872

46000 -- (-42285.050) [-42185.491] (-42276.483) (-42301.598) * (-42279.110) (-42208.973) (-42276.280) [-42212.730] (...0 remote chains...) -- 640:24:45
47000 -- (-42280.792) [-42176.176] (-42271.630) (-42289.592) * (-42299.672) (-42210.341) (-42284.872) [-42206.358] (...0 remote chains...) -- 640:21:17
48000 -- (-42293.219) [-42189.734] (-42261.749) (-42283.816) * (-42299.638) (-42191.706) (-42284.769) [-42202.821] (...0 remote chains...) -- 640:17:56
49000 -- (-42293.278) [-42195.849] (-42280.603) (-42272.094) * (-42293.363) (-42205.396) (-42294.548) [-42209.644] (...0 remote chains...) -- 640:31:41
50000 -- (-42285.433) [-42176.144] (-42282.690) (-42276.438) * (-42287.526) (-42208.512) (-42289.159) [-42200.776] (...0 remote chains...) -- 640:11:33

Could not remove partition 87 in RemoveTreeFromPartitionCounters ......................................................
[huxley-n0001:18480] *** Process received signal ***
[huxley-n0001:18480] Signal: Segmentation fault (11)
[huxley-n0001:18480] Signal code: Address not mapped (1)
[huxley-n0001:18480] Failing at address: 0x3f15a000
[huxley-n0001:18480] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x2b923ee576d0]
[huxley-n0001:18480] [ 1] /lib64/libc.so.6(_IO_vfprintf+0x4a79)[0x2b923f0b0f19]
[huxley-n0001:18480] [ 2] mb[0x554541]
[huxley-n0001:18480] [ 3] mb[0x4d26b8]
[huxley-n0001:18480] [ 4] mb[0x4a3a1b]
[huxley-n0001:18480] [ 5] mb[0x42ce4f]
[huxley-n0001:18480] [ 6] mb[0x40d80b]
[huxley-n0001:18480] [ 7] mb[0x42ce4f]
[huxley-n0001:18480] [ 8] mb[0x402bdb]
[huxley-n0001:18480] [ 9] mb[0x402883]
[huxley-n0001:18480] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b923f086445]
[huxley-n0001:18480] [11] mb[0x4026a9]
[huxley-n0001:18480] *** End of error message ***

Forgive me, but I don't know what this means, or what the workaround might be. I am fairly inexperienced. Please help!

Regards


I don't know exactly what the error message means. What I can tell you is that a burn-in of 2500 generations would be far too short. Burn-in is typically 10-25% of the total number of generations. Its purpose is to discard the samples taken before the chains have converged to similar trajectories, and that can't happen in 2500 generations except perhaps for alignments of a handful of very short sequences. With 10 million generations, 10-25% works out to 1 to 2.5 million generations.

In most of my scripts the two lines that define burn-in are something like this:

sump relburnin=yes burninfrac=0.25;
sumt relburnin=yes burninfrac=0.25;

This defines the burn-in as a fraction on a [0, 1] scale, so the first 25% of the samples from each run are discarded.
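For the run in the question (ngen=10000000, samplefreq=1000), the arithmetic behind a 25% relative burn-in can be sketched with plain Bash arithmetic. Keep in mind that the MrBayes 3.2 help for sump and sumt describes an absolute burnin as a number of samples, not generations, so the sketch converts in both directions. `burnin_samples` is a hypothetical helper, not part of MrBayes, and it ignores the extra sample MrBayes saves for the starting state, so the real count may differ by one.

```bash
#!/bin/bash
# Rough sketch: translate a relative burn-in percentage into an absolute number
# of samples and generations per run. burnin_samples is a hypothetical helper,
# not a MrBayes command. It ignores the extra sample saved for generation 0.

# Usage: burnin_samples NGEN SAMPLEFREQ PERCENT
burnin_samples() {
    local ngen=$1 samplefreq=$2 percent=$3
    local samples=$(( ngen / samplefreq ))   # approximate samples per run
    echo $(( samples * percent / 100 ))      # samples to discard as burn-in
}

ngen=10000000
samplefreq=1000

discard=$(burnin_samples "$ngen" "$samplefreq" 25)
echo "samples discarded per run: $discard"
echo "generations discarded per run: $(( discard * samplefreq ))"
```

With these inputs it prints 2500 samples and 2500000 generations per run, i.e. an absolute burnin=2500 (counted in samples) and a 25% relative burn-in describe roughly the same cut-off.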


Hello,

Thank you. Here is where the 2500 came from: 10 million generations divided by 1000 (the sample frequency), then divided again by 4 because there are 4 chains running. This is what had been recommended to me. But I think I'll use relburnin from now on. Thanks for the tip!
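One way to check the reasoning above is to compare the "divide by the number of chains" rule with a fixed 25% burn-in for a few values of nchains. The sketch below assumes, per the MrBayes 3.2 documentation, that burnin counts saved samples and that only the cold chain of each run is written to the output files, so the number of samples per run does not depend on nchains. The variable names are illustrative only.

```bash
#!/bin/bash
# Compare the divide-by-nchains rule of thumb with a fixed 25% burn-in.
# Assumes burnin counts saved samples and only the cold chain is sampled,
# so samples per run depend on ngen and samplefreq but not on nchains.

ngen=10000000
samplefreq=1000
samples=$(( ngen / samplefreq ))   # samples per run (ignoring generation 0)

for nchains in 2 4 8; do
    by_chains=$(( samples / nchains ))   # rule of thumb from the question
    quarter=$(( samples * 25 / 100 ))    # 25% relative burn-in
    echo "nchains=$nchains  divide-by-chains=$by_chains  25%=$quarter"
done
```

It prints 5000, 2500 and 1250 for the divide-by-chains rule against a constant 2500 for the 25% rule, so under these assumptions the two agree only because nchains=4.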
