Question: Job after all-vs-all fails with "unable to ReadProgram" error on SLURM cluster
gusa_100 wrote, 16 months ago:

While running OMA 1.1.2 on a SLURM cluster, I got an error message in the single job that runs after my parallelized all-vs-all jobs:

Error, 'unable to ReadProgram(Cache/AllAll/id01/id04/part_3032-3127)

The single bash job is still running on the cluster, but it has been almost two days already and there is no update in my job.out file.

Meanwhile, the job.err file shows:

rm: cannot remove ‘Cache/conversion.running’

Could you suggest what I should do? Should I let it run anyway?

Thank you

Tags: oma
modified 15 months ago • written 16 months ago by gusa_100

This post lacks sufficient detail. Please include information about the program being used, the exact command line with options, and the kind of analysis being done, along with the type of data.

modified 16 months ago • written 16 months ago by genomax

Tagging: adrian.altenhoff

written 16 months ago by genomax
adrian.altenhoff wrote, 16 months ago:

Hi Gusa

I'm one of the OMA maintainers. The version you are using is already a bit outdated; the parallel processing of jobs has since been improved quite a bit. If possible, I suggest upgrading OMA to the latest version.

Most likely the referenced chunk is corrupted, something that could happen on slow filesystems with older versions of OMA. The best approach is to abort the run, remove this chunk, and restart the job. OMA should only need to redo this single chunk, so it should not take long, and then it should continue with the inference of the orthologs.

About the conversion.running problem: I assume that this file has already been removed by another job. If not, remove it prior to restarting OMA.

Good luck with the run! Best wishes, Adrian

written 16 months ago by adrian.altenhoff
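The abort-and-clean procedure Adrian describes can be sketched in a couple of shell commands (the chunk path is taken verbatim from the error message quoted in the question; run these from the OMA working directory before resubmitting the job):

```shell
# Delete the corrupted all-vs-all chunk named in the ReadProgram error;
# OMA recomputes only this missing chunk on the next run.
rm -f Cache/AllAll/id01/id04/part_3032-3127

# Remove the stale lock file from the conversion step; -f keeps rm
# quiet if another job has already deleted it.
rm -f Cache/conversion.running
```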

Hi Adrian,

Thank you very much. I am encountering new problems with the latest version of OMA. After getting several messages of this type in my SLURM job.out (except for the last line):

You specified to stop after the database conversion step (i.e. you set the "-c" flag). Database conversion successfully finished.

I got an error message on the job.err:

OMA.2.1.1/bin/../darwinlib/../data/GOdata.drw-20171023: 76.7% -- replaced wit ../data/GOdata.drw-20171023

While the last line of the job.out says:

: waiting for too long. abort. It seems that your parallelisation ...

I started my job with the options: ..oma -n 20 -c

Any suggestions?

Thank you so much in advance!

modified 15 months ago • written 15 months ago by gusa_100
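The "waiting for too long" message above often indicates that the workers OMA is waiting for never ran or died early. For comparison, the OMA documentation drives the parallel all-vs-all phase through a SLURM job array rather than a single multi-threaded job. A minimal sketch, assuming a 20-way array; the runtime and memory values are placeholders to adjust for your cluster:

```shell
#!/bin/bash
#SBATCH --array=1-20            # one array task per parallel worker
#SBATCH --time=24:00:00         # placeholder: reserve enough runtime
#SBATCH --mem=4G                # placeholder: reserve enough memory
#SBATCH --output=job.%A_%a.out
#SBATCH --error=job.%A_%a.err

# NR_PROCESSES tells OMA how many workers share the all-vs-all
# computation; -s stops after the all-vs-all phase, so the final
# ortholog inference can then run as a single job.
export NR_PROCESSES=20
oma -s
```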

Problem solved! I just needed more memory.

written 15 months ago by gusa_100

gusa_100 wrote, 15 months ago:

Hi Adrian,

I re-ran the analysis with OMA 2.1.1 and I still get the same error message: "Error, 'unable to ReadProgram(Cache/AllAll/sp1/sp2/part_1042-1106)". Should I keep deleting these corrupted files and re-running?

Thank you!

written 15 months ago by gusa_100
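The deletion step can also be scripted, so that every corrupted chunk reported in the error log is removed in one pass (a sketch; it assumes the errors land in job.err in the format quoted above, and a GNU userland for `grep -o` and `xargs -r`):

```shell
# Extract each chunk path mentioned in a ReadProgram error and delete
# it, so the restarted run recomputes only those chunks.
grep -o "Cache/AllAll/[^)]*" job.err | sort -u | xargs -r rm -f
```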

Yes. It might also be useful to check in the scheduler's log why they are failing (e.g. too little memory allocated to the process, or too little runtime reserved?). Cheers, Adrian

written 15 months ago by adrian.altenhoff
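Adrian's suggestion to check the scheduler's log can be done with SLURM's accounting tool (a sketch; the job ID is a placeholder, substitute the ID of your failing all-vs-all array):

```shell
# Placeholder job ID; replace with the failed array job's ID.
JOBID=123456

# Show state, peak memory, and runtime against the reserved limits;
# OUT_OF_MEMORY or TIMEOUT states indicate the allocation was too small.
sacct -j "$JOBID" --format=JobID,State,ExitCode,MaxRSS,Elapsed,Timelimit
```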
Powered by Biostar version 2.3.0