OMA Phase 2 stuck in "reading all vs. all" stage for 24 hours
3.1 years ago
eschang1 ▴ 10

Hi there,

I have just completed the all-vs-all phase of the OMA standalone program with 42 metazoan genomes. I have tried to initiate the actual orthogroup inference phase, but the program gets stuck indefinitely while reading the all-vs-all results.

I have double-checked that each pair of species has actual all-vs-all results, and I received the completion message:

*** All all-vs-all jobs successfully terminated.     ***
*** terminating after AllAll phase due to "-s" flag. ***
*** if you see this message at the end of one job,   ***
*** this means that all jobs successfully finished.  ***


As suggested, I am running the next stage as a single thread with a lot of memory (currently 160 GB of RAM) and mostly default parameters, but it still hangs. This most recent time I turned on a high level of debugging (-d 5) to see what was going on, and it gets stuck at exactly this point:

{--> enter Counter, args = Number of matches discarded below MinSeqLen

<-- exit Counter = Counter(Number of matches discarded below MinSeqLen,0)}


Googling this error just brings up a lot of generic Python results.

If anyone has any insight on this error, and/or on whether this step really does just take a long time, I would much appreciate it. For background, running this step with a small test dataset of three organisms worked just fine.

Sally Chang

oma orthology orthologs

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.


Thank you for the tip!

3.1 years ago

Hi Sally,

I'm one of the developers of OMA standalone. Could it just be a buffering issue with the stdout and stderr streams? Some HPC clusters keep a large buffer before data is written to disk, and during the phase that reads these files we don't produce much output. Once the process terminates, though, the buffers should be flushed. So if the jobs are still running, you could kill them and check whether there really is no further output.
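One general way to rule buffering out is to force line-buffered streams with GNU coreutils' stdbuf, so every message is flushed as soon as it is printed rather than sitting in a block buffer until the process exits. A minimal sketch (the `bin/oma` path is a placeholder for your actual invocation, not something OMA requires):

```shell
# Run the program with line-buffered stdout/stderr so progress messages
# appear in the log immediately instead of only at process exit.
# For OMA (placeholder path) that would be:
#   stdbuf -oL -eL bin/oma > oma_debug.log 2>&1
# Quick demonstration that stdbuf wraps an arbitrary command:
stdbuf -oL printf 'flushed immediately\n'
```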

Otherwise, if they really are blocked, I would need a way to reproduce the problem. Could you make the dataset available to me?


I let my most recent OMA run be killed by the walltime limit (24 hours) to see whether the buffers would indeed be flushed, and it looks like it died while doing a batch of additivity checks, i.e.

VP check additivity: daphnia_pulex/04002 vs hydra_magnipapillata/19500 by orbicella_faveolata/(00269,00273): 10.725044>2.

I don't see any further intermediate output files, but I am not sure whether I should even expect any during this phase of the algorithm. If not, then I probably just need to give OMA a nice long walltime limit. If I should be seeing output files by this stage, please let me know exactly which files you need to troubleshoot (i.e. the original proteomes or the all-vs-all folders?).

Thanks so much for your help so far!

Cheers, Sally


Hi Sally,

What you see here happens much later than loading the AllAll files, so it was indeed a buffering problem. Increase the walltime limit to something larger and your computation will hopefully run through nicely.
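A submission sketch for this second, single-process phase might look like the following (all SLURM values are illustrative placeholders; the generous walltime, single process, and high memory follow the advice above):

```shell
#!/bin/bash
# Sketch of a SLURM script for the OMA inference phase; adjust limits to your site.
#SBATCH --time=7-00:00:00     # generous walltime: this phase can run for days
#SBATCH --nodes=1
#SBATCH --mem=160GB           # the inference phase is memory-hungry
#SBATCH --job-name=oma-infer

export NR_PROCESSES=1         # run the inference phase as a single process
bin/oma                       # no -s flag: continue past the all-vs-all stage
```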

Best wishes


Hi Adrian, I am trying to run OMA (version OMA.2.4.2) on a cluster using SLURM. I basically followed the procedure described in the Oma Cheat Sheet, and in my DB directory I collected genomes for 20 species (exported from OMA using the export option) plus the genome of my species of interest. However, the array job (split into 1000 parallelized jobs) seems to take a lot of time (each job runs for over 8 hours!). Is there any way to speed up this step?

Thank you, Ilenia


Hi Ilenia, 21 species should definitely work for OMA. Depending on the size of the genomes, the roughly 8k hours of total compute (1000 jobs at over 8 hours each) might not be too far off. If you have a hard time getting the jobs scheduled on the cluster because of the 8-hour runtime, consider requeueing jobs that take too long: set a time limit on the OMA jobs and check for the special exit code 99, i.e. you could modify your SLURM submission script to do something like this:

#!/bin/bash
#SBATCH --array=1-1000
#SBATCH --time=2:00:00
#SBATCH --nodes=1
#SBATCH --mem=2GB
#SBATCH --job-name=oma

export NR_PROCESSES=1000

# Stop OMA cleanly after 7000 seconds (just under the 2 h SLURM walltime);
# OMA exits with code 99 when it hits this soft limit via -W.
bin/oma -s -W 7000
if [[ "$?" == "99" ]] ; then
    # Requeue this array task so it resumes where it left off
    scontrol requeue ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}
fi
exit 0