Question: Estimating OMA standalone memory use for bacterial genomes
Andrew Watson wrote, 4 weeks ago (United Kingdom):

I am trying to estimate the memory requirements for running the final steps of OMA standalone (after the all-vs-all) on an HPC cluster (SLURM), without HOG inference.

From the benchmarks in the manual, the suggested formula is 400MB * pow(nr_genomes, 1.4).

This works for the metazoan dataset described there (60 metazoan genomes were successfully computed using 120 GB).

The requirements for bacterial genomes are reported to be lower (50GB for 60 genomes).

I adjusted the formula to match that, so it would be ~166MB * pow(nr_genomes, 1.4).

Does this match other people's experiences working with bacterial or archaeal genomes?

Would leaving out the HOG inference help me to reduce those requirements significantly?

I was hoping to use a dataset of ~400 genomes, but I may have to rethink that if it will need around 730 GB of memory.
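
For reference, here is the back-of-the-envelope calculation behind those numbers (a minimal sketch in plain Python; the estimate_gb helper and the ~166 MB coefficient are my own rough fit to the reported bacterial figure, not taken from the manual):

    # Rough memory estimates for the OMA standalone steps after the all-vs-all,
    # using the manual's scaling formula: coefficient * nr_genomes ** 1.4 (in MB).
    # The 166 MB coefficient is my own guess, fitted so that 60 bacterial
    # genomes come out at roughly the reported 50 GB.

    def estimate_gb(nr_genomes, coeff_mb):
        """Estimated peak memory in GB (treating 1 GB = 1000 MB)."""
        return coeff_mb * nr_genomes ** 1.4 / 1000

    print(estimate_gb(60, 400))   # ~123 GB, in line with the 120 GB metazoan benchmark
    print(estimate_gb(60, 166))   # ~51 GB, roughly the reported bacterial figure
    print(estimate_gb(400, 166))  # ~730 GB, hence my worry about 400 genomes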

Tags: oma, orthologs

adrian.altenhoff wrote, 4 weeks ago:

Hi Andrew,

I don't think that the scaling behavior of OmaStandalone is the same for bacterial genomes as for eukaryotes. The formula gives only a rough idea derived from a few data points and is also rather conservative. The memory consumption depends a lot on the size of the genomes and on how closely related they are to each other, i.e. on the number of homologs and orthologs.

I'm very confident that you could run 400 bacteria with less than 100 GB of memory. It would be nice if you could post the amount of memory it required once you're done with the computations.

Deactivating the HOG computation will not significantly reduce the amount of memory (at least not for the bottom-up variant of the algorithm).

Best wishes, Adrian

Andrew Watson replied, 4 weeks ago:

Hi Adrian,

Great, thanks for the information. I had thought/hoped that they might scale differently.

I was mainly trying to get an idea of a starting point to discuss with the HPC admins. A single core with high memory and a longer-than-usual wall time doesn't fit neatly into any of their standard queues.

I'll request 100GB as a jumping-off point and let you know how I get on.

Best wishes, Andrew
