Forum: Slurm, Son of Grid Engine, Mesos, and Hadoop YARN vs HTCondor and Torque
18 months ago by
Shicheng Guo8.0k
Shicheng Guo8.0k wrote:

Hi All,

Does anyone have ideas on how to compare these high-throughput computing frameworks? Which one is the best choice for a current high-throughput computing setup (compare TACC vs SDSC vs HTCondor)?


sdsc pbs torque forum hpc • 2.4k views
18 months ago by
Devon Ryan93k
Freiburg, Germany
Devon Ryan93k wrote:

They're mostly the same at the end of the day. It's more a question of (1) choosing something that will still be supported in 5-10 years (the various SGEs keep losing support) and (2) finding someone locally willing to administer it. We switched from one of the umpteen SGE variants to Slurm a few years ago and are pretty happy. It's still getting regular updates and is widely used, so it's not going anywhere. The same can be said of Torque and LSF. HTCondor is a bit different, since most people would only use it if they need some of its more specialized features, e.g., moving datasets to nodes that lack shared filesystems (you can probably do this with other resource managers, I've never checked) or scavenging resources from unused computers.

BTW, regarding Hadoop YARN, I imagine that'd be most useful if your cluster used Hadoop. There are vanishingly few bioinformatics applications that natively support Hadoop, so I'm not really sure you'd end up gaining anything except headaches.

18 months ago by
h.mon29k wrote:

The "best" will probably depend on the size and architecture of your computing grid, and on the technical staff at hand to manage it. The only one I have hands-on administration experience with is Torque+Maui, which is relatively simple, but I would recommend it only for small clusters.

This Wikipedia link has some information, but it is very incomplete:

edit: I just discovered Torque is no longer open source (it had a restrictive license, which some considered non-free):

Note: As of June 2018, Adaptive Computing is offering Torque and Torque Support for purchase. For more information, please fill out the request form and we will respond as soon as possible.

modified 17 months ago • written 18 months ago by h.mon29k
18 months ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

Matt Maurano wrote some wrappers to submit SGE scripts through a Slurm scheduler:

One feature Slurm offers (I can't recall whether our older SGE setup did) is the ability to submit arrays of jobs, which is useful for simulation or permutation tests (Monte Carlo etc.).
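
As a rough illustration of how a job array fits a permutation test, here is a minimal Python sketch. It assumes the job is submitted with something like `sbatch --array=0-99` and that each task reads the `SLURM_ARRAY_TASK_ID` environment variable, which Slurm sets for array tasks. The function names, the toy data, and the chunk size are hypothetical and not taken from any site's actual setup.

```python
import os
import random


def mean_diff(a, b):
    """Difference in group means, the test statistic used in this sketch."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_chunk(a, b, n_perm, seed):
    """Count permutations whose |statistic| is at least the observed one.

    Each array task runs one chunk with its own seed, so the chunks are
    independent and their counts can be summed afterwards.
    """
    rng = random.Random(seed)
    observed = abs(mean_diff(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(mean_diff(pooled[:len(a)], pooled[len(a):]))
        if stat >= observed:
            hits += 1
    return hits


def task_id(environ=os.environ):
    """Array index set by Slurm; falls back to 0 outside an array job."""
    return int(environ.get("SLURM_ARRAY_TASK_ID", "0"))


if __name__ == "__main__":
    # Toy data (hypothetical); a real job would load its own input.
    group_a = [5.1, 4.9, 6.2, 5.8, 6.0]
    group_b = [4.2, 4.0, 4.8, 4.5, 4.1]
    n_perm = 1000  # permutations per array task (assumed chunk size)
    hits = permutation_chunk(group_a, group_b, n_perm, seed=task_id())
    # One line per task: task index, hit count, permutations run.
    print(f"{task_id()}\t{hits}\t{n_perm}")
```

Summing the hit counts over all tasks and dividing by the total number of permutations gives an empirical p-value. Using the task index as the seed is one simple way to keep the chunks from repeating each other.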

We've had some pains with Slurm, mainly from configuring the fair-share priority mechanism and some other parameters. Our settings made less effective use of the cluster than desired when a lot of jobs are thrown at it.

We also have some job submission timeout issues that are hard to find much about online. This seems to be a not-uncommon problem with Slurm deployments. No one seems to know what the problem is.

I'd definitely suggest looking into an ironclad support contract of some kind, regardless of what scheduler you go with. Also, put the cluster through its paces with various levels of load testing from the start, to figure out what needs tweaking for your setup.
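
One hedged way to sketch stepped load testing on a Slurm cluster is to submit arrays of trivial `sleep` jobs at increasing sizes and watch how the scheduler responds. The Python below only builds the `sbatch` command lines by default (a dry run). The helper names, the job name `loadtest`, and the load levels are hypothetical, and the largest array has to stay under the site's configured maximum array size.

```python
import shlex
import subprocess


def load_test_command(n_jobs, sleep_seconds=60):
    """Build an sbatch command that submits n_jobs trivial array tasks."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return [
        "sbatch",
        f"--array=0-{n_jobs - 1}",
        "--job-name=loadtest",
        f"--wrap=sleep {sleep_seconds}",
    ]


def run_levels(levels, dry_run=True):
    """Submit (or just print) one array per load level, smallest first."""
    commands = [load_test_command(n) for n in sorted(levels)]
    for cmd in commands:
        if dry_run:
            # Print the command instead of submitting it.
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_levels([10, 100, 1000])
```

Running with `dry_run=False` would actually submit the jobs, so that is best tried on a test partition first.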


SGE does have job arrays, as does Torque+Maui. But I've seen one Torque+Maui system become unstable when large arrays are submitted, causing Maui crashes, and I found some posts on the subject as well, so the problem is not uncommon.
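
Since all three schedulers support arrays, a script can be written to run under any of them. The sketch below assumes the usual environment variables: `SLURM_ARRAY_TASK_ID` for Slurm, `SGE_TASK_ID` for SGE, and `PBS_ARRAYID` for Torque. The helper name `array_task_id` is hypothetical.

```python
import os

# Environment variables each scheduler sets for array tasks (assumed names).
ARRAY_ID_VARS = ("SLURM_ARRAY_TASK_ID", "SGE_TASK_ID", "PBS_ARRAYID")


def array_task_id(environ=os.environ):
    """Return the array index as an int, or None outside an array job."""
    for name in ARRAY_ID_VARS:
        value = environ.get(name, "")
        # SGE sets SGE_TASK_ID to the string "undefined" for non-array
        # jobs, so only purely numeric values are accepted.
        if value.isdigit():
            return int(value)
    return None
```

Checking the variables in a fixed order keeps the lookup predictable if more than one happens to be set.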

Reply written 18 months ago by h.mon29k
18 months ago by
United States
genomax76k wrote:

Best in terms of features is probably LSF (which is not cheap, and that is probably the reason it is not on your list).
