Question

Forum: Slurm, Son of Grid Engine, Mesos, and Hadoop YARN vs HTCondor and Torque

6.0 years ago

Shicheng Guo ★ 9.5k

Hi All,

Does anyone have advice on how to compare these high-throughput computing frameworks? Which one is the best choice for a current high-throughput computing setup (compare TACC vs SDSC vs HTCondor)?

Thanks

PBS HPC SDSC Torque • 6.7k views

Updated 14 months ago by Ram 44k • written 6.0 years ago by Shicheng Guo ★ 9.5k

score 4 · Answer 1 · 2018-08-02

They're mostly the same at the end of the day. It's more a question of (1) choosing something that will still be supported in 5-10 years (the various SGEs keep losing support) and (2) finding someone locally willing to administer it. We switched from one of the umpteen SGE variants to Slurm a few years ago and are pretty happy. It's still getting regular updates and is widely used, so it's not going anywhere. The same can be said of Torque and LSF. HTCondor is a bit different, since most people would only use it if they need some of its more specialized features, e.g., moving datasets to nodes that lack shared filesystems (you can probably do this with other resource managers, I've never checked) or scavenging resources from unused computers.

BTW, regarding Hadoop YARN, I imagine that'd be most useful if your cluster already used Hadoop. There are vanishingly few bioinformatics applications that natively support Hadoop, so I'm not really sure you'd end up gaining anything except headaches.

score 1 · Answer 2 · 2018-08-02

The "best" will probably depend on the size and architecture of your computing grid, and on the technical staff at hand to manage it. The only one I have hands-on administration experience with is Torque+Maui, which is relatively simple, but I would recommend it only for small clusters.

This Wikipedia link has some information, but it is very incomplete:

https://en.wikipedia.org/wiki/Comparison_of_cluster_software

edit: I just discovered Torque is no longer open source (it had a restrictive license, which some considered non-free):

http://www.adaptivecomputing.com/products/torque/

Note: As of June 2018, Adaptive Computing is offering Torque and Torque Support for purchase. For more information, please fill out the request form and we will respond as soon as possible.

score 1 · Answer 3 · 2018-08-02

Matt Maurano wrote some wrappers to submit SGE scripts through a Slurm scheduler: https://github.com/mauranolab/sge2slurm

One useful Slurm feature (I can't recall whether our older SGE setup offered it) is the ability to submit arrays of jobs, which is handy for simulation or permutation tests (Monte Carlo etc.).
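For anyone who hasn't used job arrays, here is a minimal sketch of what that looks like, written in Python so the submission script can be generated programmatically. The `--array` option, the `%x`/`%A`/`%a` filename patterns, and the `SLURM_ARRAY_TASK_ID` environment variable are standard Slurm; the function names, the `perm_test` job name, and the `permute.py` command are hypothetical and only there for illustration.

```python
import os


def build_array_script(command: str, n_tasks: int, job_name: str = "perm_test") -> str:
    """Return the text of an sbatch script that runs `command` as a job array."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{n_tasks - 1}",  # one array task per index
        "#SBATCH --output=%x_%A_%a.out",     # %x = job name, %A = array job ID, %a = task index
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


def task_seed(base_seed: int = 12345) -> int:
    """Derive a per-task random seed from SLURM_ARRAY_TASK_ID.

    Falls back to task index 0 when run outside of Slurm.
    """
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return base_seed + task_id


if __name__ == "__main__":
    # Hypothetical example: 100 permutation replicates, one per array task.
    print(build_array_script("python permute.py", 100))
```

You would write the generated text to a file and submit it with `sbatch`; each array task then calls something like `task_seed()` so that every replicate gets a different but reproducible seed.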

We've had some pains with Slurm, mainly around configuring the fair-share priority mechanism and some other parameters, which led to less effective use of the cluster than desired when a lot of jobs were thrown at it.

We also have some job submission timeout issues that are hard to find much information about online. This seems to be a not-uncommon problem with Slurm deployments, and no one seems to know what the cause is.

I'd definitely suggest looking into an ironclad support contract of some kind, regardless of what scheduler you go with. Also, put the cluster through its paces with various levels of load testing from the start, to figure out what needs tweaking for your setup.

score 1 · Answer 4 · 2018-08-02

6.0 years ago

GenoMax 144k

Best in terms of features is probably LSF (which is not cheap, and that is probably the reason it is not on your list).
