Sge,Torque, Pbs : What'S The Best Choise For A Ngs Dedicated Cluster ?
6
12
Entering edit mode
9.6 years ago
abihouee ▴ 120

Sorry, it may be off topics...

We plan to install a scheduler on our cluster (DELL blade cluster over Infiniband storage on Linux CentOS 6.3). This cluster is dedicated to do NGS data analysis.

It seems to me that the most current is SGE, but since Oracle bougth the stuff, there are several alternative developments (Open Grid Engine, Sun Grid Engine, Univa Grid Engine ...)

An other possible scheduler is Torque/PBS.

I'm a little bit lost in this scheduler forest!

Is there someone with any experiment on this or who knows some existing benchmark?

Thanks a lot.

Audrey

clustering next-gen analysis • 26k views
2
Entering edit mode

I worked with SGE for years at a genome center in Vancouver. Seemed to work quite well. Now I'm at a different genome center and we are using LSF but considering switching to SGE, which is ironic because we are trying to transition from Oracle DB to PostGres to get away from Oracle... SGE and LSF seemed to offer similar functionality and performance as far as I can tell. Both clusters have several 1000 cpus.

1
Entering edit mode

openlava (source code) is an open-source fork of LSF that while lacking some features does work fairly well.

1
Entering edit mode

Torque is fine, and very well tested; either of the SGE forks are widely used in this sort of environment, and has qmake, which some people are very fond of. SLURM is another good possibility.

10
Entering edit mode
9.6 years ago
matted 7.6k

I can only offer my personal experiences, with the caveat that we didn't do a ton of testing and so others may have differing opinions.

We use SGE, which installs relatively nicely on Ubuntu with the standard package manager (the gridengine-* packages). I'm not sure what the situation is on CentOS.

We previously used Torque/PBS, but the scheduler performance seemed poor and it bogged down with lots of jobs in the queue. When we switched to SGE, we didn't have any problems. This might be a configuration error on our part, though.

When I last tried out Condor (several years ago), installation was quite painful and I gave up. I believe it claims to work in a cross-platform environment, which might be interesting if for example you want to send jobs to Windows workstations.

LSF is another option, but I believe the licenses cost a lot.

My overall impression is that once you get a system running in your environment, they're mostly interchangeable (once you adapt your submission scripts a bit). The ease with which you can set them up does vary, however. If your situation calls for "advanced" usage (MPI integration, Kerberos authentication, strange network storage, job checkpointing, programmatic job submission with DRMAA, etc. etc.), you should check to see which packages seem to support your world the best.

1
Entering edit mode

Recent versions of torque have improved a great deal for large numbers of jobs, but yes, that was a real problem.

I also agree that all are more or less fine once they're up and working, and the main way to decide which to use would be to either (a) just pick something future users are familiar with, or (b) pick some very specific things you want to be able to accomplish with the resource manager/scheduler and start finding out which best support those features/workflows.

4
Entering edit mode
9.6 years ago

Unlike PBS, SGE has qrsh, which is a command that actually run jobs in the foreground, allowing you to easily inform a script when a job is done. What will they think of next?

This is one area where I think the support you pay for going commercial might be worthwhile. At least you'll have someone to field your complaints.

2
Entering edit mode

EDIT: Some versions of PBS also have qsub -W block=true that works in a very similar way to SGE qsrh.

0
Entering edit mode

You must have a newer version than me

qsub -W block=true dothis.sh
qsub: Undefined attribute  MSG=detected presence of an unknown attribute
qsub --version
version: 2.4.11

0
Entering edit mode

For Torque and perhaps versions of PBS without -W block=true, you can use the following to switches. The behaviour is similar but when called, any embedded options to qsub will be ignored. Also, stderr/stdout is sent to the shell.

qsub -I -x dothis.sh

1
Entering edit mode

My answer should be updated to say that any DRMAA-compatible cluster engine is fine, though running jobs through DRMAA (e.g. Snakemake --drmaa) instead of with a batch scheduler may anger your sysadmin, especially if they are not familiar with scientific computing standards.

Using qsub -I just to get a exit code is not ok

0
Entering edit mode

Torque definitely allows interactive jobs -

qsub -I


As for Condor, I've never seen it used within a cluster; it was designed back in the day for farming out jobs between diverse resources (e.g., workstations after hours) and would have a lot of overhead for working within a homogeneous cluster. Scheduling jobs between clusters, maybe?

4
Entering edit mode
9.6 years ago

We use Rocks Cluster Distribution that comes with SGE.

http://en.wikipedia.org/wiki/Rocks_Cluster_Distribution

1
Entering edit mode

+1 Rocks - If you're setting up a dedicated cluster, it will save you a lot of time and pain.

0
Entering edit mode

I'm not a huge rocks fan personally, but one huge advantage, especially (but not only) if you have researchers who use XSEDE compute resources in the US, is that you can use the XSEDE campus bridging rocks rolls which bundle up a large number of relevant software packages as well as the cluster management stuff. That also means that you can directly use XSEDEs extensive training materials to help get the cluster's new users up to speed.

3
Entering edit mode
9.6 years ago
samsara ▴ 630

It has been more than a year I have been using SGE for processing NGS data. I have not experienced any problem with it. I am happy with it. I have not used any other scheduler except Slurm few times.

2
Entering edit mode
7.4 years ago

Used SGE at my old institute, currently using PBS and I really wish we had SGE on the new cluster. Things I miss the most, qmake and the -sync y qsub option. These two were completely pipeline savers. I also appreciated the integration of MPI with SGE. Not sure how well it works with PBS as we currently don't have it installed.

1
Entering edit mode
7.4 years ago
pld 5.0k

NIH's Biowulf system uses PBS, but most of my gripes about PBS are more about the typical user load. PBS always looks for the next smallest job, so your 30 node run that will take an hour can get stuck behind hundreds (and thousands) of single node jobs that take a few hours each. Other than that it seems to work well enough.

In my undergrad our cluster (UMBC Tara) uses SLURM, didn't have as many problems there but usage there was different, more nodes per user (82 nodes with ~100 users) and more MPI/etc based jobs. However, a grad student in my old lab did manage to crash the head nodes because we were rushing to rerun a ton of jobs two days before a conference. I think it was likely a result of the head node hardware and not SLURM. Made for a few good laughs.

2
Entering edit mode

PBS always looks for the next smallest job

Just so people know, that's not something inherent to PBS. That's a configurable choice the scheduler (probably maui in this case) makes, but you can easily configure the scheduler so that bigger jobs so that they don't get starved out by little jobs that get "backfilled" into temporarily open slots.

0
Entering edit mode

Part of it is because Biowulf looks for the next smallest job but also prioritizes by how much cpu time a user has been consuming. If I've run 5 jobs with 30x 24 core nodes each taking 2 hours of wall time, I've used roughly 3600 CPU hours. If someone is using a single core on each node (simple because of memory requirements), they're basically at a 1:1 ratio between wall and cpu time. It will take a while for their CPU hours to catch up to mine.

It is a pain, but unlike math/physics/etc there are fewer programs in bioinformatics that make use of message passing (and when they do, they don't always need low-latency ICs), so it makes more sense to have PBS work for the generic case. This behavior is mostly seen on the ethernet IC nodes, there's a much smaller (245 nodes) system set up with infiniband for jobs that really need it (e.g. MrBayes, structural stuff).

Still I wish they'd try and strike a better balance. I'm guilty of it but it stinks when the queue gets clogged with memory intensive python/perl/R scripts that probably wouldn't need so much memory if they were written in C/C++/etc.