Question: Is Galaxy Ok For Speed?
corn8bit wrote (6.0 years ago):

I'd like to use Galaxy for my cluster pipelines. It should make it easier for less tech-savvy team members to run pipelines.

It looks like Galaxy starts ALL processes inside a Python wrapper (when I run top I see python instead of bwa). Will this be a speed issue? Speed is important to me, and I need to use all of my (many) threads effectively.

Why oh why does Galaxy start things in python wrappers? Will this hurt my speed?

Additional data:

I'm currently doing tests myself and have searched for this question. I apologize if I missed the answer. I also know that Galaxy duplicates intermediate data, but HDD reads aren't a bottleneck for me so this is no problem and I'll automate the deletions later. This question is CPU targeted.

Björn (Germany) wrote (6.0 years ago):

Hi,

Galaxy does not run everything in Python wrappers. Most of the wrappers are bash-like scripts. A few of them are written in Python, but this is not a speed limitation. All these wrappers do is abstract the inputs and outputs (temp files etc.). In that case the program is usually invoked through subprocess, so there are no speed issues. By the way, deletion of intermediate data can also be handled by Galaxy, so you do not need to take care of it yourself.

Hope that helps,

Bjoern
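As a rough illustration of the point above, here is a minimal sketch of a wrapper that only handles temp files and hands the real work to a child process via subprocess. This is not Galaxy's actual wrapper code: `run_wrapped_tool` and `STAND_IN_TOOL` are hypothetical names, and a small child Python process stands in for a real tool such as bwa.

```python
import os
import subprocess
import sys
import tempfile


def run_wrapped_tool(tool_argv, input_text):
    """Write the input to a temp file, run the tool on it, return its output."""
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "input.txt")
        out_path = os.path.join(workdir, "output.txt")
        with open(in_path, "w") as handle:
            handle.write(input_text)
        # The tool runs as a separate OS process; the wrapper only waits for it.
        with open(out_path, "w") as out_handle:
            subprocess.run(tool_argv + [in_path], stdout=out_handle, check=True)
        with open(out_path) as handle:
            return handle.read()


# Stand-in "tool" (an assumption for this sketch): a child Python process
# that upper-cases the file given as its first argument.
STAND_IN_TOOL = [
    sys.executable,
    "-c",
    "import sys; print(open(sys.argv[1]).read().upper(), end='')",
]

result = run_wrapped_tool(STAND_IN_TOOL, "acgt\n")
print(result, end="")
```

The wrapper's own Python code runs only before and after the tool; while the tool is working, the wrapper is idle in a blocking wait.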


Thanks, that's good to know and saves me a lot of time. I'm glad that this is the case. It makes much more sense! What also confused me is the "load balancing" documentation, which makes it sound like an issue as well. They must be talking about 100+ users at a time.

(reply by corn8bit, 6.0 years ago)

Correct. The Galaxy application itself is subject to the Python Global Interpreter Lock. You can work around this by configuring multiple instances of the application. It's a little tricky but definitely doable, and you won't need to do it until you regularly have multiple simultaneous users.

(reply by Dan D, 6.0 years ago)
Charles Warden (Duarte, CA) wrote (6.0 years ago):

The main Galaxy server gets a lot of use, so I would consider it slow because of the number of users.

This is why some institutions set up their own Galaxy mirror (where user access can be limited, decreasing the total number of users). If you had a local mirror, you could benchmark NGS tasks and definitely see a difference. I wouldn't consider speed a problem for a mirror installation.

Dan D (Tennessee) wrote (6.0 years ago):

I've deployed a local installation of Galaxy on a cluster. If you examine the Python wrappers carefully, you'll see that they're constructing and then executing a command line. Thus the tools they're wrapping are not subject to the Python Global Interpreter Lock. Galaxy won't run tools any slower than they would run on a pure command-line execution if you're submitting jobs to a cluster.
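To make the "construct a command line, then execute it" pattern concrete, here is a small sketch under stated assumptions. `build_command` and `run_and_report` are hypothetical helpers, not Galaxy code, and the child is a stand-in Python process rather than a real aligner. It reports its own process ID, which shows that the tool runs in a separate process from the wrapper and so does not share the wrapper's interpreter lock.

```python
import os
import subprocess
import sys


def build_command(threads):
    """Build the argv list for the wrapped tool.

    Hypothetical stand-in: a real wrapper might build something like
    ["bwa", "mem", "-t", str(threads), ...] here instead.
    """
    child_code = "import os, sys; print(os.getpid(), sys.argv[1])"
    return [sys.executable, "-c", child_code, str(threads)]


def run_and_report(threads):
    """Run the command and return (child PID, thread count the child received)."""
    completed = subprocess.run(
        build_command(threads), capture_output=True, text=True, check=True
    )
    child_pid, seen_threads = completed.stdout.split()
    return int(child_pid), int(seen_threads)


child_pid, seen_threads = run_and_report(8)
parent_pid = os.getpid()
print("wrapper PID:", parent_pid, "tool PID:", child_pid, "threads:", seen_threads)
```

Because the thread count is just passed through as an argument, how well the threads are used depends on the tool itself, not on the wrapper that launched it.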

Galaxy also includes scripts to automatically delete datasets according to parameters you specify. More information here.


Thanks! Your link was very helpful.

(reply by corn8bit, 6.0 years ago)