Is Galaxy Ok For Speed?
10.8 years ago
corn8bit ▴ 140

I'd like to use Galaxy for my cluster pipelines. It should make it easier for less tech-savvy team members to run pipelines.

It looks like Galaxy starts ALL processes inside a Python wrapper (when I run top, I see python instead of bwa). Will this be a speed issue? Speed is important to me, and I need to use all of my (many) threads effectively.

Why oh why does Galaxy start things in Python wrappers? Will this hurt my speed?

Additional data:

I'm currently running tests myself and have searched for this question; I apologize if I missed the answer. I also know that Galaxy duplicates intermediate data, but HDD reads aren't a bottleneck for me, so this is no problem, and I'll automate the deletions later. This question is about CPU usage.

10.8 years ago
Björn ▴ 670

Hi,

Galaxy does not run everything in Python wrappers. Most tool wrappers are bash-like command templates. A few tools do use Python wrappers, but this is not a speed limitation: all these wrappers do is abstract the inputs and outputs (temp files etc.). The actual program is then usually invoked through subprocess, so there is no speed penalty. By the way, Galaxy can also handle the deletion of intermediate data, so you do not need to take care of it yourself.
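A minimal sketch of the wrapper pattern described above, assuming hypothetical names (`run_tool`, `SORT_TOOL`) that are not Galaxy's actual code: the wrapper stages a temporary output file, builds an argument list, and hands it to `subprocess.run`. A short Python one-liner stands in for a real binary such as bwa so the example runs anywhere.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_tool(tool_argv, input_path, output_path):
    """Hypothetical wrapper: build a command line, run it, move the result into place.

    The tool runs as its own OS process; the Python wrapper only waits for it.
    """
    workdir = tempfile.mkdtemp()
    try:
        tmp_out = os.path.join(workdir, "tool_output.txt")
        argv = list(tool_argv) + [input_path]
        with open(tmp_out, "w") as out:
            # Blocks until the child exits; raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, stdout=out, check=True)
        shutil.move(tmp_out, output_path)
    finally:
        # Clean up the staging directory whether or not the tool succeeded.
        shutil.rmtree(workdir, ignore_errors=True)


# Hypothetical stand-in "tool": a separate Python process that sorts the words of its input file.
SORT_TOOL = [sys.executable, "-c",
             "import sys; print('\\n'.join(sorted(open(sys.argv[1]).read().split())))"]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "reads.txt")
        dst = os.path.join(tmp, "sorted.txt")
        with open(src, "w") as fh:
            fh.write("c\na\nb\n")
        run_tool(SORT_TOOL, src, dst)
        with open(dst) as fh:
            print(fh.read().split())  # ['a', 'b', 'c']
```

In Galaxy itself the command line is usually assembled from the `<command>` template in the tool's XML definition rather than by a Python function; the sketch only illustrates why a Python parent process does not slow the wrapped tool down.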

Hope that helps,

Bjoern


Thanks, that's good to know and saves me a lot of time. I'm glad that this is the case; it makes much more sense! What also confused me is the "load balancing" documentation, which makes it sound like an issue. They must be talking about setups with 100+ simultaneous users.


Correct. The Galaxy application itself is subject to Python's Global Interpreter Lock (GIL). You can work around this by configuring multiple instances (separate processes). It's a little tricky but definitely doable, and you won't need to do it until you regularly have multiple simultaneous users.
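To see the GIL effect described here, the sketch below runs the same CPU-bound pure-Python function on several threads inside one interpreter, then as the same number of separate interpreter processes. It illustrates the general principle behind running several server processes; it is not Galaxy's own configuration, and the names (`burn`, `WORK`) are hypothetical. Exact timings depend on your machine, core count, and Python build (free-threaded builds of recent Python versions behave differently).

```python
import subprocess
import sys
import threading
import time

WORK = 3_000_000
SNIPPET = f"sum(i * i for i in range({WORK}))"


def burn():
    # Pure-Python CPU-bound loop: with a standard build, it holds the GIL while it runs.
    sum(i * i for i in range(WORK))


def run_in_threads(n):
    """Run burn() on n threads in this interpreter; return wall-clock seconds."""
    threads = [threading.Thread(target=burn) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def run_in_processes(n):
    """Run the same loop in n separate interpreters; return wall-clock seconds."""
    # Each child has its own GIL, so the children can run on separate cores at once.
    start = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", SNIPPET]) for _ in range(n)]
    codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    if any(codes):
        raise RuntimeError(f"worker exit codes: {codes}")
    return elapsed


if __name__ == "__main__":
    n = 4
    print(f"{n} threads:   {run_in_threads(n):.2f}s wall")
    print(f"{n} processes: {run_in_processes(n):.2f}s wall")
```

On a multi-core machine with a standard CPython build, the threaded run typically takes roughly n times as long as one loop, while the process run stays close to one loop plus interpreter startup.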

10.8 years ago

The main public Galaxy server gets a lot of use, so I would consider it slow because of the number of users.

This is why some institutions set up their own Galaxy mirror (where user access can be limited, reducing the total number of users). If you had a local mirror, you could benchmark NGS tasks and would definitely see a difference. I wouldn't consider speed a problem for a mirror installation.

10.8 years ago
Dan D 7.4k

I've deployed a local installation of Galaxy on a cluster. If you examine the Python wrappers carefully, you'll see that they're constructing and then executing a command line. Thus the tools they're wrapping are not subject to the Python Global Interpreter Lock. Galaxy won't run tools any slower than they would run on a pure command-line execution if you're submitting jobs to a cluster.
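One way to check this on your own machine is to compare the CPU time charged to the wrapper process with the CPU time charged to its children. The sketch below uses the standard library's Unix-only `resource.getrusage`; the busy-loop child (`BUSY_TOOL`) is a hypothetical stand-in for a CPU-bound tool, not a real Galaxy tool.

```python
import resource
import subprocess
import sys

# Hypothetical stand-in for a CPU-bound tool: a separate interpreter that burns CPU for a while.
BUSY_TOOL = [sys.executable, "-c", "sum(i * i for i in range(5_000_000))"]


def cpu_seconds(who):
    """User + system CPU seconds for RUSAGE_SELF or RUSAGE_CHILDREN."""
    usage = resource.getrusage(who)
    return usage.ru_utime + usage.ru_stime


def measure_wrapper_overhead(argv):
    """Return (wrapper_cpu, tool_cpu) in seconds for one wrapped tool run."""
    self_before = cpu_seconds(resource.RUSAGE_SELF)
    children_before = cpu_seconds(resource.RUSAGE_CHILDREN)
    # The wrapper blocks waiting for the child, so it uses almost no CPU while the tool runs.
    subprocess.run(argv, check=True)
    wrapper_cpu = cpu_seconds(resource.RUSAGE_SELF) - self_before
    # Children's usage only counts terminated, waited-for children, which run() guarantees.
    tool_cpu = cpu_seconds(resource.RUSAGE_CHILDREN) - children_before
    return wrapper_cpu, tool_cpu


if __name__ == "__main__":
    wrapper_cpu, tool_cpu = measure_wrapper_overhead(BUSY_TOOL)
    print(f"wrapper: {wrapper_cpu:.3f}s CPU, tool: {tool_cpu:.3f}s CPU")
```

Swapping `BUSY_TOOL` for a real command line (for example a bwa invocation on your own data) gives a rough sense of how little CPU the Python layer adds.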

Galaxy also includes scripts to automatically delete datasets according to parameters you specify. More information here.


Thanks! Your link was very helpful.
