Question: Integrating A Local Galaxy Instance With Cluster Or Cloud Compute Resources?
8

Adam Cornwell (United States) wrote, 6.2 years ago:

I've been looking at putting a local Galaxy instance together so that we have a platform accessible to lab members for viewing data when our NGS samples start rolling in. I've got a single server for this right now, which obviously would be a major bottleneck for trying to do an end-to-end analysis of a larger dataset. We do have cluster computing facilities available as a core resource, however, so I started thinking about the possibility of using a local Galaxy instance to initiate compute tasks on a remote system, either in our computing center or in the cloud. This isn't something available in the core Galaxy codebase at the moment as far as I know, but it seems like something that could exist somewhere.

To clarify a bit: I can't actually host Galaxy on the cluster, and hosting it full-time in the cloud would probably be too much money. Our existing server system might be powerful enough for most tasks. When there's something that would take a month to run on that box, it would be great to be able to use the same front-end instance to kick off processing on a remote system, e.g. to spin up EC2 nodes and handle the sending/receiving of data.

The main motivation would be the ability to have a local system maintaining our own sample database and workflow management, while being able to leverage larger computing systems for bigger jobs.

I don't really expect something like this to exist already for Galaxy and the like, but it seemed like something worth asking. You never know what resources are around that Google somehow missed.

Tags: ngs, galaxy, cloud • 4.0k views
modified 4.3 years ago by loic.bourg • written 6.2 years ago by Adam Cornwell

I know this topic is pretty old, but I am looking to set up the same kind of Galaxy installation. Has anything changed since then?

written 4.3 years ago by loic.bourg

I moved your "answer" to a comment on the above post. This is not an answer to the above question.

I don't quite understand your question -- Galaxy is a constantly evolving project with many people contributing to it, so of course there is plenty new in the latest version as opposed to two years ago.  Here's information on how to install Galaxy -- let us know if you have any specific questions, but keep in mind that computer installation issues are often user specific and not within the bounds of this bioinformatics question and answer forum.

written 4.2 years ago by Josh Herr
4

Istvan Albert ♦♦ (University Park, USA) wrote, 6.2 years ago:

While the default configuration runs jobs in separate processes on the same computer, the Galaxy software has built-in support for running jobs via a cluster. That is how the main Galaxy instance is actually configured.

As you can see below, there is support for a number of job schedulers such as PBS or Condor:

https://bitbucket.org/galaxy/galaxy-dist/src/9fd7fe0c5712/lib/galaxy/jobs/runners?at=default
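To give a concrete idea of how those runners are wired up: in Galaxy releases of that era, job dispatch was configured through a `job_conf.xml` file that maps runner plugins to job destinations. The sketch below is an assumption-laden minimal example, not a drop-in config; the `nativeSpecification` value and the destination ids (`local`, `cluster`) are site-specific placeholders you would adapt to your scheduler.

```xml
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- run jobs as local processes (the default behavior) -->
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        <!-- submit jobs to a cluster scheduler via DRMAA (PBS, SGE, Slurm, ...) -->
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
    </plugins>
    <destinations default="local">
        <destination id="local" runner="local"/>
        <destination id="cluster" runner="drmaa">
            <!-- site-specific scheduler options, e.g. a PBS resource request -->
            <param id="nativeSpecification">-l nodes=1:ppn=4</param>
        </destination>
    </destinations>
</job_conf>
```

Individual tools can then be mapped to either destination, so only the heavy jobs go to the cluster.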

To set it up that way you will probably need more detailed advice from the Galaxy support channels.

written 6.2 years ago by Istvan Albert ♦♦
3

jmchilton (United States) wrote, 6.2 years ago:

You want a local Galaxy instance which can spin up and utilize cloud compute nodes as needed? This is not available out of the box, and I think this would require significant effort to accomplish. You would want to either hack up CloudMan* or hack up something around the LWR*, depending on whether you wanted to export your local file systems to the cloud or stage files there on a per-job basis.

The CloudMan solution would involve running CloudMan in a master mode on your local system and then building an Amazon image tailored to your setup for running CloudMan worker nodes on Amazon. It would require significant tweaking just to run CloudMan on your local setup, and then you would need to hack it to modify your file exports (and maybe firewall) as new instances are created. I have never really researched exposing file mounts to ephemeral instances on EC2; this may not be very performant.

The LWR solution would involve setting up a cloud image with the LWR server installed (if you really are interested I can bake this into CloudBioLinux for you, it is on my long-term TODO list anyway), and then creating some sort of management script or console that would spin up cloud instances with the LWR installed on them and store their addresses somewhere. You could then use Galaxy's dynamic job runners* to send jobs into the cloud when LWR instances are available. Dealing with things like genome indices in this case would require some work, but I hope to make this process easier this year.
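The routing idea behind that last step can be sketched as a dynamic job rule: a small Python function that Galaxy calls to pick a job destination at submission time. Everything specific in this sketch is an assumption for illustration: the registry file path, the `route_job` function name, and the destination ids (`lwr_cloud`, `local`) are hypothetical, not part of Galaxy itself.

```python
# Sketch of a dynamic job rule: route a job to a cloud LWR destination when
# one has been registered by the management script, otherwise fall back to
# the local runner. Registry path and destination ids are hypothetical.
import os

DEFAULT_REGISTRY = "/srv/galaxy/lwr_instances.txt"  # hypothetical list of live LWR hosts

def route_job(job, registry=DEFAULT_REGISTRY):
    """Return the id of a job destination defined in the job configuration."""
    try:
        with open(registry) as fh:
            hosts = [line.strip() for line in fh if line.strip()]
    except IOError:
        # No registry file means no cloud instances are up.
        hosts = []
    return "lwr_cloud" if hosts else "local"
```

The management script that spins instances up and down would simply rewrite the registry file, so Galaxy's routing decision always reflects which cloud nodes currently exist.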

Disclosure: I implemented the LWR and dynamic job runners and I am a regular contributor to CloudMan.

written 6.2 years ago by jmchilton
1

Alex Paciorkowski (Rochester, NY USA) wrote, 6.2 years ago:

Adam, this is definitely doable -- in fact, there's an upcoming workshop at Bio-IT World on just this topic here (and, ahem... yes... full disclosure, I'm one of the presenters). Our excellent collaborators at the University of Chicago's CI are implementing just this kind of system. If you like, we can talk in more detail about how to do this.

written 6.2 years ago by Alex Paciorkowski
0

Josh Herr (University of Nebraska) wrote, 6.2 years ago:

I guess I'm not totally clear on what you are asking about: whether you want to know how to set Galaxy up on your own server, or about the possibility of moving it from your own server to a cluster in the future. Does this link, Get Galaxy: Galaxy Download and Installation, help to answer your question?

modified 6.2 years ago • written 6.2 years ago by Josh Herr
1

I'm pretty sure what I'm asking about isn't supported in the main Galaxy package, but it seems like something that could feasibly have already been developed by a third party. Basically, does anything exist that would allow a local Galaxy instance to act as a front-end for kicking off analysis on cluster or cloud compute resources? Since the cluster is a shared resource, and no one wants to pay for hosting Galaxy in the cloud full-time, I'm looking for a compromise solution. (will edit for clarification)

written 6.2 years ago by Adam Cornwell
Powered by Biostar version 2.3.0