Question: Alternatives To Galaxy For Wrapping Command Line Tools In A Graphical User Interface?

Samuel Lampa (Uppsala), 2.4 years ago, wrote:

The Galaxy Bioinformatics Portal software is becoming increasingly popular as a way to run command-line bioinformatics software from the web, as well as to define workflows that chain runs of different tools.

Galaxy has some serious issues, though, when it comes to running it securely on an HPC cluster with hundreds of users and letting it access system-wide file systems.

Hopefully this will change over time, as the core developers recognize the demand for running Galaxy on HPC clusters. In the meantime, I was wondering what other similar software is out there.

The features I'm interested in are:

  • Ability to wrap command-line tools (configure their flags and options in a form)
  • HPC integration: being able to run jobs on a cluster
  • (Preferably) a web interface
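To make the first requirement concrete: "wrapping" a tool usually means describing its options declaratively and generating the command line from the submitted form. A minimal Python sketch of that idea, assuming a hypothetical Param description (this is not Galaxy's XML tool format or any other system's actual schema):

```python
import shlex
from dataclasses import dataclass

@dataclass
class Param:
    """One form field in a hypothetical tool description."""
    name: str                # key of the submitted form field
    flag: str                # command-line flag it maps to, e.g. "-t"
    is_switch: bool = False  # True for on/off flags that take no value

def build_argv(executable, params, form_values):
    """Translate submitted form values into an argument list for the tool."""
    argv = [executable]
    for p in params:
        value = form_values.get(p.name)
        if value is None or value is False:
            continue  # field left empty or switch unchecked
        if p.is_switch:
            argv.append(p.flag)
        else:
            argv.extend([p.flag, str(value)])
    return argv

def as_shell(argv):
    """Quoted one-line form of argv, for logs or for showing the user the command."""
    return shlex.join(argv)
```

For example, build_argv("mytool", [Param("threads", "-t"), Param("quiet", "-q", is_switch=True)], {"threads": 8, "quiet": True}) gives ["mytool", "-t", "8", "-q"]. Passing an argument list rather than a shell string keeps user-supplied values from being interpreted by a shell.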
Written 2.4 years ago by Samuel Lampa; modified 2.4 years ago by Michael Reich.

Related: http://biostar.stackexchange.com/questions/4353

Written 2.4 years ago by Pierre Lindenbaum.

Oh, I see I forgot to mention the main features I'm looking for. I've added them now.

Written 2.4 years ago by Samuel Lampa.

Nate (State College, PA), 2.4 years ago, wrote:

Hi Samuel,

Can you post some details regarding your security concerns? I'm the Galaxy developer who wrote and maintains cluster support, and I'm actually very security-conscious. If we're not making it possible to deploy on existing HPC resources due to reasonable policies, then I'd very much like to know what we can do differently. Thanks!

Written 2.4 years ago by Nate.

This is not so much a Galaxy problem as a problem with any server application that runs in a multiuser environment. Galaxy would need a way to obtain privileges for certain operations and then drop those privileges as necessary, which would require fairly tight integration if your environment uses something like Kerberos. A much simpler solution would be to use groups or extended ACLs on the file system to give Galaxy read access to the appropriate data. As you suggest, there are other workarounds; please do post to the list.
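A minimal sketch of the group-based approach, assuming the Galaxy server runs as a dedicated service account whose group IDs you know. The function name is hypothetical; it only inspects the plain POSIX mode bits, so extended ACLs (as shown by getfacl) are not taken into account. os.access() is not used because it answers the question for the calling process, not for the service account.

```python
import os
import stat

def group_can_read(path, service_gids):
    """Return True if `path` is readable through its *group* permission bits
    by a process whose group IDs are `service_gids`.

    Owner/other bits and extended ACLs are deliberately ignored in this sketch.
    """
    st = os.stat(path)
    return st.st_gid in service_gids and bool(st.st_mode & stat.S_IRGRP)
```

In practice this kind of check is mostly useful for auditing: it tells an administrator which datasets the shared service account could read without granting it root or suid rights.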

Written 2.4 years ago by Nate.

Hi, that's great! I have been summarizing the problems we have encountered, so I'll be happy to come back with that as soon as possible!

Written 2.4 years ago by Samuel Lampa.

Nate, among other things, they refer specifically to this mail back in March:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-March/004738.html

The proof of concept is straightforward: upload an HTML file via the data upload form. I'm surprised that the tool does not use galaxy.util.sanitize_html :-S

Thanks to Leif Nixon for the heads up.
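For context, the fix for this kind of XSS hole is to sanitize user-supplied HTML before serving it back to the browser. The sketch below is not Galaxy's galaxy.util.sanitize_html; it is a minimal, stdlib-only allowlist sanitizer with a hypothetical tag list, shown only to illustrate the idea: keep a few harmless tags, strip every attribute (onclick, src, href, ...), drop script/style content entirely, and escape all text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; a real deployment would tune this.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "pre", "br", "table", "tr", "td", "th"}
DROP_WITH_CONTENT = {"script", "style"}

class _AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip += 1
        elif tag in ALLOWED_TAGS and not self.skip:
            self.out.append(f"<{tag}>")  # attributes are discarded

    def handle_startendtag(self, tag, attrs):
        if tag in ALLOWED_TAGS and not self.skip:
            self.out.append(f"<{tag}>")  # e.g. <br/> becomes <br>

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip = max(0, self.skip - 1)
        elif tag in ALLOWED_TAGS and not self.skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data))  # re-escape text so it cannot form markup

def sanitize_upload(text):
    parser = _AllowlistSanitizer()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Comments, unknown tags and their attributes are silently dropped; only the text content survives.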

Written 2.4 years ago by Roman Valls Guimerà.

Hey Roman. Dannon has worked on the XSS issues in the past; we're taking another look to see what still needs to be done. Thanks for bringing it to my attention.

Written 2.4 years ago by Nate.

Now fixed in: https://bitbucket.org/galaxy/galaxy-central/changeset/35fee32991ce

Thanks, Dannon!

Written 2.4 years ago by Roman Valls Guimerà.

Samuel, if this addresses your security concerns, please let us know.

Written 2.4 years ago by Nate.

Hi Nate! Our concern is actually more about file access: to reach all users' files on an existing file system without copying any data, the Galaxy server process would have to run either as root or with suid permissions. Either way, access to every file on the cluster would then depend on the security of the Galaxy installation, which our security staff does not find acceptable. I think there are ways around this, though (we actually have some ideas), and I would be very happy to discuss them and give feedback, but that is probably better done on the mailing list.

Written 2.4 years ago by Samuel Lampa.

The reason we really want to avoid copying data is that we are fighting hard to keep our current 400 TB parallel file system from overflowing with NGS data. Adding more data duplication, even temporarily, would only make things worse.
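One common way to avoid duplicating data is to stage inputs into a job's working directory as symbolic links rather than copies. A minimal sketch with a hypothetical stage_input helper; note that it does not solve the permission problem described above, since whatever process reads through the link still needs read access to the original file.

```python
import os

def stage_input(src, job_dir):
    """Expose an existing dataset inside a job's working directory without copying it.

    Creates job_dir/<basename of src> as a symbolic link to the absolute path of
    src, so no extra space is used on the shared file system.
    """
    os.makedirs(job_dir, exist_ok=True)
    link = os.path.join(job_dir, os.path.basename(src))
    os.symlink(os.path.abspath(src), link)  # absolute target survives a chdir into job_dir
    return link
```

The trade-off is that a link breaks if the original file is moved or deleted while the job runs, which a copy would have protected against.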

Written 2.4 years ago by Samuel Lampa.

Pawel Szczesny (Poland), 2.4 years ago, wrote:

You can find a comprehensive list on Wikipedia: http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems

One piece of software missing from this list is KNIME, for which Pierre and his colleagues have recently released a set of custom nodes: http://plindenbaum.blogspot.com/2011/10/knime4bio-set-of-custom-nodes-for.html

Written 2.4 years ago by Pawel Szczesny.

Too bad that KNIME's HPC integration is not in the open source version :/

Written 2.4 years ago by Samuel Lampa.

OK Pawel, +1 ;-)

Written 2.4 years ago by Pierre Lindenbaum.

Neilfws (Sydney, Australia), 2.4 years ago, wrote:

There have been lots of attempts to create GUIs for CLI tools over the years. Just a couple of examples:

  • Mobyle is the successor to an earlier project named Pise, which was popular about 10 years ago
  • Numerous interfaces have been developed for EMBOSS

The problem with many of these tools is twofold: (1) they're often installed on low-spec servers, so functionality is often restricted compared to the command-line tool; and (2) if you don't know what you're doing, a nice GUI really doesn't help that much.

From what I hear of Galaxy (I've yet to explore it in any depth), it's vastly superior to most previous efforts.

Written 2.4 years ago by Neilfws.

Yes, Galaxy is nice, at least from the user-experience perspective (which is of course the main point of a GUI). It's unfortunate that its architecture is so hard to integrate well with the HPC/grid world: you can run your own server on the cluster securely, but that largely removes the benefit of a shared server, where you can share workflows etc.

Written 2.4 years ago by Samuel Lampa.

Pierre Lindenbaum (France), 2.4 years ago, wrote:


Biologists use KNIME in our lab. It is able to call an external application: see http://www.knime.com/downloads/extensions

Allows running an external program on the data. NOTE: Running an external executable takes the control out of KNIME's hands. This node may crash or hang KNIME and you risk losing any unsaved data. There will be no progress, no cancel option, in case of failure processes may be dangling in your system, etc. Highlighting will not work across this node. Colors are lost. Use this node at your own risk!
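The caveats quoted above (no cancel option, dangling processes on failure) are exactly what any wrapper has to handle itself when it shells out to a tool. A minimal POSIX-only Python sketch, not KNIME's implementation: start the tool in its own session so that, on timeout, the whole process group (the tool plus any children it spawned) can be killed. Children that deliberately start their own session would still escape.

```python
import os
import signal
import subprocess

def run_tool(argv, timeout):
    """Run an external tool and return its stdout; kill its process group on timeout.

    start_new_session=True makes the tool a session leader, so its process
    group ID equals its PID and os.killpg() reaches its children too.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the tool and its children
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.communicate()  # reap the process and close the pipes
        raise TimeoutError(f"{argv[0]} exceeded {timeout}s; process group killed")
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with {proc.returncode}: {err.strip()}")
    return out
```

Raising on timeout or failure, rather than returning partial output, lets the calling workflow engine mark the step as failed instead of silently continuing.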

And Taverna also seems able to call external applications: http://www.mygrid.org.uk/dev/wiki/display/developer/Calling+external+commands+from+Taverna

EDIT: The Life Science Grid was described in "Initial steps towards a production platform for DNA sequence analysis on the grid" by Luyf et al. (http://www.biomedcentral.com/1471-2105/11/598). I saw them using the following GUI to run their jobs:

[Image: screenshot of the GUI used to run jobs on the Life Science Grid]

Written 2.4 years ago by Pierre Lindenbaum; modified 2.4 years ago.

Roman Valls Guimerà (Stockholm), 2.4 years ago, wrote:

Samuel, I encourage you to read the article that shows up in Biostar headers:

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002202#s3

I actually think that the best way to solve those issues is simply to get involved on their mailing lists and/or to report the deficiencies you spot on the issue tracker:

https://bitbucket.org/galaxy/galaxy-central/issues

Fork, code and experiment yourself; you can even open a wiki page and state your concerns more specifically there. Sooner or later you'll get attention, feedback and solutions from the developers.

I agree with @neilfws: aside from those deficiencies (which could be solved sooner or later), Galaxy is currently the superior alternative.

But just for the record, I think YABI is actually quite a promising alternative:

https://ccg.murdoch.edu.au/yabi/login/?next=/yabi/

Written 2.4 years ago by Roman Valls Guimerà; modified 7 months ago by Istvan Albert.

Thanks Roman! Yes, I actually hope to find some time to do that (communicate our needs to the community, and try out some hacks to see how far we get ourselves). It's always good to have a clear picture of the other solutions first, though: both to make sure we go for the best bet, and to know which other solutions we might borrow ideas and code from. I'll try to have a look at how Yabi does things as well (especially the command-line tool wrapping).

Written 2.4 years ago by Samuel Lampa.

And yes, YABI seems to be the only real (aspiring) Galaxy competitor so far, if one restricts oneself to web frameworks. (Without the web restriction, I guess the picture becomes more complex, with powerful frameworks like KNIME and Taverna.)

Written 2.4 years ago by Samuel Lampa.

And yes, YABI seems to be the most promising Galaxy alternative so far, in that it wraps command-line tools in a neat fashion similar to Galaxy's, with the ability to configure all the flags and options for their execution through a GUI form (correct me if I'm wrong). I don't think I have seen this done as neatly in other packages, though I might be wrong.

Written 2.4 years ago by Samuel Lampa.

Larry_Parnell (Boston, MA, USA), 2.4 years ago, wrote:

Another system I was introduced to during the writing/editing of the BioStar manuscript was Taverna. Taverna is an open source and domain-independent Workflow Management System: a suite of tools used to design and execute scientific workflows and aid in silico experimentation.

Written 2.4 years ago by Larry_Parnell.

There are even automatic wrapper scripts available to plug your Taverna workflows into a Galaxy module (see the myExperiment website).

Written 2.4 years ago by ALchEmiXt.

Montag451, 2.4 years ago, wrote:

Try Yabi: it was built to meet these requirements.

https://ccg.murdoch.edu.au/yabi/

http://code.google.com/p/yabi/

Written 2.4 years ago by Montag451.

Michael Reich, 2.3 years ago, wrote:

Samuel, the GenePattern environment has a number of features that fulfill the requirements you mention:

  • HTML-based wrapping of command-line tools created in any language
  • Integrated ability to run jobs on LSF or SGE, with hooks for a number of additional compute cluster software platforms
  • Web interface
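The "hooks for additional compute cluster platforms" idea can be illustrated with a small registry of command builders, one per scheduler. This is a hypothetical sketch, not GenePattern's actual plugin interface: it only constructs qsub (SGE) and bsub (LSF) command lines using their standard -N/-J (job name) and -o (output file) options, and submits nothing.

```python
from typing import Callable, Dict, List

Builder = Callable[[str, List[str], str], List[str]]
RUNNERS: Dict[str, Builder] = {}  # scheduler name -> command builder

def runner(name):
    """Decorator that registers a command builder for one scheduler."""
    def register(fn):
        RUNNERS[name] = fn
        return fn
    return register

@runner("sge")
def sge(job_name, argv, log_path):
    # qsub: -N job name, -o stdout file, -b y = argv[0] is a binary, not a job script
    return ["qsub", "-N", job_name, "-o", log_path, "-b", "y", *argv]

@runner("lsf")
def lsf(job_name, argv, log_path):
    # bsub: -J job name, -o stdout file; the remaining words form the job command
    return ["bsub", "-J", job_name, "-o", log_path, *argv]

def submit_command(scheduler, job_name, argv, log_path):
    """Return the submission command for `scheduler`, or raise if none is registered."""
    try:
        build = RUNNERS[scheduler]
    except KeyError:
        raise ValueError(f"no runner registered for {scheduler!r}") from None
    return build(job_name, argv, log_path)
```

Supporting another platform then means registering one more builder with @runner("name"), without touching the code that wraps the tools.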

GenePattern also allows you to create analysis workflows, either step by step or by "reverse engineering" the steps from a result file. The software maintains provenance of input parameters and datasets as well as the version of software used in an analysis step.
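Provenance tracking of this kind boils down to storing, for every analysis step, the tool, its version, the parameters and a fingerprint of each input dataset. A minimal sketch with hypothetical function names (not GenePattern's actual storage format), using SHA-256 checksums to identify inputs:

```python
import hashlib
from datetime import datetime, timezone

def sha256_of(path, chunk_size=1 << 20):
    """Checksum a (possibly large) input file without reading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def provenance_record(tool, version, params, input_paths):
    """Capture what is needed to audit or rerun one analysis step."""
    return {
        "tool": tool,
        "version": version,
        "params": dict(params),
        "inputs": {path: sha256_of(path) for path in input_paths},
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Checksums rather than file names make it possible to detect later that an input file was modified or replaced after the analysis ran.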

GenePattern was originally released in 2004, and in architecture and usage it is quite similar to Galaxy. The main difference is in the available modules, which focus chiefly on machine learning (e.g. clustering, prediction), microarray analysis, sequence variation, and proteomics, though, as I mentioned, new modules can be wrapped easily. All of the functions are also available through a Web services API.

Written 2.3 years ago by Michael Reich.

That's interesting. I have known about GenePattern (though I haven't tested it), but didn't know it was that versatile.

Written 2.2 years ago by Samuel Lampa.