Alternatives To Galaxy For Wrapping Command Line Tools In A Graphical User Interface?
12.9 years ago
Samuel Lampa

The Galaxy Bioinformatics Portal software is becoming increasingly popular as a way to run command line bioinformatics software from the web, as well as defining workflows of chained runs through different tools.

Galaxy has some serious issues, though, when it comes to running it securely on an HPC cluster with hundreds of users and letting it access system-wide file systems, etc.

Hopefully this will change over time, as the core developers recognize the wish to run Galaxy on HPC clusters, but in the meantime I was wondering what other similar software is out there.

The features I'm interested in are:

  • Ability to wrap command line tools (configure their flags and options in a form)
  • HPC integration - being able to run jobs on a cluster
  • (preferably) Web interface

Oh, I see I forgot to mention the main features I'm looking for ... Added them now.

12.9 years ago
Nate

Hi Samuel,

Can you post some details regarding your security concerns? I'm the Galaxy developer who wrote and maintains cluster support, and I'm actually very security-conscious. If we're not making it possible to deploy on existing HPC resources due to reasonable policies, then I'd very much like to know what we can do differently. Thanks!


This is not so much a Galaxy problem as a problem with any server application that runs in a multiuser environment. Galaxy would need a way to obtain privileges for certain operations and then drop those privileges as necessary. This would require fairly tight integration if your environment requires something like Kerberos. A much simpler solution would be to use groups or extended ACLs on the filesystem to give Galaxy read access to the appropriate data. As you suggest, there are other workarounds; please do post to the list.
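To make the "groups" variant of that suggestion a bit more concrete, here is a minimal sketch, assuming the Galaxy process runs as a dedicated user that is a member of a shared service group (the group name and path in the usage line are hypothetical). It only checks the classic POSIX group bits on a single file; it does not consult extended ACLs (getfacl/setfacl), and in practice every parent directory would also need group execute permission.

```python
import grp
import os
import stat


def service_gid(group_name: str) -> int:
    """Look up the numeric gid of the (hypothetical) service group."""
    return grp.getgrnam(group_name).gr_gid


def group_can_read(path: str, gid: int) -> bool:
    """True if `path` is owned by group `gid` and the group-read bit is set.

    Only the POSIX mode bits of the file itself are inspected; extended
    ACLs and parent-directory permissions are not checked here.
    """
    st = os.stat(path)
    return st.st_gid == gid and bool(st.st_mode & stat.S_IRGRP)
```

Usage would look something like `group_can_read("/proj/data/sample.bam", service_gid("galaxy"))`. The point of the design is that the server never needs root or suid: read access is granted per dataset by whoever owns the data.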


Hi, that's great! I have been in the process of summarizing the problems we have encountered, so I'll be happy to come back with that ASAP!


Nate, among other things, they refer specifically to this mail back in March:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-March/004738.html

The proof of concept is straightforward: upload an HTML file via the data upload form. I'm surprised that the tool does not use galaxy.util.sanitize_html :-S
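For anyone unfamiliar with the issue: the problem is that user-uploaded HTML is rendered in the browser as live markup, so any script in it runs in the Galaxy session. A minimal sketch of the general defence, using Python's standard `html.escape` rather than Galaxy's own `sanitize_html` (whose exact behaviour I won't guess at here), is to embed untrusted dataset content as inert text. The function name is hypothetical. Escaping is stricter than sanitizing: all markup is shown literally instead of a safe subset being kept.

```python
import html


def render_dataset_preview(raw: str) -> str:
    """Embed untrusted dataset content in a page as inert, escaped text.

    html.escape converts &, <, > and (with quote=True) both quote
    characters into entities, so a <script> tag is displayed, not executed.
    """
    return "<pre>" + html.escape(raw, quote=True) + "</pre>"
```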

Thanks to Leif Nixon for the heads up.


Hey Roman. Dannon has worked on the XSS stuff in the past, we're taking another look to see what still needs to be done. Thanks for bringing it to my attention.


Samuel, if this addresses your security concerns, please let us know.


Hi Nate! Our concern is actually more that accessing all users' files on an existing file system, without copying any data, would require the Galaxy server process to run either as root or with suid permissions. Both options make access to every file on the cluster dependent on the security of the Galaxy installation, which our security staff does not consider acceptable. I think there are ways around it, though (we actually have some ideas), and would be very happy to discuss them and give feedback, but that is probably better done on the mailing list...


The reason we really want to avoid copying data is that we are fighting hard to keep our current 400 TB parallel filesystem from overfilling with NGS data. Adding more data duplication, even temporarily, would only make things worse...

12.9 years ago

You can find a comprehensive list on Wikipedia: http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems

One tool missing from this list is KNIME, for which Pierre and his colleagues have recently released a set of custom nodes: http://plindenbaum.blogspot.com/2011/10/knime4bio-set-of-custom-nodes-for.html


Too bad that KNIME's HPC integration is not in the open source version :/


OK Pawel, +1 ;-)

12.9 years ago
Neilfws

There have been lots of attempts to create GUIs to CLI tools over the years. Just a couple of examples:

  • Mobyle is the successor to an earlier project named Pise, which was popular about 10 years ago
  • Numerous interfaces have been developed for EMBOSS

The problem with many of these tools is that (1) they're often installed on low-spec servers, so functionality is often restricted compared with the command line tool, and (2) if you don't know what you're doing, a nice GUI really doesn't help that much.

From what I hear of Galaxy (I've yet to explore it in any depth), it's vastly superior to most previous efforts.


Yes, Galaxy is nice, at least from the user experience perspective (which is of course the main point of a GUI). It's unfortunate that the architecture is so hard to integrate with the HPC/grid world in a good way. You can run your own server in the cluster securely, but that largely removes the benefit of a shared server, where you can share workflows etc.

12.9 years ago


Biologists use KNIME in our lab. It is able to call an external application: see http://www.knime.com/downloads/extensions

Allows running an external program on the data. NOTE: Running an external executable takes the control out of KNIME's hands. This node may crash or hang KNIME and you risk losing any unsaved data. There will be no progress, no cancel option, in case of failure processes may be dangling in your system, etc. Highlighting will not work across this node. Colors are lost. Use this node at your own risk!

Taverna also seems to be able to call an external application: http://www.mygrid.org.uk/dev/wiki/display/developer/Calling+external+commands+from+Taverna

EDIT: The Life Science Grid was described in "Initial steps towards a production platform for DNA sequence analysis on the grid" by Luyf et al. http://www.biomedcentral.com/1471-2105/11/598. I saw them using the following GUI to run their jobs:

[Screenshot: the Life Science Grid GUI used to run jobs]

12.9 years ago

Samuel, I encourage you to read the article that shows up in Biostar headers:

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002202#s3

I actually think that the best way to solve those issues is just to get involved on their mailing lists and/or to report the deficiencies you spot on the issue tracker:

https://bitbucket.org/galaxy/galaxy-central/issues

Fork, code, and experiment yourself; you can even open a wiki page and state your concerns more specifically there. Sooner or later you'll get attention, feedback and solutions from the developers.

I'm with @neilfws: aside from those deficiencies (which could be solved sooner or later), Galaxy is currently a superior alternative.

But just for the record, I think that YABI is actually quite a promising alternative:

https://ccg.murdoch.edu.au/yabi/login/?next=/yabi/


Thanks Roman! Yeah, I actually hope to find some time to do that (communicate our needs to the community, and try out some hacks to see how far we get ourselves). It's always good to have a clear picture of other solutions first, though: both to make sure you go for the best bet, and to know which other solutions you might borrow ideas, solutions and code from. I'll try to have a look at how YABI does things as well (esp. the command line tool wrapping ...).


And yes, YABI seems to be the only real (aspiring) Galaxy competitor so far, if one restricts oneself to web frameworks (without the web restriction, I guess the picture becomes more complex, with powerful frameworks like KNIME and Taverna).


And yes, YABI seems to be the most promising Galaxy alternative so far, in that it wraps command line tools in a neat fashion similar to Galaxy, with the ability to configure all the flags and stuff for their execution through a GUI form (correct me if I'm wrong). I don't think I have really seen this done as neatly in other packages, though I might be wrong ...
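The core idea behind this kind of wrapping, in both Galaxy and YABI, is a declarative description of a tool's parameters from which the form is drawn and the command line is built. The sketch below is NOT either project's actual format (Galaxy uses XML tool files); `Param`, `ToolWrapper` and the `aligner` tool are hypothetical, just to show the mapping from submitted form values to an argument list.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Param:
    name: str                      # form field name
    flag: Optional[str]            # CLI flag, or None for a positional argument
    is_switch: bool = False        # boolean flag without a value, e.g. --verbose
    choices: Sequence[str] = ()    # allowed values; empty means free input


@dataclass
class ToolWrapper:
    executable: str
    params: list[Param] = field(default_factory=list)

    def build_argv(self, form: dict[str, object]) -> list[str]:
        """Turn submitted form values into an argv list, in parameter order."""
        argv = [self.executable]
        for p in self.params:
            if p.name not in form:
                continue  # field left empty: omit it and rely on the tool default
            value = form[p.name]
            if p.is_switch:
                if value and p.flag:
                    argv.append(p.flag)
                continue
            text = str(value)
            if p.choices and text not in p.choices:
                raise ValueError(f"{p.name}: {text!r} not in {list(p.choices)}")
            if p.flag is not None:
                argv.append(p.flag)
            argv.append(text)
        return argv
```

Returning a list (to pass to something like `subprocess.run` or a cluster submitter) rather than a shell string means form input is never interpreted by a shell, which matters on a shared server.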

12.9 years ago

Another system I was introduced to during the writing/editing of the BioStar manuscript was Taverna. Taverna is an open source, domain-independent Workflow Management System: a suite of tools used to design and execute scientific workflows and aid in silico experimentation.


There are even automatic wrapper scripts available to plug your Taverna workflows into a Galaxy module (see the myExperiment website).

12.9 years ago
Montag451

Try Yabi - it was built to meet these requirements...

https://ccg.murdoch.edu.au/yabi/

http://code.google.com/p/yabi/

12.9 years ago

Samuel, the GenePattern environment has a number of features that fulfill the requirements you mention:

  • HTML-based wrapping of command-line tools created in any language
  • Integrated ability to run jobs on LSF or SGE, with hooks for a number of additional compute cluster software platforms
  • Web interface
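As a rough illustration of what such a cluster hook has to do, here is a hypothetical sketch (not GenePattern's code) that translates one wrapped command into a submission command for SGE (`qsub`) or LSF (`bsub`). It uses only standard flags of those tools: `-N`/`-J` for the job name, `-o`/`-e` for stdout/stderr files, and for SGE `-cwd` plus `-b y` so the command is run as a binary rather than read as a job script.

```python
from typing import Sequence


def submission_argv(scheduler: str, job_name: str, command: Sequence[str],
                    stdout: str, stderr: str) -> list[str]:
    """Build the argv for submitting `command` to SGE or LSF.

    The result would be executed on a submit host, e.g. via subprocess.run.
    """
    if scheduler == "sge":
        # -cwd: run in the current directory; -b y: `command` is a binary
        return ["qsub", "-N", job_name, "-o", stdout, "-e", stderr,
                "-cwd", "-b", "y", *command]
    if scheduler == "lsf":
        # bsub takes the command and its arguments after the options
        return ["bsub", "-J", job_name, "-o", stdout, "-e", stderr, *command]
    raise ValueError(f"unsupported scheduler: {scheduler!r}")
```

Keeping the scheduler-specific part this thin is what makes "hooks for additional platforms" feasible: supporting PBS or another system means adding one more branch.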

GenePattern also allows you to create analysis workflows, either step by step or by "reverse engineering" the steps from a result file. The software maintains provenance of input parameters and datasets as well as the version of software used in an analysis step.

GenePattern was originally released in 2004 and is pretty similar to Galaxy in architecture and usage. The chief difference is in the available modules, which focus mainly on machine learning (e.g. clustering, prediction), microarray analysis, sequence variation, and proteomics, though as I mentioned, new modules can be wrapped easily. All of the functions are also available through a Web services API.


That's interesting. I have known about GenePattern (though I haven't tested it), but didn't know it was that versatile.

9.6 years ago

A bit late to this party, but I've recently developed Wooey, a simple tool for creating "Web UIs" for command line scripts. It's based on the similarly named Gooey, which automatically creates standard desktop UIs from Python command line scripts. For Python scripts Wooey inherits that ability and allows you to create a web-based UI from a command line script with a single command.
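To illustrate how a UI can be generated "automatically" from a Python script, here is a minimal sketch of reading form-field descriptions out of an `argparse` parser. This is not Wooey's or Gooey's actual code, and the `form_fields` name is hypothetical. Note that it relies on the private `parser._actions` list, because argparse has no public API for enumerating arguments, so it could break between Python versions.

```python
import argparse


def form_fields(parser: argparse.ArgumentParser) -> list[dict]:
    """Describe each argument of `parser` as a form field.

    Uses the private parser._actions attribute (not a public argparse API).
    """
    fields = []
    for action in parser._actions:
        if action.dest == "help":  # skip the automatic -h/--help entry
            continue
        fields.append({
            "name": action.dest,
            "flags": list(action.option_strings),  # empty for positionals
            "required": action.required,
            "default": action.default,
            "choices": list(action.choices) if action.choices else None,
            "help": action.help,
        })
    return fields
```

Each dict carries enough to render an input widget (text box, dropdown for `choices`, required marker) and to map the submitted value back to the right flag.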

The interface presents the script (+documentation) to the user, takes form input, and stores a 'job' in a database. A separate rudimentary scheduler runs the script and returns the result to the user with some visualisation of the outputs. The scheduling could feasibly be replaced by any alternative system; all it requires is access to the database and the file storage for the input/output files.
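The split described here (the web side only writes job rows; a separate process picks them up and runs them) can be sketched in a few lines. This is a hypothetical illustration with SQLite and `subprocess`, not Wooey's actual models or scheduler; the table layout and function names are assumptions.

```python
import json
import sqlite3
import subprocess


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " argv TEXT NOT NULL,"                    # JSON-encoded command line
        " status TEXT NOT NULL DEFAULT 'pending',"
        " stdout TEXT)"
    )
    conn.commit()


def submit_job(conn: sqlite3.Connection, argv: list[str]) -> int:
    """Web side: store the job built from the submitted form, return its id."""
    cur = conn.execute("INSERT INTO jobs (argv) VALUES (?)", (json.dumps(argv),))
    conn.commit()
    return cur.lastrowid


def run_pending(conn: sqlite3.Connection) -> int:
    """Scheduler side: run each pending job once and record the outcome."""
    rows = conn.execute(
        "SELECT id, argv FROM jobs WHERE status = 'pending'").fetchall()
    for job_id, argv_json in rows:
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
        conn.commit()
        result = subprocess.run(json.loads(argv_json),
                                capture_output=True, text=True)
        status = "done" if result.returncode == 0 else "failed"
        conn.execute("UPDATE jobs SET status = ?, stdout = ? WHERE id = ?",
                     (status, result.stdout, job_id))
        conn.commit()
    return len(rows)
```

Because the only contract is the `jobs` table, the `subprocess.run` call inside `run_pending` is the single place you would swap for a cluster submission, which is the replaceability the post describes.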

It's nowhere near as powerful, complete, or -- insert adjective here -- as Galaxy... but it is definitely simple.
