The Galaxy Bioinformatics Portal software is becoming increasingly popular as a way to run command-line bioinformatics software from the web, and to define workflows that chain runs of different tools.
Galaxy has some serious issues, though, when it comes to running it securely on an HPC cluster with hundreds of users and letting it access system-wide file systems etc.
Hopefully this will change over time, as the core devs realize the wish to run Galaxy on HPC clusters, but in the meanwhile, I was wondering what other similar software there is out there?
The features I'm interested in are:
Samuel, I encourage you to read the article that shows up in Biostar headers:
I actually think that the best way to solve those issues is just getting involved on their mailing lists and/or describing the deficiencies you spot on the issue tracker:
Fork, code, experiment yourself, you can even open a wiki page and state your concerns more specifically there... sooner or later you'll have attention, feedback and solutions from the developers.
I'm with @neilfws: aside from those deficiencies (which could be solved sooner or later), Galaxy is currently a superior alternative.
But just for the record, I think that YABI is actually quite a promising alternative:
Can you post some details regarding your security concerns? I'm the Galaxy developer who wrote and maintains cluster support, and I'm actually very security-conscious. If we're not making it possible to deploy on existing HPC resources due to reasonable policies, then I'd very much like to know what we can do differently. Thanks!
You can find a comprehensive list on Wikipedia: http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems
One piece of software missing from this list is KNIME, for which Pierre and his colleagues have recently released a set of custom nodes: http://plindenbaum.blogspot.com/2011/10/knime4bio-set-of-custom-nodes-for.html
There have been lots of attempts to create GUIs to CLI tools over the years. Just a couple of examples:
Problem with many of these tools: (1) they're often installed on low-spec servers, so functionality is often restricted as compared to the command line tool and (2) if you don't know what you're doing, a nice GUI really doesn't help that much.
From what I hear of Galaxy (I've yet to explore it in any depth), it's vastly superior to most previous efforts.
Allows running an external program on the data. NOTE: Running an external executable takes the control out of KNIME's hands. This node may crash or hang KNIME and you risk losing any unsaved data. There will be no progress and no cancel option; in case of failure, processes may be dangling in your system, etc. Highlighting will not work across this node. Colors are lost. Use this node at your own risk!
and Taverna seems to be able to call an external application: http://www.mygrid.org.uk/dev/wiki/display/developer/Calling+external+commands+from+Taverna
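The hazards listed in the KNIME note above (no cancel option, dangling processes) are the usual ones whenever a workflow tool shells out to a command-line program. As a rough illustration only, and not how KNIME or Taverna actually do it, here is a minimal Python sketch of a wrapper that enforces a timeout and cleans up the whole process group. The function name `run_external` and its return shape are made up for this example, and the process-group handling assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys


def run_external(cmd, timeout_s):
    """Run cmd (a list of arguments) with a time limit.

    Returns (exit_code, stdout, stderr); exit_code is None if the
    time limit was hit and the process group had to be killed.
    Hypothetical helper for illustration, POSIX only.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        # Put the tool in its own session/process group so that any
        # children it spawns can be signalled together with it.
        start_new_session=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the tool and everything it started, then reap it so
        # nothing is left dangling.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()
        return None, "", ""
    return proc.returncode, out, err


if __name__ == "__main__":
    # Use the Python interpreter itself as a stand-in for a CLI tool.
    code, out, _ = run_external([sys.executable, "-c", "print('ok')"], 10)
    print(code, out.strip())
```

A wrapper like this gives the calling tool something to hang a cancel button on, which is exactly what the note above says is missing.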
EDIT: The Life Science Grid was described in "Initial steps towards a production platform for DNA sequence analysis on the grid" by Luyf et al. http://www.biomedcentral.com/1471-2105/11/598 . I saw them using the following GUI to run their jobs:
Another system to which I was introduced during the writing/editing of the BioStar manuscript was Taverna. Taverna is an open source and domain-independent Workflow Management System: a suite of tools used to design and execute scientific workflows and aid in silico experimentation.
Samuel, the GenePattern environment has a number of features that fulfill the requirements you mention:
GenePattern also allows you to create analysis workflows, either step by step or by "reverse engineering" the steps from a result file. The software maintains provenance of input parameters and datasets as well as the version of software used in an analysis step.
GenePattern was originally released in 2004, and its architecture and usage are pretty similar to Galaxy's. The chief difference is in the available modules, which focus mainly on machine learning (e.g. clustering, prediction), microarray analysis, sequence variation, and proteomics, though as I mentioned, new modules can be wrapped easily. All of the functions are also available through a Web services API.
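To make the provenance point concrete: the idea is that each analysis step records its input parameters, its datasets, and the version of the software used. The Python sketch below shows one way such a record could look. It is not GenePattern's actual format or API; the function names and the record fields are assumptions chosen for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(tool, version, params, input_paths):
    """Build a provenance record for one analysis step (hypothetical layout)."""
    return {
        "tool": tool,
        "version": version,
        "params": dict(params),
        # Checksums identify the exact input files that were analysed.
        "inputs": {path: file_sha256(path) for path in input_paths},
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def to_json(record):
    """Serialise a record so it can be stored next to the result file."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Storing a record like this alongside each result file is what makes it possible to work backwards from a result to the steps that produced it.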