Transform A Pipeline Of R Scripts Into A Web Application
7
20
11.3 years ago

I have a pipeline of scripts that together produce different plots to describe some properties of genes involved in the same pathway.

For the moment, I execute them locally through a Makefile. The only thing that I have to do is to fill a list of genes in a file, and then call the main function, and it will automatically generate some graphs and tables on them.

I wonder: how can I turn this pipeline into a web application? I already have the scripts; I just need a way to create a web page where a user can upload a list of files and then download the results.

I have some experience with Django and, earlier, Plone, but so much time has passed that I have forgotten how to use them. How would you implement it? By the way, is there any special framework for bioinformatics-related stuff? Or are there any special rules or standards that I should follow in order to integrate a web application with other bioinformatics-related services?

webservice web pipeline galaxy • 10k views
19
11.3 years ago

Galaxy is excellent for this type of script integration and workflow development:

http://galaxy.psu.edu/

It's written in Python and easy to get running:

http://bitbucket.org/galaxy/galaxy-central/wiki/GetGalaxy

You add in your custom scripts with a simple XML-based language.

0

That's how I managed all of my various tools. Now every script becomes a Galaxy tool from the start. This makes it easier to join projects and share data. It also makes it easier to work with collaborators because I can just point them to a saved history on my own instance. I can also check the logs to see if they actually looked at it before complaining ;)

0

Thank you very much: in fact, I was looking for something in the style of Galaxy. Let's see what other options come up in this thread.

9
11.3 years ago

This is very much possible. You need to write a wrapper script around the R scripts to run the pipeline and pass the results / plots to your HTML.

Read the gene names from the web page on the server side using CGI

Write the names to a file in a tempdir of the web server

Write your R commands to a tempRscript.R

Run the command R --no-save < tempdir/tempRscript.R using backticks or system inside your server-side script

This will generate the results and the plot

Once these files are ready, you can print them to HTML using your server-side script.
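
The steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the setup described in this answer: the function name `run_pipeline`, the file name `genes.txt` and the script name `pipeline.R` are hypothetical, and the script is assumed to accept the gene file and an output directory as its two trailing arguments (which an R script started with `Rscript` can read via `commandArgs(trailingOnly = TRUE)`).

```python
import subprocess
import tempfile
from pathlib import Path


def run_pipeline(genes, command, workdir=None):
    """Write genes to a file in a temp dir and run `command` on it.

    `command` is an argument list such as ["Rscript", "pipeline.R"]
    (a hypothetical script name). The gene file path and the output
    directory are appended as two extra arguments.
    """
    workdir = Path(workdir or tempfile.mkdtemp(prefix="pipeline_"))
    gene_file = workdir / "genes.txt"
    gene_file.write_text("\n".join(genes) + "\n")
    # An argument list with no shell: the paths are passed to the program
    # as-is and are never parsed as shell syntax, unlike with backticks
    # or a single command string.
    subprocess.run(command + [str(gene_file), str(workdir)],
                   check=True, timeout=600)
    # Anything in the directory other than the input file is a result.
    return sorted(p for p in workdir.iterdir() if p != gene_file)
```

A call might look like `run_pipeline(["TP53", "BRCA1"], ["Rscript", "pipeline.R"])`; the returned paths can then be linked from, or embedded in, the HTML page.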

You may also check RSPerl, Rcgi, RSPython, RPy or RSOAP. I am not sure whether they are still active projects. As suggested by Pierre, using Pise/Mobyle is an alternative option. I tried to use Pise back in 2005, but the experience was not so smooth, and in the end I used backticks and system commands to implement a web server based on C code.

6

rpy2 is definitely alive (and kicking). I have made a few instant web applications using Python frameworks with it.

3

Thanks for also mentioning RSPerl. I have made a patched version of RSPerl which is available here: http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/HowToInstallRSPerl. That will also install with the most recent R version (>2.11). I would not recommend the 'official' version though.

1

@Khader: which is precisely a known security issue.

See point 3 at Q6: I'm developing custom CGI scripts. What unsafe practices should I avoid? http://www.w3.org/Security/Faq/wwwsf4.html

0

Excellent details Michael. Thanks for sharing this.

0

Thanks for the answer. However, I would like to avoid calling system commands through backticks, since that can cause very serious security problems when passing parameters to the script, and I am not expert enough in web security to know how to sanitize the inputs by myself. Moreover, I would have to design the web interface myself, and I would prefer something in the style of Plone or another CMS, where the look of the site can be customized easily.

0

AFAIK, using backticks is equivalent to using the system function of a language. If you are concerned about security issues, you may use the Perl system function (system("command arg1 arg2 arg3");) or its equivalent to run your R scripts.

0

Agreed. That's a classic example of how badly backticks can go wrong. But there are always ways to get around it. Also, you need to take extra care when using sendmail. For example, you can have an email address checking function that looks for suspicious symbols like < or /.
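
To make the idea of checking input for suspicious symbols concrete, here is a small Python sketch that accepts only names matching an allowlist instead of hunting for individual bad characters. The pattern is an assumption (letters, digits, '.', '_' and '-', starting with a letter or digit, at most 30 characters), and `clean_gene_list` and `max_genes` are hypothetical names; adjust them to the identifiers you actually accept.

```python
import re

# Assumption: a gene name starts with a letter or digit and otherwise
# contains only letters, digits, '.', '_' and '-', up to 30 characters.
GENE_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,29}")


def clean_gene_list(raw_text, max_genes=500):
    """Return the validated gene names, or raise ValueError on bad input."""
    genes = raw_text.split()
    if not genes:
        raise ValueError("no gene names given")
    if len(genes) > max_genes:
        raise ValueError("too many gene names")
    for gene in genes:
        # fullmatch rejects anything outside the allowlist: ';', '<', '/'
        if not GENE_PATTERN.fullmatch(gene):
            raise ValueError("invalid gene name: %r" % gene)
    return genes
```

Rejecting the whole request on the first bad name keeps the rule simple: nothing that fails the pattern ever reaches a file name, a command line or an email header.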

7
11.3 years ago

... we have developed a Web interface generator for more than 150 molecular biology command-line driven programs(...). The generator uses XML as a high-level description language of the legacy software parameters. Its aim is to provide users with the equivalent of a basic Unix environment, with program combination, customization and basic scripting through macro registration

...and/or Mobyle https://projets.pasteur.fr/wiki/mobyle

1

It has been replaced by Mobyle as far as I know

0

How does it deal with the input? For example, is it easy to hook up with a chemical editor, or something as simple as a text field for entering a sequence?

0

All I know is that the numerous interfaces (http://bioweb.pasteur.fr/intro-en.html) for the Pasteur Institute have been generated using PISE. Those interfaces have now moved to Mobyle.

0

For example, using the Internet Archive, here is an old interface designed using PISE: http://web.archive.org/web/20030304132911/bioweb.pasteur.fr/seqanal/interfaces/msbar.html

0

Thank you very much: however, the PISE publication is very old, dated 2001, and this is before a lot of very interesting web technologies were developed. Do you know if this tool is still under development and if it has been modernized with newer technologies?

7
11.3 years ago

"Web application" covers a wide range of possibilities depending on the requirements. Your pipeline seems rather straightforward:

For the moment, I execute them locally through a Makefile. The only thing that I have to do is to fill a list of genes in a file, and then call the main function, and it will automatically generate some graphs and tables on them.

A minimal web framework of your choosing (maybe one that handles sessions), with a simple form to upload the list of genes, would already be enough. As you mention Django and Plone, I suppose that you are comfortable with Python. I have used bottle for prototyping and got something up and running in no time.
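
As a rough illustration of how small such a form-handling application can be, here is a sketch that uses only Python's standard-library WSGI support rather than bottle (a substitution made so that the example has no dependencies). The names `app`, `parse_gene_list` and `serve`, the form field `genes` and port 8080 are all assumptions made for illustration.

```python
import html
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

FORM = b"""<form method="post">
<textarea name="genes" rows="10" cols="40"></textarea>
<input type="submit" value="Run">
</form>"""


def parse_gene_list(text):
    """Split pasted text on whitespace and commas, dropping empty items."""
    return [g for g in text.replace(",", " ").split() if g]


def app(environ, start_response):
    """WSGI entry point: GET shows the form, POST echoes the parsed genes."""
    if environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        fields = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        genes = parse_gene_list(fields.get("genes", [""])[0])
        # A real application would hand `genes` to the pipeline here.
        items = "".join("<li>%s</li>" % html.escape(g) for g in genes)
        body = ("<ul>%s</ul>" % items).encode("utf-8")
    else:
        body = FORM
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8080):
    """Run the development server; not meant for production use."""
    make_server("localhost", port, app).serve_forever()
```

Calling `serve()` and opening http://localhost:8080/ shows the form. Because `app` is a plain WSGI callable, it can later be moved behind any WSGI-capable server without changes.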

If most of your code is in R + Python, rpy2 is definitely an option to consider. Setting up a minimal web application can be done in an afternoon. I have slides around that theme presented at BOSC 2010.

0

Thank you very much. I know that setting up a web application is easy, but I am concerned about the security problems that can show up, and I also want it to integrate easily with other bioinformatics tools and with any relevant specifications, if there are any. In any case, I will look at bottle, thanks!

0

What do you mean by "security": robustness against piracy and other unwanted activities, or restricting access to content?

My advice would be: look at your real requirements, and pick the lowest-energy solution that answers them. The simpler your setup, the easier it is to control the "security". World domination plans can always come later.

6
11.3 years ago
Neilfws 49k

You might want to look at RApache - "a project supporting web application development using the R statistical language and environment and the Apache web server".

It's possible to write web applications in pure R, using RApache and the brew package - see the documentation for examples.

It's also possible to integrate RApache code with other web frameworks. I wrote a basic introduction about communication between RApache and a Rails application.

2
11.3 years ago
Cornel ▴ 50

I've used a job queuing system (gearman.org) for running all sorts of "intensive" bioinformatics tools for this project: dnasubway.org
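
For readers unfamiliar with the pattern, the sketch below shows the general idea of a job queue using only Python's standard library. It is an in-process stand-in and not Gearman's API: the web request only submits a job and gets an id back, a background worker runs the slow tool, and the page polls for the outcome. All names here (`submit`, `worker`, `status`) are hypothetical.

```python
import queue
import threading
import uuid

jobs = queue.Queue()
results = {}  # job id -> (state, value), filled in by the worker
results_lock = threading.Lock()


def submit(func, *args):
    """Queue a call and return a job id that the web page can poll later."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, func, args))
    return job_id


def worker():
    """Run queued jobs one at a time, storing each result or error."""
    while True:
        job_id, func, args = jobs.get()
        try:
            outcome = ("done", func(*args))
        except Exception as exc:  # keep the worker alive if a job fails
            outcome = ("failed", str(exc))
        with results_lock:
            results[job_id] = outcome
        jobs.task_done()


def status(job_id):
    """Return ('pending', None) until the worker has stored an outcome."""
    with results_lock:
        return results.get(job_id, ("pending", None))


# One daemon worker thread; a dedicated queuing system would instead run
# workers as separate processes, possibly on other machines.
threading.Thread(target=worker, daemon=True).start()
```

The point of the design is that a long-running analysis never blocks the HTTP request that started it.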

0
6.8 years ago
YOT ▴ 30

I have worked on a project like that. I had a pipeline in bash. It called four Python functions, and at the end the pipeline called an R script. At the very end, R created a tree of organisms and displayed a table with some more information.

I was asked to turn it into a user-friendly web app.

So the user only needs to upload the file they want to analyse.

1- Use jQuery to call your PHP script, which will retrieve the file.

2- After the upload is done, call your first function.

3- Do the same until your pipeline is finished.

4- You don't need to have RApache. You can call the R that is installed outside your server.

5- Set the R script to save the final result in a file.

6- Read this file using PHP.

Retrieve the information and send it to JavaScript.

Now you can do what you want. You can use d3 or canvas or SVG to create what you want.

I used AJAX and returned JSON, which is ready to be used by d3.
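
The step of reading the result file and returning JSON could look roughly like this. The sketch is in Python rather than PHP, and it assumes that the R script wrote a tab-separated table with a header row (for example with `write.table(x, file, sep = "\t", quote = FALSE, row.names = FALSE)`); the function name `table_to_json` is hypothetical.

```python
import csv
import json


def table_to_json(path):
    """Read a tab-separated table with a header row into a JSON array."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # Each row becomes one object keyed by column name; values stay strings.
    return json.dumps(rows)
```

An array of objects keyed by column name has the same shape that d3's own CSV and TSV loaders produce, so the JavaScript side can bind it to a selection directly.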

Note: you may need to use putenv so that R or your other languages' functions are recognized by PHP, and use shell_exec to call each function, as in this post: Qiime - pynest, call from shell_exec() in PHP not working

Note 2, on R: there are two options. You could call the R script and pass it the values that come from your pipeline process.

Or, as I did, set R to read the file that was created at the end of the pipeline. Same result, but fewer issues.