Question

How to write an effective and stable bioinformatics pipeline in R?

1

9.8 years ago

jack ▴ 980

Hi all,

I'm planning to write a bioinformatics pipeline in R. Basically, it will do gene expression quantification (which is in C++), then DE gene analysis (R package) and, at the end, gene ontology and GSEA.

I'm looking for tips and recommendations to take into account while I'm developing my pipeline in R.

What should I avoid in R? What should I care about most?

I'm keen to get some recommendations from you.

RNA-Seq pipeline software-error R • 6.1k views

3

I appreciate that you are trying to get some general advice before setting out on a task, but this is a very general question. You will probably get more help if you can provide some specifics about what you plan to do (what task are you automating, how do you plan to achieve each step).

written 9.8 years ago by David W 4.9k (updated 2.5 years ago by Ram 44k)

Ram · Answer 1 · 2014-10-02

11

9.8 years ago

Pierre Lindenbaum 163k

I'm planning to write a bioinformatics pipeline in R. (...)

what should I avoid in R ?

Don't reinvent the wheel: make, or other tools like Snakemake, are the workflow managers you need. See: How To Organize A Pipeline Of Small Scripts Together?

3

Yes, much as I love R, if the OP means pipeline in the normal sense of an automated chain of scripts and calls to executables then "what to avoid in R" is probably "all of it".

written 9.8 years ago by David W 4.9k

0

Can you post a link for "what to avoid in R"? I couldn't find it by googling.

written 9.8 years ago by jack ▴ 980

3

?

The point all of us are trying to make is that R is typically a bad choice for a pipeline, at least if you're using that term in the same way we do.

written 9.8 years ago by Devon Ryan 104k

score 11 · Answer 2 · 2014-10-02

11

9.8 years ago

Devon Ryan 104k

It usually makes more sense to incorporate an R script into a pipeline rather than writing the pipeline itself in R.
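
A minimal sketch of that arrangement, assuming a Python driver: the pipeline logic lives outside R, and the R analysis is just one step invoked with Rscript. The tool name `quantify`, the scripts `de_analysis.R` and `enrichment.R`, and all file names are hypothetical placeholders, not taken from this thread.

```python
import shlex
import subprocess

# Hypothetical steps: executable, script and file names are placeholders.
STEPS = [
    ["quantify", "--reads", "sample.fastq", "--out", "counts.tsv"],
    ["Rscript", "de_analysis.R", "counts.tsv", "de_genes.tsv"],
    ["Rscript", "enrichment.R", "de_genes.tsv", "go_terms.tsv"],
]


def run_pipeline(steps, dry_run=True):
    """Run each step in order and return the commands as shell-quoted strings.

    With dry_run=True nothing is executed, which is handy for checking the plan.
    """
    planned = []
    for cmd in steps:
        planned.append(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so a failed step stops the chain.
            subprocess.run(cmd, check=True)
    return planned


if __name__ == "__main__":
    for line in run_pipeline(STEPS):
        print(line)
```

Called with dry_run=False, the same function executes the steps and stops at the first non-zero exit status. Unlike make or Snakemake, this sketch does not track which outputs are already up to date, which is one reason a real workflow manager is usually the better choice.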

Ram · Answer 3 · 2014-10-02

The best R pipeline example I have seen is the Nature Protocols paper:

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

If you don't have access to the journal, here is a pre-publication version.

The versions of R and Bioconductor-related libraries are captured with the command

> sessionInfo()

The versions of system software packages are captured:
> system("bowtie2 --version | grep align", intern=TRUE)

[1] "/usr/local/software/bowtie2-2.1.0/bowtie2-align version 2.1.0"
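
The same bookkeeping can be done from a driver script outside R. A minimal sketch in Python, assuming each tool reports its version when called with a `--version` flag (as bowtie2 does above, but not every tool does); the helper name `tool_version` is made up for illustration.

```python
import subprocess
import sys


def tool_version(cmd):
    """Return the first line a command prints (stdout, then stderr).

    Returns an empty string if the command prints nothing, and None if the
    command cannot be started (for example, because it is not installed).
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except OSError:
        return None
    lines = (result.stdout + result.stderr).strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # bowtie2 is the tool from the example above; the Python interpreter is
    # included so the sketch prints something even where bowtie2 is absent.
    for cmd in (["bowtie2", "--version"], [sys.executable, "--version"]):
        print(cmd[0], "->", tool_version(cmd))
```

Writing these strings to a log file next to the results records which tool versions produced them.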

There is a steep learning curve to learning R well enough to write the whole bioinformatics pipeline in R. Good luck :)

Ram · Answer 4 · 2015-10-17

I do a lot of scripting in R, and with ggplot2 and Bioconductor there is so much one can achieve. Pipelining was certainly an issue, so we built a tool to do just that (http://docs.flowr.space). One starts with a bunch of system commands and wraps them into a tab-delimited text file. Once that is done, flowr can submit the jobs to a local server (in parallel, using mclapply) or to clusters running LSF, Torque, SLURM, MOAB, etc.

Usage:
flowr function [arguments]
   status        Detailed status of a flow(s).
   rerun         rerun a previously failed flow
   kill          Kill the flow, upon providing working directory
   fetch_pipes   Checking what modules and pipelines are available;

Please use 'flowr -h function' to obtain further information about the usage of a specific function.

I am certainly biased (being a developer), but one may find it much easier to create a TSV file than to learn a new syntax.

The second issue I faced: say I have an R function which does a lot of things, and I want to call it from the terminal. R does not have a nice standard argument parser like Python or Perl. Now we have a package, funr, where the first argument is the function you want to call and the rest are its arguments. One can call any R function of any installed package (or sourced script).

funr rnorm n=10
    -1.244571 1.378112 0.02189023 -0.3723951 0.282709 -0.22854 -0.8476185 0.3222024 0.08937781 -0.4985827

Hope you find it useful. Would be curious if it works out.