Question: How to write an effective and stable bioinformatics pipeline in R?
1
4.6 years ago by
jack750
Germany
jack750 wrote:

Hi all,

 

I'm planning to write a bioinformatics pipeline in R. Basically, it will do gene expression quantification (using a tool written in C++), DE gene analysis (using an R package) and, at the end, gene ontology analysis and GSEA.

I'm looking for good tips and recommendations to take into account while developing my pipeline in R.

What should I avoid in R? What should I care about most?

I'm keen to get some recommendations from you.

modified 3.5 years ago by sahiilseth30 • written 4.6 years ago by jack750
3

I appreciate that you are trying to get some general advice before setting out on a task, but this is a very general question. You will probably get more help if you can provide some specifics about what you plan to do (what task are you automating, how do you plan to achieve each step). 

modified 4.6 years ago • written 4.6 years ago by David W4.7k
11
4.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

> I'm planning to write a bioinformatics pipeline in R. (...)

> what should I avoid in R ?

 

Don't reinvent the wheel: make, or other tools like snakemake, are the workflow managers you need. See: How To Organize A Pipeline Of Small Scripts Together?

 

written 4.6 years ago by Pierre Lindenbaum119k
3

Yes, much as I love R, if the OP means pipeline in the normal sense of an automated chain of scripts and calls to executables then "what to avoid in R" is probably "all of it".

modified 4.6 years ago • written 4.6 years ago by David W4.7k

Can you post a link for "what to avoid in R"? I couldn't find it by googling.

written 4.6 years ago by jack750
3

?

The point all of us are trying to make is that R is typically a bad choice for a pipeline, at least if you're using that term in the same way we do.

written 4.6 years ago by Devon Ryan89k
11
4.6 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

It usually makes more sense to incorporate an R script into a pipeline rather than writing the pipeline itself in R.
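
To illustrate the point, here is a minimal sketch of an R script written as one step that a workflow manager (make, snakemake, or a plain shell script) could call with Rscript. The file name de_step.R, the function names and the per-gene totals are hypothetical placeholders and not from this thread; the real DE analysis would go where the placeholder is.

```r
# de_step.R: hypothetical pipeline step (all names are illustrative only)

# Validate the command-line arguments and return them as a named list.
parse_step_args <- function(args) {
  if (length(args) != 2L) {
    stop("usage: Rscript de_step.R <counts.tsv> <results.tsv>", call. = FALSE)
  }
  list(counts = args[[1]], out = args[[2]])
}

# Read a gene-by-sample count table and write one result table.
run_step <- function(counts_path, out_path) {
  counts <- read.delim(counts_path, row.names = 1, check.names = FALSE)
  # Placeholder for the real DE analysis: only per-gene totals are computed.
  totals <- data.frame(gene = rownames(counts), total = rowSums(counts))
  write.table(totals, out_path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(out_path)
}

# Run the step only when the script is invoked with arguments via Rscript.
args <- commandArgs(trailingOnly = TRUE)
if (!interactive() && length(args) > 0L) {
  opts <- parse_step_args(args)
  run_step(opts$counts, opts$out)
}
```

It would be called as `Rscript de_step.R counts.tsv results.tsv`. An uncaught error makes Rscript exit with a non-zero status, which is what lets the surrounding workflow tool stop when the step fails.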

written 4.6 years ago by Devon Ryan89k
0
4.6 years ago by
rtliu2.0k
New Zealand
rtliu2.0k wrote:

The best R pipeline example I have seen is the Nature Protocols paper -
Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

If you don't have access to the journal, here is a pre-publication version.

 

The versions of R and the Bioconductor-related libraries are captured with the command:

> sessionInfo()

The versions of system software packages are captured:
> system("bowtie2 --version | grep align", intern=TRUE)

[1] "/usr/local/software/bowtie2-2.1.0/bowtie2-align version 2.1.0"
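
A minimal sketch of how the two commands above could be combined so that every run leaves a version log behind. The function name record_versions and the log file name are my own assumptions for illustration, not something taken from the paper.

```r
# Write R/package versions plus the output of external version commands
# to a log file. record_versions() is a hypothetical helper.
record_versions <- function(log_path, tool_cmds = character(0)) {
  # sessionInfo() prints R, OS and attached package versions.
  lines <- capture.output(sessionInfo())
  for (cmd in tool_cmds) {
    # system(..., intern = TRUE) returns the command's stdout as a
    # character vector; an error is recorded instead of aborting the run.
    out <- tryCatch(
      system(cmd, intern = TRUE),
      error = function(e) paste("FAILED:", conditionMessage(e))
    )
    lines <- c(lines, paste0("$ ", cmd), out)
  }
  writeLines(lines, log_path)
  invisible(lines)
}
```

For example, `record_versions("versions.log", "bowtie2 --version | grep align")` would store the sessionInfo() output followed by the bowtie2 version line, assuming bowtie2 is on the PATH.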

 

It is a steep learning curve to learn R well enough to write the whole bioinformatics pipeline in R. Good luck :)

written 4.6 years ago by rtliu2.0k
0
3.5 years ago by
sahiilseth30
United States
sahiilseth30 wrote:

I do a lot of scripting in R, and with ggplot2 and Bioconductor there is so much one can achieve. Pipelining was surely an issue, so we built a tool to do just that (http://docs.flowr.space). One starts with a bunch of system commands and wraps them into a tab-delimited text file. When done, flowr can submit them to a local server (in parallel, using mclapply) or to clusters such as LSF, Torque, SLURM and MOAB.

Usage:
flowr function [arguments]
   status       Detailed status of a flow(s).
   rerun        rerun a previously failed flow
   kill         Kill the flow, upon providing working directory
   fetch_pipes  Checking what modules and pipelines are available;

Please use 'flowr -h function' to obtain further information about the usage of a specific function.

 

Certainly biased (being a developer), but one may find it much easier to create a TSV file than to learn a new syntax.

The second issue I faced was this: say I have an R function which does a lot of things, and now I want to call it from the terminal. R does not have a nice standard argument parser like Python/Perl. Now we have a package, funr, where the first argument is the function you want to call and the rest are its arguments. One can call any R function of any installed package (or sourced script).

funr rnorm n=10

    -1.244571 1.378112 0.02189023 -0.3723951 0.282709 -0.22854 -0.8476185 0.3222024 0.08937781 -0.4985827
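
For readers curious about the general idea, here is a rough base-R sketch of turning "function name=value ..." arguments into a function call. This is not funr's actual implementation; parse_kv_args is a hypothetical helper written only for illustration.

```r
# Split c("fun", "name=value", ...) into a function name and a named
# argument list. parse_kv_args() is a hypothetical helper, not part of funr.
parse_kv_args <- function(args) {
  if (length(args) < 1L) {
    stop("expected: <function> [name=value ...]", call. = FALSE)
  }
  rest <- args[-1]
  if (!all(grepl("=", rest, fixed = TRUE))) {
    stop("every argument after the function must look like name=value",
         call. = FALSE)
  }
  keys <- sub("=.*$", "", rest)      # text before the first "="
  vals <- sub("^[^=]*=", "", rest)   # text after the first "="
  # type.convert() turns "10" into 10L, "TRUE" into TRUE, and so on.
  vals <- lapply(vals, type.convert, as.is = TRUE)
  list(fun = args[[1]], args = stats::setNames(vals, keys))
}

# In a script, the input would come from commandArgs(trailingOnly = TRUE).
parsed <- parse_kv_args(c("rnorm", "n=10"))
draws <- do.call(parsed$fun, parsed$args)  # equivalent to rnorm(n = 10)
```

The sketch only handles name=value pairs and simple scalar values; anything beyond that (positional arguments, vectors, help output) would need more work or a dedicated package.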

Hope you find it useful. Would be curious if it works out.

 

 

modified 3.5 years ago • written 3.5 years ago by sahiilseth30