Forum: Reproducible Science Short Example
Eric Normandeau (10k), Quebec, Canada, wrote 5.7 years ago:

I want to give a 20-30 minute talk on Reproducible Science in biology. I think the audience would enjoy watching me run a small, self-contained experiment live, coding, documenting, and commenting it as I go.

I would like to start from an existing template. I have seen nice examples posted on blogs in the past that featured both code and results. Is there such an example that you would suggest for my talk?

Here are some ideas I have for the talk:

  1. Do analyses that are interesting for biologists
  2. Produce cool figures
  3. Keep it simple for biologists (e.g., avoid complicated analyses or packages)
  4. Start from an empty directory
  5. Get the data with wget
  6. Use R (most familiar language for biologists where I work)
  7. Use RStudio to make it more palatable than my green on black terminal :)
  8. Use a bash script (or a VERY simple make script) to run everything
  9. Create all the output dynamically (figures, tables...)
  10. Use GitHub to version control everything, including the data (smallish text file)
  11. Test the whole pipeline on another data set to show how much time this approach can save
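
Ideas 4, 5, 8, and 9 above could be sketched as a single driver script. This is only a minimal sketch under stated assumptions: the URL, the `analysis.R` script name, and the `data/` and `results/` directories are placeholders that do not come from the post. By default it only prints the commands (`DRY_RUN=1`); setting `DRY_RUN=0` would actually run them, which assumes `wget` and `Rscript` are installed.

```shell
#!/bin/sh
# run_all.sh: hypothetical driver script for the live demo.
# The URL and all file names are placeholders, not taken from the post.
set -eu

# DRY_RUN=1 (default here) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, then run it unless we are in dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Fetch one data set and run the (hypothetical) R analysis on it.
pipeline() {
    url="$1"      # where the data set lives
    outdir="$2"   # where figures and tables are written
    data="data/$(basename "$url")"

    run mkdir -p data "$outdir"
    run wget -q -O "$data" "$url"
    run Rscript analysis.R "$data" "$outdir"
}

pipeline "https://example.org/dataset1.tsv" results
```

Idea 11 would then amount to calling `pipeline` a second time with another URL and output directory.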

Do you have any suggestions for such a talk?

Thanks in advance for your input

modified 5.7 years ago by David Fredman (1.0k) • written 5.7 years ago by Eric Normandeau (10k)
Pierre Lindenbaum (127k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote 5.7 years ago:

Here is a simple and short illustration using a Makefile: it downloads some DNA, extracts the protein accessions, downloads the protein, transforms it to SVG, converts that to PNG, and merges everything into a LaTeX document:



modified 5.7 years ago • written 5.7 years ago by Pierre Lindenbaum (127k)

Thank you Pierre. It's definitely a cool example, but from the point of view of a biologist with basic notions of R, I don't think we can pretend this is even remotely simple. I am trying to lure them in, not scare them off ;)

modified 5.7 years ago • written 5.7 years ago by Eric Normandeau (10k)
David Fredman (1.0k), University of Bergen, Norway, wrote 5.7 years ago:

I would definitely suggest using RStudio with RMarkdown (and knitr) for your reproducible analysis example.
Do the data download (within R), computation, explanation, figures, and tables all within the same document. Some sources you could borrow from:

Software Carpentry R materials

Karl Broman's course materials on reproducible science (or shorter knitr in a knutshell)

The JHU Coursera data science courses
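
A minimal sketch of the single-document idea, driven from the shell. Instead of a full `.Rmd` file it uses a plain R script with `#'` comments, which `rmarkdown::render()` can also turn into a report (via `knitr::spin`); this is a swapped-in variant, not necessarily what is suggested above. The URL and the `analysis.R` file name are placeholders, and by default (`DRY_RUN=1`) the render command is only printed. Actually rendering assumes R, the rmarkdown package, and pandoc are available.

```shell
#!/bin/sh
# Hypothetical sketch: one R script that holds download, analysis, and prose.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the render command, 0 = run it

# Write a small R script whose #' lines become the prose of the report.
write_report_script() {
    cat > "$1" <<'EOF'
#' # Demo report
#' The data are downloaded inside R, so this one file holds every step.
counts <- read.delim("https://example.org/dataset1.tsv")  # placeholder URL
#' Summary table of the raw data:
summary(counts)
#' First two columns plotted against each other:
plot(counts[[1]], counts[[2]])
EOF
}

# Render the script to a report with rmarkdown.
render_report() {
    echo "+ Rscript -e rmarkdown::render('$1')"
    if [ "$DRY_RUN" = "0" ]; then
        Rscript -e "rmarkdown::render('$1')"
    fi
}

write_report_script analysis.R
render_report analysis.R
```

In RStudio the same script could presumably be compiled with the "Compile Report" button instead of the command line, which may be friendlier for a live talk.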


modified 5.7 years ago • written 5.7 years ago by David Fredman (1.0k)

I like the idea! Do you have a link to a blog post that does something similar to what I am planning?

written 5.7 years ago by Eric Normandeau (10k)

How about running through a differential expression analysis based on the DESeq2 vignette, with a read count matrix as input?

To showcase reusability, re-run the analysis using a different count matrix and design matrix but identical analysis/plotting code.

Read count matrices for various datasets are either built into R packages (e.g. Pasilla) or can be downloaded from, e.g., the recount project.
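
The re-run idea could be sketched as a loop that feeds several (count matrix, design matrix) pairs to one unchanged analysis script. Everything named here is an assumption for illustration: `deseq2_analysis.R` is a hypothetical script, and the file paths are placeholders. As written (`DRY_RUN=1`) the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch: one analysis script, several (counts, design) pairs.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print, and optionally run, the analysis command for one data set.
analyse() {
    name="$1"; counts="$2"; design="$3"
    echo "Rscript deseq2_analysis.R $counts $design results/$name"
    if [ "$DRY_RUN" = "0" ]; then
        mkdir -p "results/$name"
        # </dev/null keeps Rscript from consuming the list being looped over.
        Rscript deseq2_analysis.R "$counts" "$design" "results/$name" </dev/null
    fi
}

# Read lines of the form: <name> <count matrix> <design matrix>
analyse_all() {
    while read -r name counts design; do
        analyse "$name" "$counts" "$design"
    done
}

analyse_all <<'EOF'
pasilla data/pasilla_counts.tsv data/pasilla_design.tsv
second data/second_counts.tsv data/second_design.tsv
EOF
```

Adding a third data set is then a one-line change to the list, while the analysis and plotting code stay untouched.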

Edit: Actually, there's ongoing work in the Software Carpentry project to create a "capstone" showcase just like this to round off the novice lessons. See here for the Rmd and data (and, when more finished, the Software Carpentry main bc repo under novice/r/capstones).

modified 5 months ago by RamRS (26k) • written 5.7 years ago by David Fredman (1.0k)
donfreed (1.5k), Mountain View, CA, wrote 5.7 years ago:

One example might be an analysis to find the coverage over a gene or a transcript with RNA-seq data.

The analysis and the question are relatively simple, yet slightly differing methods can easily lead to drastically different results.

written 5.7 years ago by donfreed (1.5k)