Forum: Reproducible Science Short Example
Eric Normandeau (Quebec, Canada) wrote, 5.1 years ago:

I want to give a 20-30 minute talk on Reproducible Science in biology. I think the most engaging format for the audience would be to run a small, self-contained experiment live, coding, documenting, and commenting it as I go.

I would like to start from an existing template. I have seen nice examples posted on blogs in the past that featured both code and results. Is there such an example that you would suggest for my talk?

Here are some ideas I have for the talk:

  1. Do analyses that are interesting for biologists
  2. Produce cool figures
  3. Keep it simple for biologists (e.g., avoid complicated analyses or packages)
  4. Start from an empty directory
  5. Get the data with wget
  6. Use R (most familiar language for biologists where I work)
  7. Use RStudio to make it more palatable than my green on black terminal :)
  8. Use a bash script (or a VERY simple Makefile) to run everything
  9. Create all the output dynamically (figures, tables...)
  10. Use GitHub to version control everything, including the data (smallish text file)
  11. Test the whole pipeline on another data set to show how much time this approach can save
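
To make points 4, 5, 8 and 10 concrete, here is a minimal sketch of what such a driver script could look like. Everything named in it is a placeholder for illustration (the script name `run_all.sh`, the `example.org` URL, `analysis.R`, and the directory layout); it is one possible shape, not a tested pipeline.

```shell
#!/usr/bin/env bash
# run_all.sh (hypothetical name): rebuild every output from an empty directory.
set -eu

# Echo each command before running it, so the log documents what was done.
step() {
    echo "[run] $*"
    "$@"
}

# Download a file only if it is missing. Writing to a temporary name first
# means an interrupted download is never mistaken for a complete file.
fetch() {
    local url="$1"
    local dest="$2"
    if [ -f "$dest" ]; then
        echo "[skip] $dest already present"
    else
        mkdir -p "$(dirname "$dest")"
        step wget -q -O "$dest.part" "$url"
        mv "$dest.part" "$dest"
    fi
}

# Example run (placeholders; substitute a real data URL and analysis script):
#   fetch "https://example.org/measurements.tsv" data/measurements.tsv
#   step Rscript analysis.R data/measurements.tsv results/
#   git add data results analysis.R run_all.sh
#   git commit -m "Rebuild all results"
```

Because `fetch` skips files that already exist, the script can be rerun during the talk without waiting on the network each time.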

Do you have any suggestions for such a talk?

Thanks in advance for your input

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 5.1 years ago:

Here is a short, simple illustration using a Makefile: it downloads some DNA, extracts the protein accessions, downloads the protein, transforms it to SVG, converts it to PNG, and merges everything into a LaTeX document:

 

 


Thank you Pierre. It's definitely a cool example, but from the point of view of a biologist with basic notions of R, I don't think we can pretend this is even remotely simple. I am trying to lure them in, not scare them off ;)

Reply written 5.1 years ago by Eric Normandeau

David Fredman (University of Bergen, Norway) wrote, 5.1 years ago:

I would definitely suggest using RStudio with R Markdown (and knitr) for your reproducible analysis example.
Do the data download (within R), the computation, the explanation, and the figures and tables all within the same document. Some sources you could borrow from:

Software Carpentry R materials

Karl Broman's course materials on reproducible science (or shorter knitr in a knutshell)

The JHU Coursera data science courses
 

 


I like the idea! Do you have a link to a blog post that would have done something similar to what I am planning?

Reply written 5.1 years ago by Eric Normandeau

How about running through a differential expression analysis based on the DESeq2 vignette, with a read count matrix as input?

http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf

To showcase reusability, re-run the analysis a second time with a different count matrix and design matrix but identical analysis and plotting code.
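
As a rough sketch of that second run, a small wrapper can feed each dataset through the same script. The script name `deseq_analysis.R`, the `data/<name>/counts.tsv` and `design.tsv` layout, and the dataset names are all assumptions made for illustration; the R script is assumed to read its input paths and output directory from the command line.

```shell
#!/usr/bin/env bash
set -eu

# Command that runs the analysis. The script name is a placeholder; override
# ANALYSIS_CMD (e.g. with "echo") for a dry run that only prints the arguments.
ANALYSIS_CMD="${ANALYSIS_CMD:-Rscript deseq_analysis.R}"

# Run the identical analysis on one named dataset, writing to its own folder.
run_dataset() {
    local name="$1"
    local outdir="results/$name"
    mkdir -p "$outdir"
    # Unquoted on purpose so the command and its script split into words.
    $ANALYSIS_CMD "data/$name/counts.tsv" "data/$name/design.tsv" "$outdir"
}

# Example (dataset names are placeholders):
#   for name in dataset_a dataset_b; do run_dataset "$name"; done
```

Keeping each dataset's results in a separate folder makes it easy to show the two sets of figures side by side.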

Read count matrices for various datasets are either built into R packages (e.g., pasilla) or can be downloaded from, e.g., the ReCount project:

http://bowtie-bio.sourceforge.net/recount/

Edit: Actually, there's ongoing work in the Software Carpentry project to create a "capstone" showcase just like this to round off the novice lessons. See here for the Rmd and data (and, when more finished, the Software Carpentry main bc repo under novice/r/capstones).

 

Reply written 5.0 years ago by David Fredman

donfreed (Mountain View, CA) wrote, 5.1 years ago:

One example might be an analysis of the coverage over a gene or a transcript using RNA-seq data.

Both the question and the analysis are relatively simple, yet slightly different methods can easily lead to drastically different results.
