Forum: Reproducible Science Short Example
7.1 years ago

I want to give a 20-30 minute talk on Reproducible Science in biology. I think the audience would enjoy watching me run a small, self-contained experiment live, coding, documenting, and commenting on it as I go.

I would like to start from an existing template. I have seen nice examples posted on blogs in the past that featured both code and results. Is there such an example that you would suggest for my talk?

Here are some ideas I have for the talk:

  1. Do analyses that are interesting for biologists
  2. Produce cool figures
  3. Keep it simple for biologists (e.g., avoid complicated analyses or packages)
  4. Start from an empty directory
  5. Get the data with wget
  6. Use R (most familiar language for biologists where I work)
  7. Use RStudio to make it more palatable than my green on black terminal :)
  8. Use a bash script (or a VERY simple make script) to run everything
  9. Create all the output dynamically (figures, tables...)
  10. Use GitHub to version control everything, including the data (smallish text file)
  11. Test the whole pipeline on another data set to show how much time this approach can save
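
For illustration, items 4, 5, 8 and 9 could be wired together in a driver script along these lines. This is only a sketch under stated assumptions: the script name `run_all.sh`, the function `run_pipeline`, the report file `analysis.Rmd`, and the `data/` directory are hypothetical names, and the data URL is whatever you choose to pass in. It assumes `wget`, R, and the `rmarkdown` package are installed.

```shell
#!/bin/sh
# run_all.sh: hypothetical driver script for the live demo.
# Assumption: analysis.Rmd lives in the project directory and reads from data/.
set -eu

run_pipeline() {
  data_url=$1             # where the raw data lives (supplied by the caller)
  project_dir=${2:-.}     # defaults to the current directory
  data_file="$project_dir/data/$(basename "$data_url")"

  mkdir -p "$project_dir/data"

  # Step 1: download the raw data only if it is not already present
  if [ ! -f "$data_file" ]; then
    wget -q -O "$data_file" "$data_url"
  fi

  # Step 2: render the report so figures and tables are rebuilt from the Rmd
  ( cd "$project_dir" && Rscript -e 'rmarkdown::render("analysis.Rmd")' )
}

# Run only when a data URL is given, e.g.: sh run_all.sh DATA_URL [PROJECT_DIR]
if [ "$#" -ge 1 ]; then
  run_pipeline "$@"
fi
```

Keeping the download conditional means a second run reuses the local copy, so the live demo does not depend on the network once the data is there.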

Do you have any suggestions for such a talk?

Thanks in advance for your input

open-science reproducible-science computing Forum • 2.1k views
7.1 years ago

Here is a short, simple illustration using a Makefile: it downloads some DNA, extracts the protein accessions, downloads the protein, transforms it to SVG, converts it to PNG, and merges everything into a LaTeX document:

 

 


Thank you Pierre. It's definitely a cool example, but from the point of view of a biologist with basic notions of R, I don't think we can pretend this is even remotely simple. I am trying to lure them in, not scare them off ;)

7.1 years ago
David Fredman ★ 1.1k

I would definitely suggest using RStudio with RMarkdown (and knitr) for your reproducible analysis example.
Do the data download (within R), the computation, the explanation, and the figures and tables all within the same document. Some sources you could borrow from:

Software Carpentry R materials

Karl Broman's course materials on reproducible science (or shorter knitr in a knutshell)

The JHU Coursera data science courses
 

 


I like the idea! Do you have a link to a blog post that would have done something similar to what I am planning?


How about running through a differential expression analysis based on the DESeq2 vignette, with a read count matrix as input?

To showcase reusability, re-run the analysis using a different count matrix and design matrix but identical analysis and plotting code.

Read count matrices for various datasets are either built into R packages (e.g. Pasilla), or you could download them from, e.g., the recount project.
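
The re-run idea could be sketched as a small shell wrapper that renders the same report twice with different inputs. This is an illustration, not a tested DESeq2 workflow: `render_report`, `analysis.Rmd`, the parameter names `counts` and `design`, and the file names in the usage comment are all hypothetical. It assumes `analysis.Rmd` declares `counts` and `design` under `params:` in its YAML header, since `rmarkdown::render()` only accepts parameters declared there.

```shell
#!/bin/sh
# Hypothetical wrapper: same analysis code, different input files.
set -eu

render_report() {
  counts=$1    # path to the read count matrix
  design=$2    # path to the sample/design table
  out=$3       # file name for the rendered report

  # Build the R call; assumes the paths contain no quote characters
  expr=$(printf "rmarkdown::render('analysis.Rmd', params = list(counts = '%s', design = '%s'), output_file = '%s')" "$counts" "$design" "$out")
  Rscript -e "$expr"
}

# Usage (placeholder file names):
#   render_report data/first_counts.tsv  data/first_design.tsv  report_first.html
#   render_report data/second_counts.tsv data/second_design.tsv report_second.html
```

Because only the arguments change between the two calls, any difference between the two reports comes from the data rather than from edited code, which is the point worth making to the audience.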

Edit: Actually, there's ongoing work in the Software Carpentry project to create a "capstone" showcase just like this to round off the novice lessons. See here for the Rmd and data (and, when more finished, the Software Carpentry main bc repo under novice/r/capstones).

7.1 years ago
donfreed ★ 1.5k

One example might be an analysis of the coverage over a gene or a transcript using RNA-seq data.

The analysis and the question are relatively simple, yet slightly different methods can easily lead to drastically different results.
