Reproducible Research: A More General Sweave?
9
9
12.7 years ago

I need to make my work as reproducible as is reasonably possible. A reader should see my code, English-language descriptions of what I did and why I did it, embedded graphs, and tables of results. The software side of my work involves a series of R commands, short Python scripts (typically one-off, shorter than 100 lines), and shell commands to run software packages I've created. The code for these larger programs lives in source control. The standard tool for solving this problem in an R-only world is Sweave (described at http://www.stat.umn.edu/~charlie/Sweave/). Sweave documents are a form of literate programming, which is great. However, Sweave doesn't (to my knowledge) solve my problem because I don't live exclusively in R.

My current method is to create a Word document that is formatted with monospaced fonts for code blocks and nicer fonts for descriptions, headlines, etc. This puts the entire analysis in one place and looks good. I can drop in EndNote references, paste in screen captures, and paste in Excel tables. This method is a very fast way to work. Importantly, I can email this document to non-technical collaborators and they can read it easily. However, the document cannot be "run in place". To redo my results would involve a lot of cutting and pasting. I also need a Mac or Windows box to work this way.

Has anyone got a better solution than this? "Generate HTML documents with blogging software" would possibly be acceptable if it were easy enough to do and powerful enough.

• 6.7k views
9
12.7 years ago

If you use Emacs, you may be interested in org-mode and org-babel. Org-mode allows structured note-taking, project planning and authoring in a simple, plain-text markup. You can export in a variety of formats, including HTML, LaTeX and DocBook.

Babel allows you to insert and execute code from about 20 languages. The return value of a function written in one language can be passed to a function written in another. You can also capture output in org-mode tables.

The documentation is fantastic. Being plain-text, it plays well with version control.

8
12.7 years ago
Paulo Nuin ★ 3.7k

For your Python scripts, use Sphinx; it's better than Doxygen for Python code. For your R commands/scripts/code you can use Sweave or maybe even Sphinx.

0

+1 for Sphinx. It's based on reStructuredText (reST). Extremely powerful: our lab uses it for code documentation and for lab notebooks. Works beautifully and has a very nice indexing feature. Check out the matplotlib website for a swell example.

0

I think

.. code-block:: r

will do some of what you want if you're mixing code. You could perhaps also use file includes in conjunction with this.
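As a rough illustration of that idea, here is a minimal Python sketch that emits the reST for a `code-block` directive and for Sphinx's `literalinclude` directive (one way to do file includes). The helper names and the example path are hypothetical; only the directive names and the `:language:` option come from Sphinx.

```python
from textwrap import indent


def rst_code_block(source: str, language: str) -> str:
    """Wrap source text in a reST ``code-block`` directive (hypothetical helper)."""
    # Directive content must be indented relative to the directive line.
    body = indent(source.rstrip("\n"), "   ")
    return f".. code-block:: {language}\n\n{body}\n"


def rst_literalinclude(path: str, language: str) -> str:
    """Emit a ``literalinclude`` directive that pulls code from a file."""
    return f".. literalinclude:: {path}\n   :language: {language}\n"


if __name__ == "__main__":
    # Inline R snippet, highlighted as R in the rendered document.
    print(rst_code_block("x <- rnorm(100)\nsummary(x)", "r"))
    # Hypothetical script path; the file itself stays in source control.
    print(rst_literalinclude("scripts/filter_reads.py", "python"))
```

This only formats the code for display. Sphinx does not run the R or Python for you, so regenerating results still needs a separate step.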

6
12.7 years ago
Cassj ★ 1.3k

I'd be tempted to split your problem up a bit.

1. Re-running of workflows

I tend to use Rake for this. Your assorted scripts and shell commands become a series of Rake tasks with appropriate dependencies defined in a Rakefile. There are lots of other tools for this; I'm sure people will recommend other stuff if you don't like Rake.

2. Report Generation

If you're using Sweave anyway, you must be familiar with LaTeX, so why not ditch Word and use LaTeX to pull all your content together, use BibTeX instead of EndNote and maybe the listings package for displaying code? When you regenerate content (images, code, whatever) just regenerate your LaTeX file (you could make a Rake task for it). As LaTeX is just text files, you can version control it along with your scripts. As other people have already pointed out, code documentation tools like Doxygen and Sphinx will kick out LaTeX for you too.
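To make the workflow idea in point 1 concrete, here is a minimal Python sketch of tasks with dependencies. It is not Rake and does not imitate Rake's API; the task names are hypothetical, and unlike Rake's file tasks it does not check timestamps or detect cycles.

```python
from typing import Callable, Dict, List, Tuple

# Each task maps a name to (dependencies, action).
Task = Tuple[List[str], Callable[[], None]]


def run(target: str, tasks: Dict[str, Task], done: List[str]) -> List[str]:
    """Run `target` after its dependencies, executing each task at most once."""
    if target in done:
        return done
    deps, action = tasks[target]
    for dep in deps:
        run(dep, tasks, done)
    action()
    done.append(target)
    return done


if __name__ == "__main__":
    log: List[str] = []
    # Hypothetical pipeline. In real use each action might instead call
    # subprocess.run([...], check=True) on a shell command or an R script.
    tasks: Dict[str, Task] = {
        "align": ([], lambda: log.append("shell: run aligner")),
        "stats": (["align"], lambda: log.append("R: summarise alignments")),
        "plots": (["stats"], lambda: log.append("R: draw figures")),
        "report": (["stats", "plots"], lambda: log.append("build report")),
    }
    print(run("report", tasks, []))
    print(log)
```

Asking for `report` runs `align`, `stats`, `plots`, and `report` in that order, with `stats` executed only once even though two tasks depend on it.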

3
12.7 years ago
Casbon ★ 3.2k

(disclaimer: I'm a dev on this project)

Codenode aims to do exactly this. At the moment, I have implemented Python, R and JavaScript notebooks and I hope to add Haskell soon.

All the other answers have pointed you towards documentation generators, but the point of codenode is that a notebook contains a mixture of code, output (text/images) and written words, saved and shareable like a Google Doc. This way both input and output are stored together.

I intend to get a live install of codenode up ASAP. Once that happens it should be possible to create a bioinformatics environment. In the meantime, come and help if you can!

It is also possible to install on your machine and use in desktop mode.

This screenshot demonstrates markdown/latex in the browser.

0

Another pic of code input/evaluation and text:

0

This looks promising; I'll keep an eye on your project. The simpy screen shot looks a bit like a Mathematica file, which is a good thing. Good luck.

2
12.7 years ago

Have you already taken a look at Doxygen? You basically just write documented source code with comments in a certain format and it creates the documentation files for you, e.g.:

/**
 * <A short one line description>
 *
 * <Longer description>
 * <May span multiple lines or paragraphs as needed>
 *
 * @param  Description of method's or function's input parameter
 * @param  ...
 * @return Description of the return value
 */


• Language support: C (and variants), Java, PHP and Python
• R support via an external filter
• Output as HTML, RTF, PDF, man pages, LaTeX, ...
• Creation of OO-graphs possible
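For the short Python scripts mentioned in the question, the same comment style can be written with Doxygen's `##` comment blocks. The sketch below is only an illustration; the function `gc_content` and its behaviour are hypothetical and not taken from the original post.

```python
## Compute the GC fraction of a DNA sequence.
#
#  Counts G and C characters case-insensitively and divides by the
#  sequence length.
#
#  @param sequence  DNA sequence as a string.
#  @return Fraction of G/C bases, or 0.0 for an empty sequence.
def gc_content(sequence: str) -> float:
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(sequence)


if __name__ == "__main__":
    print(gc_content("ATGCGC"))
```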

2
12.7 years ago
Will 4.5k

While this only exchanges a dependency on R for a dependency on MATLAB ... I've had really good success with the "Publish Code" function in MATLAB. It's very good at generating numerous output types including HTML, PDF and .doc files.

Doug Hellmann (of Python MOTW fame) has a wonderful blog post on how he integrates code and reST to create "functioning" blog posts/reports. You can check out the link here. But essentially he describes a combination of Sphinx and Cog.

Hope that helps.
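The general idea of a "functioning" document (code embedded in text, executed, with its output spliced back in) can be sketched in a few lines of standard-library Python. This is not Cog and does not use Cog's marker syntax; the `<<py ... >>` markers and the `weave` function are invented for this illustration.

```python
import contextlib
import io
import re

# Hypothetical chunk markers: a line "<<py", the code, then a line ">>".
CHUNK = re.compile(r"<<py\n(.*?)\n>>", re.DOTALL)


def weave(text: str) -> str:
    """Replace each chunk with its code followed by whatever it printed."""
    namespace: dict = {}  # shared, so later chunks can use earlier variables

    def run_chunk(match: re.Match) -> str:
        code = match.group(1)
        buffer = io.StringIO()
        # Capture anything the chunk prints to standard output.
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        return f"{code}\n# output:\n{buffer.getvalue().rstrip()}"

    return CHUNK.sub(run_chunk, text)


if __name__ == "__main__":
    document = (
        "Mean of the measurements:\n"
        "<<py\n"
        "x = [1, 2, 3]\n"
        "print(sum(x) / len(x))\n"
        ">>\n"
        "End of report."
    )
    print(weave(document))
```

Because the chunks are executed with `exec`, this should only be run on documents you trust.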

2
8.3 years ago
Eric T. ★ 2.8k

A little late here, but I think the solution you're looking for is IPython Notebook. It supports multiple languages, even within a single session; sections of the analysis can be rerun easily; pandas dataframes can be displayed nicely as tables, and plots are embedded in the page as images.

If you mostly live in R, the combination of R Markdown and RStudio might also suit your needs by transforming scripts and interactive sessions into notebooks.

0

I agree. I've transitioned to R Markdown, using RStudio to build the documents.

1
12.7 years ago
Mndoci ★ 1.2k

If you're living in the Ruby world, Glyph might be useful as well. It's a pretty powerful system for documenting your code.

0
7.4 years ago
Elke Schaper ▴ 100

It seems the world of reproducible research is moving fast...

The Python solution these days is Jupyter Notebook (formerly named IPython Notebook), with support for code chunks in 40+ other programming languages.

The most flexible R solution is knitr, with support for 10 other programming languages. knitr renders R Markdown files to .html, .pdf, or .docx documents.