Question: Reproducible Research: A More General Sweave?
9
9.0 years ago, David Quigley (11k, San Francisco) wrote:

I need to make my work as reproducible as is reasonably possible. A reader should see my code, English-language descriptions of what I did and why I did it, embedded graphs, and tables of results. The software side of my work involves a series of R commands, short Python scripts (typically one-off, shorter than 100 lines), and shell commands to run software packages I've created. The code for these larger programs lives in source control. The standard tool for solving this problem in an R-only world is Sweave (described at http://www.stat.umn.edu/~charlie/Sweave/). Sweave documents are a form of literate programming, which is great. However, Sweave doesn't (to my knowledge) solve my problem because I don't live exclusively in R.

My current method is to create a Word document that is formatted with monospaced fonts for code blocks and nicer fonts for descriptions, headlines, etc. This puts the entire analysis in one place and looks good. I can drop in EndNote references, paste in screen captures, and paste in Excel tables. This method is a very fast way to work. Importantly, I can email this document to non-technical collaborators and they can read it easily. However, the document cannot be "run in place". Redoing my results would involve a lot of cutting and pasting. I also need a Mac or Windows box to work this way.

Has anyone got a better solution than this? "Generate HTML documents with blogging software" would possibly be acceptable if it were easy enough to do and powerful enough.

9
9.0 years ago, iw9oel_ad (6.0k) wrote:

If you use Emacs, you may be interested in org-mode and org-babel. Org-mode allows structured note-taking, project planning and authoring in a simple, plain-text markup. You can export to a variety of formats, including HTML, LaTeX and DocBook.

Babel allows you to insert and execute code from about 20 languages. The return value of a function written in one language can be passed to a function written in another. You can also capture output in org-mode tables.
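
Outside Emacs, the same chaining idea can be sketched in plain Python: one step runs as an external command, and its captured output becomes the input of the next step. This is only an illustration of the idea, not how Babel works internally; the function names and the producer command are made up for the example.

```python
import subprocess
import sys


def run_step(command):
    """Run an external command and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def parse_numbers(text):
    """Turn whitespace-separated text into a list of floats."""
    return [float(token) for token in text.split()]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Step 1 stands in for "a block in another language". Here it is a child
# Python process so the sketch runs anywhere; it could equally be an
# Rscript call or a shell tool (assumption: that tool is installed).
producer = [sys.executable, "-c", "print('1 2 3 4')"]

# Step 2 consumes the captured output inside this process.
numbers = parse_numbers(run_step(producer))
average = mean(numbers)
print(average)
```

Passing plain text between steps keeps each language independent, at the cost of having to parse the output yourself.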

The documentation is fantastic. Because org files are plain text, they play well with version control.

8
9.0 years ago, Paulo Nuin (3.7k, Canada) wrote:

For your Python scripts, use Sphinx; it's better than Doxygen for Python code. For your R commands/scripts/code, you can use Sweave or maybe even Sphinx.


+1 for Sphinx. It's based on reStructuredText (reST). It is extremely powerful: our lab uses it for code documentation and for lab notebooks. It works beautifully and has a very nice indexing feature. Check out the matplotlib website for a swell example.

Reply written 9.0 years ago by Aaronquinlan (11k)

I think the directive

.. code-block:: r

will do some of what you want if you're mixing code. You could perhaps also use file includes in conjunction with it.

Reply written 8.4 years ago by User 85940
6
9.0 years ago, Cassj (1.3k, London) wrote:

I'd be tempted to split your problem up a bit.

  1. Re-running of workflows

    I tend to use Rake for this. Your assorted scripts and shell commands become a series of Rake tasks with appropriate dependencies defined in a Rakefile. There are lots of other tools for this; I'm sure people will recommend other stuff if you don't like Rake.

  2. Report Generation

    If you're using Sweave anyway, you must be familiar with LaTeX, so why not ditch Word and use LaTeX to pull all your content together, with BibTeX instead of EndNote and maybe the listings package for displaying code? When you regenerate content (images, code, whatever), just regenerate your LaTeX file (you could make a Rake task for it). As LaTeX is just text files, you can version control it along with your scripts. As other people have already pointed out, code documentation tools like Doxygen and Sphinx will kick out LaTeX for you too.
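
The workflow half of this split can be sketched without Rake. The Python below only illustrates dependency-ordered tasks and is not Rake's API: the task names and actions are invented, and unlike Rake's file tasks it does not compare file timestamps or detect dependency cycles.

```python
# Record of what ran, in order. A real pipeline would call scripts here.
log = []

# Hypothetical pipeline: task name -> (prerequisites, action).
TASKS = {
    "align": ([], lambda: log.append("run alignment script")),
    "stats": (["align"], lambda: log.append("run R statistics")),
    "plots": (["stats"], lambda: log.append("draw figures")),
    "report": (["stats", "plots"], lambda: log.append("build LaTeX report")),
}


def run(target, tasks, finished=None):
    """Run a task after its prerequisites, executing each task at most once."""
    if finished is None:
        finished = set()
    if target in finished:
        return finished
    prerequisites, action = tasks[target]
    for name in prerequisites:
        run(name, tasks, finished)
    action()
    finished.add(target)
    return finished


# Asking for the report pulls in everything it depends on, in order.
run("report", TASKS)
print(log)
```

Asking for "report" runs the alignment, statistics and plotting steps first, and the shared "stats" step runs only once even though two tasks depend on it.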

3
9.0 years ago, Casbon (3.2k) wrote:

(disclaimer: I'm a dev on this project)

Codenode aims to do exactly this. At the moment, I have implemented Python, R and JavaScript notebooks, and I hope to add Haskell soon.

All the other answers have pointed you towards documentation generators, but the point of codenode is that a notebook contains a mixture of code, output (text/images) and written words, saved and shareable like a Google Doc. This way both input and output are stored together.

I intend to get a live install of codenode up ASAP. Once that happens, it should be possible to create a bioinformatics environment. In the meantime, come and help if you can!

It is also possible to install on your machine and use in desktop mode.

This screenshot demonstrates Markdown/LaTeX in the browser.


Another pic of code input/evaluation and text:

Reply written 9.0 years ago by Casbon (3.2k)

This looks promising; I'll keep an eye on your project. The simpy screen shot looks a bit like a Mathematica file, which is a good thing. Good luck.

Reply written 9.0 years ago by David Quigley (11k)
2
9.0 years ago, Michael Schubert (6.9k, Cambridge, UK) wrote:

Have you already taken a look at Doxygen? You basically just write documented source code with comments in a certain format, and it creates the documentation files for you, e.g.:

 /**
 * <A short one line description>
 *
 * <Longer description>
 * <May span multiple lines or paragraphs as needed>
 *
 * @param <name> Description of the method's or function's input parameter
 * @param  ...
 * @return Description of the return value
 */

The advantages would be:

  • Language support: C (and variants), Java, PHP and Python
  • R support via an external filter
  • Output as HTML, RTF, PDF, man pages, LaTeX, ...
  • Creation of OO-graphs possible

See also the Wikipedia page and comparison of doc generators.

2
9.0 years ago, Will (4.5k, United States) wrote:

While this only exchanges a dependency on R for a dependency on MATLAB, I've had really good success with the "Publish Code" function in MATLAB. It's very good at generating numerous output types, including HTML, PDF and .doc files.

Doug Hellmann (of Python MOTW fame) has a wonderful blog post on how he integrates code and reST to create "functioning" blog posts/reports. You can check out the link here. Essentially, he describes a combination of Sphinx and Cog.
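
The general idea behind such tools, code embedded in the text and re-executed whenever the document is rebuilt, can be sketched in a few lines of Python. This is not Cog or Sphinx: the `<<code>>` and `<<end>>` markers and the `weave` function are made up for illustration, and `exec` should only ever be run on documents you trust.

```python
import contextlib
import io
import re

# Made-up chunk markers for this sketch; real tools use their own syntax.
CHUNK = re.compile(r"<<code>>\n(.*?)<<end>>\n", re.DOTALL)


def weave(document):
    """Replace each code chunk with the code followed by its printed output."""
    namespace = {}  # shared, so later chunks can use earlier variables

    def render(match):
        source = match.group(1)
        buffer = io.StringIO()
        # Capture whatever the chunk prints instead of sending it to the terminal.
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)
        return "Code:\n" + source + "Output:\n" + buffer.getvalue()

    return CHUNK.sub(render, document)


report = (
    "We first compute the total.\n"
    "<<code>>\n"
    "total = sum([1, 2, 3])\n"
    "print(total)\n"
    "<<end>>\n"
    "Then we double it.\n"
    "<<code>>\n"
    "print(total * 2)\n"
    "<<end>>\n"
)

woven = weave(report)
print(woven)
```

Because the output is regenerated from the code each time, the prose and the results cannot drift apart the way they can in a hand-pasted document.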

Hope that helps.

2
4.7 years ago, Eric T. (2.5k, San Francisco, CA) wrote:

A little late here, but I think the solution you're looking for is IPython Notebook. It supports multiple languages, even within a single session; sections of the analysis can be rerun easily; pandas dataframes can be displayed nicely as tables, and plots are embedded in the page as images.

If you mostly live in R, the combination of R Markdown and RStudio might also suit your needs by transforming scripts and interactive sessions into notebooks.


I agree. I've transitioned to R Markdown, using RStudio to build the documents.

Reply written 4.7 years ago by David Quigley (11k)
1
9.0 years ago, Mndoci (1.2k, Issaquah, WA) wrote:

If you're living in the Ruby world, Glyph might be useful as well. It's a pretty powerful system for documenting your code.

0
3.7 years ago, Elke Schaper (60, Switzerland) wrote:

It seems the world of reproducible research is moving fast...

The Python solution these days is Jupyter Notebook (formerly named IPython Notebook), with support for code chunks in 40+ other programming languages.

The most flexible R solution is knitr, with support for 10 other programming languages. Knitr renders R Markdown files to .html, .pdf, or .docx documents.
