I have found that makefiles (yes, the ones you use for compiling software) are a powerful way to automate and document what you do at the same time. To my mind, the basic syntax of makefiles fits very well with what you would want to capture for documentation purposes:
the_file_you_made: an_input_file another_input_file
	some_command --important-option $< > $@
In other words, you write down which files you make, from which other files, by running which commands. Best of all, you write it down in a computer-readable form, which allows you to easily rerun everything that needs to be rerun if you discover a mistake in some step.
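To make the rerun behaviour concrete, here is a minimal sketch of a two-step analysis. The file names (input.tsv, sorted.txt, counts.txt) are hypothetical and only for illustration; cut, sort and uniq are standard Unix tools, and .DELETE_ON_ERROR assumes GNU make. Recipe lines must start with a tab character.

```makefile
# Hypothetical two-step pipeline; all file names are made up for illustration.

# Remove a half-written target if its command fails (GNU make).
.DELETE_ON_ERROR:

# "all" is a label, not a file, so declare it phony.
.PHONY: all
all: counts.txt

# Step 1: take the second column of the input table and sort it.
sorted.txt: input.tsv
	cut -f 2 $< | sort > $@

# Step 2: count how often each value occurs.
counts.txt: sorted.txt
	uniq -c $< > $@
```

Typing make builds counts.txt via sorted.txt. Typing it again does nothing, because both files are newer than their inputs. If input.tsv changes, both steps rerun; make -n prints the commands that would run without executing them.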
Better yet, wildcard (pattern) rules allow you to automate all the bread-and-butter bioinformatics tasks:
# Convert Genbank file to FASTA format.
%.fasta: %.gbk
	gbk2fasta $< > $@
I find that the latter way of using makefiles has several desirable effects. First, it gives people a motivation to actually use makefiles (and, as a nice side effect, document what they did). Second, such rules rely heavily on how you name files, which encourages people to standardize their file naming.
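As a sketch of how the naming convention pays off, a pattern rule can be combined with a file list so that one make call converts every Genbank file in the directory. This assumes GNU make, assumes the files end in .gbk and .fasta (my choice of extensions, not necessarily the ones you use), and takes gbk2fasta from the example above to be whatever converter you have on your path.

```makefile
# Assumed extensions: .gbk for Genbank input, .fasta for output.
# All Genbank files in the current directory.
GBK := $(wildcard *.gbk)
# The FASTA file that corresponds to each of them.
FASTA := $(GBK:.gbk=.fasta)

.PHONY: all
all: $(FASTA)

# Convert any Genbank file to FASTA format.
# gbk2fasta is the converter named above; substitute your own tool.
%.fasta: %.gbk
	gbk2fasta $< > $@
```

Dropping a new .gbk file into the directory and typing make converts just that file; the ones already converted are left alone.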
I have used this on many projects over the years, starting back when I did my M.Sc. thesis. A large part of the project was to analyze the S. cerevisiae genome (which was still quite new at the time). Imagine downloading a new, improved assembly and annotation of the genome in Genbank format, typing make, and having every analysis redone based on your "lab notebook". That was what I did just one month before handing in my report :-)