It is hard to estimate how much time is lost and how much frustration results from trying to use software written by others to conduct your research. Unfortunately, there is little investment in teaching even the practical basics of software design in academic science since there is such intense competition over resources and so few incentives to actually write good software.
I've spent the last week or so brainstorming and putting together a tutorial that gives scientists with beginning-level programming skills a gentle introduction to command-line interface design. The tutorial is available on figshare[1], and the LaTeX markup and source code examples are available on BitBucket[2]. My hope is that a better, broader understanding of these principles will benefit the community as a whole.
I would like to solicit the feedback of the BioStar community regarding this tutorial: its content, its style, and its visual presentation. Although I am pleased with what I have come up with, I know it is far from perfect and hope to benefit from everyone's collective expertise.
UPDATE 2013-03-04: Thanks for everyone's excellent feedback. I have posted a new revision on figshare that addresses quite a few points raised in this thread and integrates several of your suggestions. I plan to continue refining this document, both for my personal use with colleagues and students and for community contribution. Please feel free to make additional suggestions for improvement or to fork your own copy and submit a pull request.
That is fabulous. I wish more developers would follow the standard design. A couple of minor comments: 1) Your programs in section 5 do better than the example in section 4: they show the type of value each option takes. 2) It would be good to point out that "-xyz123" may mean "-x -y -z 123" if x and y take no value. 3) It would be good to point out that on some OSes, all options must be put before the mandatory/positional arguments. 4) Sometimes you may want to print help when no positional arguments are provided AND no data are fed into STDIN. In Perl, we can use
die("Usage: prog.pl [var.vcf] or cat var.vcf | prog.pl") if (@ARGV==0&&-t STDIN)
. The C equivalent is isatty(). I don't know about other languages.
Nice. Here are the official POSIX guidelines for CLIs, for reference: http://pubs.opengroup.org/onlinepubs/009696699/basedefs/xbd_chap12.html
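For anyone wondering how points 2 to 4 above look outside Perl and C, here is a minimal sketch in Python, using only the standard-library getopt module and sys.stdin.isatty(). The function names parse_args and needs_usage and the option letters x, y and z are hypothetical, chosen only to mirror the "-xyz123" example; this is one possible illustration, not a recommended implementation.

```python
import getopt
import sys


def parse_args(argv, permute=False):
    """Parse short options -x, -y (flags) and -z VALUE from argv.

    With permute=False, parsing stops at the first positional argument,
    so any options placed after it are left unparsed (strict style).
    With permute=True, GNU-style parsing lets options and positional
    arguments be intermixed.
    """
    # "xyz:" declares x and y as flags; the colon means z takes a value.
    parser = getopt.gnu_getopt if permute else getopt.getopt
    return parser(argv, "xyz:")


def needs_usage(positional, stdin=None):
    """Return True if no positional arguments were given and stdin is a terminal."""
    # Hypothetical helper: stdin is injectable so the check can be tested.
    stdin = sys.stdin if stdin is None else stdin
    return not positional and stdin.isatty()
```

Here parse_args(["-xyz123"]) gives the same option list as parse_args(["-x", "-y", "-z", "123"]). A script would typically call needs_usage() on the leftover positional arguments and print its usage message when it returns True.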
Really nice document.
In Perl I always use Pod::Usage together with Getopt::Long. It forces you to document the code well, plus it gives you lots of control over your usage output via pod2usage() without having to hand-roll it every time. I'd strongly recommend it. I really miss this feature in Python.
Below is an example:
Something I've always wondered, but never gotten a clear answer: are the empty lines really necessary before/after POD section markers? And if so, is it required by Perl interpreters (e.g. for the script to run), or by the POD parser (e.g. to produce documentation)? Because I've been using them as an easy way to write multi-line comments, but I've often skipped the extra empty lines, like so:
Will this cause problems on any common Perl interpreters and/or POD parsers?
I've not done extensive tests, but when first using POD I found that spaces were needed for correct parsing. Now, I have template code (in Eclipse) that gives me the skeleton with the required sections properly spaced. So, no thought required :)
Hmm, well that's good to know about POD, though I'm mostly concerned with whether it ever poses a problem to interpreters. Luckily I haven't had a problem yet, though apparently POD syntax is esoteric enough that lots of syntax highlighting systems don't recognize it at all (empty lines or no), as you can see above.