Tutorial: Designing Command-Line Interfaces (CLIs) for Scientific Software
4
18
9.9 years ago

It is hard to estimate how much time is lost and how much frustration results from trying to use software written by others to conduct your research. Unfortunately, there is little investment in teaching even the practical basics of software design in academic science since there is such intense competition over resources and so few incentives to actually write good software.

I've spent the last week or so brainstorming and putting together a tutorial that gives scientists with beginning-level programming skills a gentle introduction to command-line interface design. The tutorial is available on figshare[1], and the LaTeX markup and source code examples are available on BitBucket[2]. My hope is that a better, broader understanding of these principles will benefit the community as a whole.

I would like to solicit the feedback of the BioStar community regarding this tutorial: its content, its style, and its visual presentation. Although I am pleased with what I have come up with, I know it is far from perfect and hope to benefit from everyone's collective expertise.

UPDATE 2013-03-04: Thanks for everyone's excellent feedback. I have posted a new revision on figshare that addresses quite a few points raised in this thread and integrates several suggestions you have made. I plan to continue refining this document, both for my personal use with colleagues and students as well as for community contribution. Please feel free to make additional suggestions for improvement or to fork your own copy and submit a pull request.

software tutorial Tutorial • 7.0k views
3

That is fabulous. I wish more developers would follow the standard design. A couple of minor comments:

1) Your programs in section 5 do better than the example in section 4: they show the type of value each option takes.
2) It would be good to point out that "-xyz123" may mean "-x -y -z 123" if x and y take no value.
3) It would be good to point out that on some OSes, all options must be put before the mandatory/positional arguments.
4) Sometimes you may want to print help when no positional arguments are provided AND no data are fed into STDIN. In Perl, we can write die("Usage: prog.pl [var.vcf] or cat var.vcf | prog.pl") if (@ARGV == 0 && -t STDIN). The C equivalent is isatty(). I don't know the equivalent in other languages.
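For points 2 and 4, here is a rough Python sketch that uses only the standard library. It assumes a hypothetical program whose -x and -y flags take no value and whose -z option takes one; the function names build_parser and should_print_usage are made up for illustration. sys.stdin.isatty() plays the role of Perl's -t STDIN and C's isatty().

```python
import argparse
import io
import sys


def build_parser():
    """Hypothetical parser: -x and -y are flags, -z takes a value."""
    parser = argparse.ArgumentParser(prog="prog.py")
    parser.add_argument("-x", action="store_true")
    parser.add_argument("-y", action="store_true")
    parser.add_argument("-z")
    parser.add_argument("infile", nargs="?", help="input file; default is stdin")
    return parser


def should_print_usage(infile, stdin=None):
    """True when no file was named AND stdin is an interactive terminal."""
    stdin = sys.stdin if stdin is None else stdin
    return infile is None and stdin.isatty()


# Bundled short options: argparse reads "-xyz123" as "-x -y -z 123".
args = build_parser().parse_args(["-xyz123"])
print(args.x, args.y, args.z)

# A StringIO object stands in for piped input here; it is never a terminal,
# so the program would go on to read from it instead of printing usage.
print(should_print_usage(args.infile, io.StringIO("piped data")))
```

In a real script one would call parse_args() with no arguments and, when should_print_usage(args.infile) is true, print the usage message and exit with a non-zero status.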

3

Nice. Here are the official POSIX guidelines for CLIs, for reference: http://pubs.opengroup.org/onlinepubs/009696699/basedefs/xbd_chap12.html

2

Really nice document.

In Perl I always use Pod::Usage together with Getopt::Long. It forces you to document the code well, and it gives you lots of control over your usage output via pod2usage() without having to hand-roll it every time. I'd strongly recommend it. I really miss this feature in Python.

Below is an example:

#!/usr/bin/perl

=head1 NAME

script.pl - my test script

=head1 SYNOPSIS

script.pl --in <file> --out <file> [--help] [--man] [--version]

=cut

use strict;
use warnings;

use Getopt::Long qw(:config auto_version);
use Pod::Usage;

my $file;
my $out = 'out.tsv';
my $help;
my $man;
our $VERSION = '0.1';

# Parse the options; on a parsing error, print the usage and exit
GetOptions(
    'in=s'   => \$file,
    'out=s'  => \$out,
    'man'    => \$man,
    'help|?' => \$help,
) or pod2usage();

pod2usage(-verbose => 2) if ($man);
pod2usage(-verbose => 1) if ($help);
pod2usage(-msg => 'Please supply a valid filename.') unless ($file && -s $file);

0

Something I've always wondered, but never gotten a clear answer to: are the empty lines really necessary before/after POD section markers? And if so, are they required by Perl interpreters (e.g. for the script to run), or by the POD parser (e.g. to produce documentation)? I ask because I've been using POD blocks as an easy way to write multi-line comments, but I've often skipped the extra empty lines, like so:

#!/usr/bin/perl -w
=begin comment
Blah blah, beginning comment block
Overview comments about script
=end comment
=cut
use strict;
[etc]

Will this cause problems on any common Perl interpreters and/or POD parsers?

0

I've not done extensive tests, but when first using POD I found that spaces were needed for correct parsing. Now, I have template code (in Eclipse) which gives me the skeleton with required sections properly spaced. So, no thought required :)

0

Hmm, well that's good to know about POD, though I'm mostly concerned with whether it ever poses a problem to interpreters. Luckily I haven't had a problem yet, though apparently POD syntax is esoteric enough that lots of syntax highlighting systems don't recognize it at all (empty lines or no), as you can see above.

5
9.9 years ago

I just read it quickly, but for the Python example, optparse is deprecated. Use argparse.

2

Maybe Python programmers should use the canonical getopt, given how unstable these newer packages are?

0

See this PEP: http://www.python.org/dev/peps/pep-0389/ It seems that getopt is pretty stable, but it lacks some of the functionality provided by optparse and then argparse.

0

Great! I will update the example and use the getopt module.
It may mean adding a few more lines of code, but honestly it is worth it for the control, flexibility, stability, and portability that it provides.

0

Thanks for the comment. Somehow I missed that optparse is deprecated. I honestly don't code much in Python and used optparse for this example because I had seen it used once in a colleague's script.

4
9.9 years ago

Here is a great command-line parser/documentation library for Python: https://github.com/docopt/docopt Basically, you write the help message in POSIX format and the library parses the arguments based on the help message.

1

For a standard thing like option parsing, I would prefer to use the core library and avoid a third-party dependency. To me, meeting dependencies is usually the worst part of installing a program.

0

Yeah, that's true. For distribution, I would avoid this or include the docopt.py script in the project folder.

0

In my opinion, docopt looks great on the surface, but when you actually get down to using it, it's a pain to get everything out of it and into arguments for functions.

3
9.9 years ago
Ido Tamir 5.2k

Nice. The abstract is a bit long. I like the code templates. For educational purposes you could still maybe emphasize the need for a small comment at the top of the source code about what the program really does - more than usage. You could also add something about the subcommand trend, git being the most prominent example (Python >= 2.7 argparse can do it).
I'll add an R example:

#!/usr/bin/env Rscript
# example.R: template for parsing command line options
library("optparse")

option_list <- list(
  # store_true, so that passing -f actually switches filtering on
  make_option(c("-f", "--filter"), action="store_true", default=FALSE,
              help="apply strict filtering"),
  make_option(c("-o", "--out"), type="character", dest="out", default="stdout",
              help="file to which output will be written; default is terminal (stdout)"),
  make_option(c("-s", "--strand"), type="integer", default=0,
              help="strand to search; provide a positive number for the forward strand, a negative number for the reverse strand, or 0 for both strands; default is 0",
              metavar="number"),
  make_option(c("-w", "--weight"), type="double", default=0.9,
              help="user-defined weight; default is 0.9")
)
parser <- OptionParser(usage="Usage: %prog [options] data.txt", option_list=option_list)
arguments <- parse_args(parser, positional_arguments=TRUE)
opt <- arguments$options

# normally one would try to accumulate errors

# read from stdin when no file is given, otherwise from the single named file
if (length(arguments$args) == 0) {
  input <- file("stdin", "r", blocking=FALSE)
} else if (length(arguments$args) == 1) {
  input <- tryCatch(file(arguments$args, "r", blocking=FALSE),
                    error=function(e) {
                      cat(paste("ERROR: ", arguments$args, "not readable\n\n"), file=stderr())
                      quit(save="no", status=1)
                    })
} else {
  cat("Error: can only process one file\n\n", file=stderr())
  print_help(parser)
  quit(save="no", status=1)
}

# write to the terminal by default, otherwise to the requested file
if (opt$out == "stdout") {
  output <- stdout()
} else {
  output <- tryCatch(file(opt$out, "w", blocking=FALSE),
                     error=function(e) {
                       cat(paste("ERROR: ", opt$out, "not writable\n\n"), file=stderr())
                       quit(save="no", status=1)
                     })
}

while (length(line <- readLines(input, n=1)) > 0) {
  # process lines
  mline <- paste(opt$strand, opt$weight, line)
  write(mline, output)
}

# close only the connections this script opened itself
if (opt$out != "stdout") {
  close(output)
}
if (length(arguments$args) != 0) {
  close(input)
}
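On the subcommand suggestion, here is a minimal sketch of git-style subcommands with Python's standard argparse module. The program name seqtool and the subcommands filter and stats are hypothetical, chosen only for illustration; they do not come from the tutorial.

```python
import argparse


def build_parser():
    """Parser for a hypothetical 'seqtool' program with two subcommands."""
    parser = argparse.ArgumentParser(prog="seqtool")
    subparsers = parser.add_subparsers(dest="command")

    # "seqtool filter": has its own option in addition to the input file
    p_filter = subparsers.add_parser("filter", help="filter records")
    p_filter.add_argument("-w", "--weight", type=float, default=0.9,
                          help="user-defined weight; default is 0.9")
    p_filter.add_argument("infile")

    # "seqtool stats": takes only the input file
    p_stats = subparsers.add_parser("stats", help="summarize records")
    p_stats.add_argument("infile")
    return parser


# An explicit argument list is parsed here so the example runs as-is;
# a real script would call parse_args() with no arguments.
args = build_parser().parse_args(["filter", "--weight", "0.5", "data.txt"])
print(args.command, args.weight, args.infile)
```

Each subcommand gets its own options and its own help text, so "seqtool filter --help" would list only the options that apply to filtering.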

0

Good suggestions. Regarding the abstract, that was originally written up as an overview/introduction. I changed it to an abstract last minute based on some LaTeX templates I was looking at. I'll consider trimming down the abstract and reintroducing an overview/introduction section. Regarding documenting the program's purpose: excellent suggestion. Regarding subcommands: remember, this is for scientists with beginner-level programming skills. Regarding R example: thank you very much!

1
9.9 years ago
Ryan Thompson ★ 3.6k

I'm going to go ahead and plug my personal fill-in-the-blanks Python script template: https://github.com/DarwinAwardWinner/python-script-template/

It uses the plac library, which takes your main function, analyzes its arguments, and generates the command-line interface for you based on the argument list. I include a "hello world" example: https://github.com/DarwinAwardWinner/python-script-template/blob/master/hello.py
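To give a feel for the general idea of deriving a command-line interface from a function's argument list, here is a rough sketch that uses only the standard library (inspect and argparse). It is not plac's implementation or API; the helper cli_from_function and the example function hello are hypothetical names, and the sketch assumes that every default value is a simple string or number.

```python
import argparse
import inspect


def cli_from_function(func):
    """Build an argparse parser from func's signature (illustration only)."""
    parser = argparse.ArgumentParser(prog=func.__name__, description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            # No default value: treat it as a required positional argument.
            parser.add_argument(name)
        else:
            # Has a default: expose it as --name and convert the value to the
            # default's type (assumes str/int/float defaults, not booleans).
            parser.add_argument("--" + name, default=param.default,
                                type=type(param.default))
    return parser


def hello(name, greeting="Hello", times=1):
    """Hypothetical example function: greet someone one or more times."""
    return " ".join([f"{greeting}, {name}!"] * times)


# An explicit argument list is parsed here so the example runs as-is.
args = cli_from_function(hello).parse_args(["World", "--times", "2"])
print(hello(**vars(args)))
```

The appeal of this style is that the function signature is the single source of truth: adding a parameter to the function adds the matching option to the interface.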

0

Again, since plac is not part of the standard Python library, I would repeat what lh3 said about the docopt library: dependencies are bad :(