Question

How Do You Handle External Bioinformatics Programs During Testing?

2

13.7 years ago

Andrewjgrimm ▴ 460

What approach do you take to using external bioinformatics software during testing of your code?

For example, if one of your components used clustalw, which you have installed on a Linux server but not on your Windows workstation, which approach would you take?

1. Let your code run the external program - if it doesn't exist, let it die horribly and fail its tests. Advantages: laziest solution. Disadvantages: lots of failing tests can hide other failures that may only appear when you run it on your Windows workstation.
2. Skip tests that wouldn't pass on your platform. For example, skip the tests for the clustalw-using software when you're running it on your Windows workstation.
3. Mock the program out somehow?
4. Something else?
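
To make the "mock the program out" option concrete, here is a minimal sketch in Python using the standard library's unittest.mock. The align wrapper and the bare "clustalw <file>" command line are made up for illustration; they are not the real clustalw interface, and the canned output is a placeholder.

```python
import subprocess
from unittest import mock


def align(fasta_path):
    """Run the external aligner and return whatever it prints to stdout.

    Hypothetical wrapper: the command line below is a placeholder,
    not the real clustalw invocation.
    """
    result = subprocess.run(
        ["clustalw", fasta_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def test_align_with_mocked_clustalw():
    # Canned result standing in for the output of the real program.
    fake = subprocess.CompletedProcess(
        args=["clustalw", "seqs.fasta"],
        returncode=0,
        stdout="FAKE ALIGNMENT\n",
        stderr="",
    )
    # While the patch is active, subprocess.run never starts a process.
    with mock.patch("subprocess.run", return_value=fake) as fake_run:
        assert align("seqs.fasta") == "FAKE ALIGNMENT\n"
    # Check that the wrapper built the command line we expected.
    fake_run.assert_called_once()
    assert fake_run.call_args[0][0] == ["clustalw", "seqs.fasta"]
```

A test like this runs on any platform, but it only checks your own code around the call. It says nothing about whether the real program behaves as the canned output pretends.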

Edit: The software isn't a package that will be used by anyone else. It'll just be used by myself (barring being hit by a bus) for my current project.

The project is written in ruby. I assumed most bioinformaticians use scripting languages unless they have to optimize for speed, but maybe that's not the case.

• 2.6k views

updated 13.7 years ago by Michael 54k • written 13.7 years ago by Andrewjgrimm ▴ 460

2

I don't get your point. If some external program is a dependency of your program, why should it not die and fail the corresponding tests if you attempt to run it on a platform where one of the required dependencies is not present?

13.7 years ago by Lars Juhl Jensen 11k

1

I think this is quite a common question, though not specific to bioinformatics. It applies in general to software configuration and dependency management. So it could be asked on Stack Overflow, but I see no need to vote this down.

13.7 years ago by Michael 54k

0

I stick with one platform - Linux - where I can use simple commands such as "which" to check if an executable exists.
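
For anyone who does have to cover both Linux and Windows, a portable stand-in for "which" is Python's shutil.which. This is only a minimal sketch, and has_program is a made-up helper name:

```python
import shutil


def has_program(name):
    """Return True if an executable called `name` is found on PATH."""
    # shutil.which returns the full path, or None when nothing is found.
    return shutil.which(name) is not None


if __name__ == "__main__":
    print("clustalw available:", has_program("clustalw"))
```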

13.7 years ago by Neilfws 49k

0

I agree with Lars. If the dependency isn't there, the user won't be able to run that part later anyway. But make it possible to build your package without the component, so that the user can still use the rest.

13.7 years ago by Michael Kuhn 5.0k

0

Skipping tests based on certain conditions then reporting them as skipped is a common way to go. Most testing frameworks will support this behavior.
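
As one illustration of conditional skipping, here is a sketch with Python's built-in unittest. The class names and the trivial test bodies are invented for the example; the skip condition assumes the executable is called clustalw and is looked up on PATH.

```python
import shutil
import unittest

# Full path to the executable, or None if it is not installed here.
CLUSTALW = shutil.which("clustalw")


@unittest.skipUnless(CLUSTALW, "clustalw not found on PATH")
class ClustalwTests(unittest.TestCase):
    def test_executable_was_located(self):
        # A real suite would run an alignment and check its output here.
        self.assertTrue(CLUSTALW)


class IndependentTests(unittest.TestCase):
    def test_runs_everywhere(self):
        # Tests that do not need the external program always run.
        self.assertEqual("ACGT".lower(), "acgt")


def run_suite():
    """Run both test classes and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        [
            loader.loadTestsFromTestCase(ClustalwTests),
            loader.loadTestsFromTestCase(IndependentTests),
        ]
    )
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_suite()
```

On a machine without clustalw the first class is reported as skipped, with the reason given, rather than as a failure, so real failures stay visible.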

13.7 years ago by Istvan Albert 100k

score 4 · Answer 1 · 2010-08-20

Ideally, this should be handled during the configuration phase rather than during testing, because by then it might be too late. If you use a build process along the lines of configure, make, make check, make install, then as a user of your program I would like to see the following:

If the external program is essential, so that your program cannot run without it: have configure fail (or have the installer fail, if you distribute a binary build), and list the dependency in the README file and the documentation.
If the external program is optional: have configure turn off the features that rely on it, and skip the tests for those features.
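
The essential/optional distinction above could be sketched roughly as follows. This is Python rather than a real configure script, and configure, MissingDependency and the feature names are all invented for illustration.

```python
import shutil


class MissingDependency(Exception):
    """Raised at configuration time when an essential program is absent."""


def configure(required, optional, which=shutil.which):
    """Check dependencies up front and return the enabled optional features.

    `required` is a list of program names that must be present.
    `optional` maps a feature name to the program that feature needs.
    `which` is injectable so the check itself can be tested.
    """
    missing = [prog for prog in required if which(prog) is None]
    if missing:
        # Essential dependency absent: stop here, before any tests run.
        raise MissingDependency(
            "required programs not found: " + ", ".join(missing) + " (see README)"
        )
    # Optional dependency absent: the feature is simply left out.
    return {feature for feature, prog in optional.items() if which(prog) is not None}


if __name__ == "__main__":
    enabled = configure(required=[], optional={"alignment": "clustalw"})
    print("enabled optional features:", sorted(enabled))
```

The test suite can then consult the returned set and skip the tests for any feature that was switched off.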

When distributing a binary build as a Debian package or RPM, declare the dependency in the package so that the package manager resolves it automatically.

Edit after seeing yours:

The most important thing is to document dependencies one way or another, even if you are "only using the code yourself". You can never be sure you will stay the only user, and you might also leave for good, so it is a matter of courtesy to leave the dependencies documented. And, after half a year or so, my own code looks to me as if it were written by somebody else ;)

One way to document dependencies is to use language-specific documentation tools and build mechanisms, such as package descriptor files, POD for Perl, Javadoc, etc. I have no idea what the equivalent would be for Ruby. Writing a little installer script in Ruby that checks whether the external program can be run might also make your life easier if your program needs to be installed in several environments.

score 4 · Answer 2 · 2010-08-20

Biopython handles this when running its tests by raising a specific error type (MissingExternalDependencyError), then catching it and reporting it to stderr without failing the test. For instance, the clustalw runner checks all the various places clustalw could be, and raises the error if it can't find it:

http://github.com/biopython/biopython/blob/master/Tests/test_Clustalw_tool.py

The test runner catches the error, and reports it to stderr:

http://github.com/biopython/biopython/blob/master/Tests/run_tests.py

This is a compromise between failing the test and silently ignoring the problem. As Lars mentioned, this only makes sense if other parts of your package can be utilized even if the external program is missing.

score 1 · Answer 3 · 2010-08-20

Some large packages are tricky to install, e.g. requiring tens of Perl modules, some specific gcc version, particular libraries, etc. It is also hard to tell whether, after installing a ton of new stuff, you will actually use the program in question (it may simply not work on your data). Therefore, for the more hairy installs I use VirtualBox + Debian. That way I can install whatever the program's developers happen to fancy at the moment without going insane and messing up either my workstation or even some junk box previously used for testing. Things run slower inside VirtualBox, but if you add the installation time to the run time, not to mention the "untangling the mess on my workstation" time, it is still a win for programs that are not run on a daily basis.