Question

What Is Your Approach Toward Testing?

18

13.9 years ago

Giovanni M Dall'Olio 28k

I have often heard people in bioinformatics complain that it is very difficult to write tests for programs and scripts, and that beyond a certain point it is not useful to do so, as it is impossible to write tests for every possible wrong behaviour of your scripts. If you ask any bioinformatics student, it is very likely that they won't even know what a test is.

However, if you think about it, biologists had to face the problem of designing tests long before bioinformatics was invented, with a higher level of difficulty, and they came up with some sort of solution. My former Molecular Biology prof taught me that the most difficult part of an experiment is not formulating a hypothesis, but designing the right tests and controls to prove that the method is correct.

If you think about it, it is probably impossible to achieve real reproducibility for a wet-lab experiment. Even the same cell line, after some duplications in a lab, can accumulate enough mutations to differ significantly from the same strain in another lab. And there are a lot of variables that make reproducibility really hard to achieve in a wet lab: the climate of the lab, the experience of the researcher, the reagents used... and, even when you have demonstrated a result in a cell line, you can't deduce that the same happens in another line or in vivo.

However, even if they can't eliminate the problem of reproducibility, scientists have been able to find a compromise solution. They have a range of well documented best practices to follow, like putting a positive and negative control in any western blot, calibrating their machines first, using a blind control framework, etc. These practices enable scientists to run experiments that can reasonably be assumed to be reproducible. Moreover, they have an international organization in charge of that.

In bioinformatics, there is no such thing as best practices. If two students write two parsers for the FASTA format, there is no way to tell whether their results will be the same, because there is no standard practice they have to follow. It would be better if, for example, they had to follow some guidelines, or if they were required to write parsers that pass a standard set of FASTA use-cases without errors.
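To make the idea of a standard set of use-cases concrete, here is a minimal sketch in Python. The parser, the `SHARED_CASES` list, and the `run_shared_cases` helper are all hypothetical names made up for illustration; no such standard suite exists as far as this thread knows. Any student's parser that returns a list of (header, sequence) tuples could be run against the same cases.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical shared use-cases: (input text, expected records).
SHARED_CASES = [
    (">seq1\nACGT\n", [("seq1", "ACGT")]),                          # simplest case
    (">seq1\nAC\nGT\n", [("seq1", "ACGT")]),                        # wrapped sequence
    (">a\nAC\n\n>b desc\nGG\n", [("a", "AC"), ("b desc", "GG")]),   # blank line, two records
    ("", []),                                                       # empty file
    (">seq1\r\nACGT\r\n", [("seq1", "ACGT")]),                      # Windows line endings
]


def run_shared_cases(parser):
    """Return the input texts on which `parser` disagrees with the expected records."""
    return [text for text, expected in SHARED_CASES if parser(text) != expected]


failures = run_shared_cases(parse_fasta)
print("failing cases:", failures)
```

Two parsers that both come back with an empty failure list agree at least on these inputs, which is a much weaker claim than being correct, but it is a claim that can be checked.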

So, what is your approach toward testing? What is your strategy for deciding which tests to write for your analysis, and which controls to use? Do you have any recommendations for an inexperienced bioinformatician?

• 4.6k views

ADD COMMENT • link updated 13.1 years ago by apfejes ▴ 160 • written 13.9 years ago by Giovanni M Dall'Olio 28k

score 10 · Answer 1 · 2010-05-31

It depends. For some basic algorithm development, you'll still be able to write test cases. But when you go into more exploratory research programming, you don't know what the answer will be, and therefore you can't write tests for it. Here's what I try to do:

Have an idea what to expect (even to be proven wrong).
Look at cases in detail. Check if the results make sense.
Make sure you have a pipeline you can run with minimal effort to facilitate fixing of bugs.
If you notice you are making assumptions, put in assert statements to check if the assumptions do always hold.
Have a negative control, and if possible a positive control. (If you feed random data to your pipeline and still get a good p-value, something is wrong.)
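The last two points can be sketched in Python roughly as follows. This is only an illustration under assumptions: the permutation test stands in for whatever statistic a real pipeline computes, and all names, sample sizes, and seeds are made up. The asserts spell out assumptions about the input, the positive control has an obvious effect, and the negative control feeds the same code data with no effect at all.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=500, seed=0):
    """Two-sided permutation test on the difference of group means."""
    # Assumption made explicit: both groups must contain data.
    assert len(group_a) > 0 and len(group_b) > 0, "both groups must be non-empty"
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = mean_difference(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean_difference(pooled) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_permutations + 1)
    assert 0.0 < p_value <= 1.0, "p-value out of range"
    return p_value


data_rng = random.Random(42)

# Positive control: two groups drawn from clearly different distributions.
positive_a = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
positive_b = [data_rng.gauss(3.0, 1.0) for _ in range(30)]
p_positive = permutation_p_value(positive_a, positive_b)

# Negative control: repeat with groups drawn from the same distribution.
# Only a small fraction of these runs should look "significant".
n_null_runs = 20
false_positives = 0
for run in range(n_null_runs):
    null_a = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
    null_b = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
    if permutation_p_value(null_a, null_b, seed=run) < 0.05:
        false_positives += 1
false_positive_rate = false_positives / n_null_runs

print("positive control p-value:", p_positive)
print("negative control false positive rate:", false_positive_rate)
```

If the negative control came back significant most of the time, the bug would be in the pipeline, not in the biology.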

My biggest nightmare is actually scripts that fail on a cluster but still produce valid (if incomplete) output; I haven't found a good way to test for that.
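One possible mitigation, which is not from the answer above and only covers jobs whose output you write yourself, is to make incomplete output detectable: write to a temporary file, finish with a sentinel line carrying the row count, and rename into place only at the end. The `#EOF` marker and both function names below are hypothetical conventions for this sketch, not a standard.

```python
import os
import tempfile

END_MARKER = "#EOF"  # hypothetical sentinel, not part of any file format


def write_results(path, rows):
    """Write rows as TSV to a temp file, append a sentinel, then rename into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".partial")
    try:
        with os.fdopen(fd, "w") as handle:
            for row in rows:
                handle.write("\t".join(str(field) for field in row) + "\n")
            handle.write(f"{END_MARKER}\t{len(rows)}\n")
        # The final name only appears once everything has been written.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def is_complete(path):
    """True if the file ends with the sentinel and the declared row count matches."""
    if not os.path.exists(path):
        return False
    with open(path) as handle:
        lines = handle.read().splitlines()
    if not lines or not lines[-1].startswith(END_MARKER + "\t"):
        return False
    try:
        declared = int(lines[-1].split("\t")[1])
    except ValueError:
        return False
    return declared == len(lines) - 1
```

A downstream step can then refuse to run on any file for which `is_complete` is false. This does not catch a job that finishes cleanly with wrong results, only one that died partway through.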

score 10 · Answer 2 · 2010-05-31

13.9 years ago

Will 4.5k

I write all of my code in Python and then use nose (nosetests) to run automated unit tests. In general I tend to have ~95% branch coverage (the new coverage.py can estimate branch coverage). I avoid writing "quick-scripts" since they are nearly impossible to test, document and maintain.

I make it a point to write small and testable functions ... if I can't think of a unit-test for a function then I don't write it. I don't do TDD (just because it doesn't fit my personal coding style) but I try to make sure I have a high level of coverage before I ever run a real analysis with the code.

I also make sure to create example datasets where the answer is obvious. Something similar to a "functional test".
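As an illustration of a small, testable function checked against datasets where the answer is obvious, here is a sketch. The `gc_content` function and its tests are made up for this example and are not Will's code; the tests are plain functions with a `test_` prefix, the naming convention that runners such as nose look for in test modules.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Example inputs chosen so the expected answer is obvious by eye.
def test_all_gc():
    assert gc_content("GGCC") == 1.0


def test_no_gc():
    assert gc_content("ATAT") == 0.0


def test_half_gc_lowercase():
    assert gc_content("acgt") == 0.5


def test_empty_sequence_is_rejected():
    try:
        gc_content("")
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

The point of the obvious cases is that a failure can only mean the code is wrong, never that the expected value was miscalculated.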

But the main message is: Test, Test and then test some more. How can you know your code is right, and therefore your analysis, unless you've run some sort of test? When I review computational papers the first thing I look for is tests ... if they don't have any sort of confirmatory tests then I stop reading.

Hope that helps,

Will

ADD COMMENT • link 13.9 years ago by Will 4.5k

that's some pretty rigorous testing! do you have some public code so i can learn something?

and when reviewing, how often do you actually see papers that include tests?

ADD REPLY • link 13.9 years ago by brentp 24k

I've only come across 1 paper with unit-tests out of ~50. Since they were simulation papers most had a section in their paper about "validation with simulated data" ... kinda like a functional test.

ADD REPLY • link 13.9 years ago by Will 4.5k

Hi Will. I'm diving into unit testing and am using Python's 'unittest' module. What could influence me to move to 'nose'? As I understand it, it is basically a higher level wrapping of 'unittest'... Cheers!

ADD REPLY • link 13.2 years ago by Eric Normandeau 11k

score 7 · Answer 3 · 2010-05-31

I should do some tests with JUnit, I know, I know. Tests are now an industry standard....

But... citing StackOverflow, the downsides of TDD are:

Big time investment
Additional Complexity
Design Impacts
Continuous Tweaking

However, StackOverflow is a nice place to learn about TDD.

UPDATE (4 years later): I've developed a large number of tools using the sam library for Java. A few days ago the library changed substantially: I'm now crying because many parts of my code need to be changed and I have to check that everything compiles and executes properly.

score 3 · Answer 4 · 2010-10-29

As the one with the deepest standing in biology within our (former, company now gone) computational group, I was often called upon to test code or offer interesting, knotty, complex examples for testing. Knowledge of those weird biological examples helped to test, often breaking the code, pre-release mind you. So, once again and in reference to Giovanni's question, thinking like a biologist will help in some of these situations.

What can a programmer do? One, talk to the biologist in the group about your code, what it does and where/how it should perform. Two, when you hear of a strange gene or odd pathway (KEGG used to be all BM biochem pathways, now has disease pathways) or something different and relevant to your project, take note, collect the details and save it for your testing. In fact, such a repository should be a shared resource within the group.

score 3 · Answer 5 · 2010-10-29

Well, I don't often have the time to write tests. We all know they're important, but there's never a way to invest the time in them and meet the deadlines we're given. It's been a common complaint throughout my grad studies. However, there are things you can do.

First, since I program in Java, I've used Enerjy. It's a free download that integrates into Eclipse. When I first installed it, it found over 2000 warnings in 25kloc... and I painstakingly went through every single one. Surprisingly, it found a lot of things that would have been bugs. It won't stop algorithm fails, but it will prevent coding fails (e.g., recycled variables, unused variables, confusing variables, and bad programming practices).

Second, I always use a wide sampling of test cases. Genomics work invariably will find things you don't expect to handle, so I often just throw my code at whole genomes, datasets and anything I've got handy. For instance, I often pull down every exon definition from Ensembl and just throw that at my code. I've found lots of errors in Ensembl annotations, but they also find errors in my own code.

Third, double check every single result set you generate! I can't stress that enough. Take a sample from whatever result file you've created and confirm by hand that the result is a consequence of data that's present in the original files. The more you verify, the higher your confidence in your code. If you have the time, these are the things that will become the official smoke tests anyhow.
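Checking by hand does not scale to every row, so a scripted spot check can complement it: draw a random sample of results and recompute each one directly from the source data by a separate, simpler route. The sketch below is only an assumption about how that might look; the data, the `spot_check` helper, and the per-chromosome counts are invented for illustration.

```python
import random


def spot_check(results, source_rows, recompute, sample_size=5, seed=0):
    """Recompute a random sample of results from the source data.

    results: dict mapping key -> reported value
    recompute: function (key, source_rows) -> value derived independently
    Returns a list of (key, reported, recomputed) mismatches.
    """
    rng = random.Random(seed)
    keys = sorted(results)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    mismatches = []
    for key in sample:
        expected = recompute(key, source_rows)
        if results[key] != expected:
            mismatches.append((key, results[key], expected))
    return mismatches


# Hypothetical example: reported feature counts per chromosome.
source_rows = [("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr3", 7), ("chr3", 9)]
reported = {"chr1": 2, "chr2": 1, "chr3": 3}  # chr3 is deliberately wrong


def count_rows(chromosome, rows):
    """Independent recomputation: count source rows on one chromosome."""
    return sum(1 for chrom, _ in rows if chrom == chromosome)


mismatches = spot_check(reported, source_rows, count_rows, sample_size=3)
print(mismatches)
```

The recomputation should not reuse the code under test, otherwise the check only confirms that the same bug runs twice.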

And finally, get as many people using the code as possible. Everyone will try to do something completely different with your code, most of it things you never imagined, and that will identify just about any bug you can imagine. If you can't afford to hire testers, just giving out your code for free will seriously increase the number of bugs that are found.

score 1 · Answer 6 · 2010-10-28

According to this paper:

High assurance development means producing compelling evidence that a system meets specified requirements.

For computer programs, there are several ways to provide this compelling evidence. In order of increasing cost/benefit, I'd list:

an advanced type system: using static typing with type inference gives you a proof of (some) correctness essentially for free. Good programmers will be better at leveraging the type system (see e.g. Brian's answer to this thread at stackoverflow).
unit tests: if you can express invariants for your code, a system like QuickCheck will automatically generate random test cases for you.
system tests: typically harder to set up, and covering fewer cases, but some are essential to show that the program actually works.
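The invariant-based approach in the unit tests item can be sketched without QuickCheck itself. The following is a hand-rolled stand-in in Python using only the standard library, not QuickCheck or any real property-testing library; the function names and the choice of reverse complement as the code under test are assumptions for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def random_dna(rng, max_length=50):
    """Generate a random DNA string, possibly empty."""
    length = rng.randint(0, max_length)
    return "".join(rng.choice("ACGT") for _ in range(length))


def check_property(prop, n_cases=500, seed=0):
    """Try `prop` on random inputs; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(n_cases):
        case = random_dna(rng)
        if not prop(case):
            return case
    return None


# Invariants that should hold for every input.
def is_involution(seq):
    return reverse_complement(reverse_complement(seq)) == seq


def preserves_length(seq):
    return len(reverse_complement(seq)) == len(seq)


# A deliberately false property, to show that a counterexample is reported.
def equals_own_reverse_complement(seq):
    return reverse_complement(seq) == seq


print(check_property(is_involution))
print(check_property(preserves_length))
print(check_property(equals_own_reverse_complement))
```

A real tool in this family also shrinks counterexamples to a minimal failing input, which this sketch does not attempt.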

I haven't used formal proofs (beyond type correctness), so I'm unsure about the effort vs the gain. I've seen some test setups that spend a lot of effort testing that all kinds of incorrect inputs produce the right error, but I am (to state it gently) dubious about the benefit of this.

Now, if the question was how do we provide compelling evidence that a particular method works, this is a lot more ad hoc, and typically the subject of a paper presenting the method. Unfortunately, the positive evidence tends to come from people "selling" a product, and negative evidence is harder to come by (see e.g. my question on assembler for eukaryotes), but no less important.