Testing is a relatively new concept where you tell your program to check that it does what you expected it to do. For example, in Python:
a = 1
b = 2
assert a+b == 3
Despite the current wide support for unit tests, there has been little evidence that tests actually prevent unexpected outcomes from programs in real-world situations. A lot of development shops do it because it gives a feeling of safety/productivity when you hit "test" and all those little green arrows get ticked off. That isn't to say tests have no value - any tool that forces a developer to look at what they expect and what is really happening will result in a better program. Testing is just one very basic, and commonly not properly implemented, way to do that. Type hints are another. Code review, another. Closely following an architectural design pattern, another.
But testing doesn't work well for file parsers, or most things that work with data, because file parsers do not have a defined input/output. Or rather, the range of potential inputs is unlimited, and thus the number of tests would have to be unlimited for a "successful test" to mean what developers think it means - that the code works as expected. A parser can pass a million tests but still fail when something unexpected appears in user data. This is because the developer didn't write the data. Actually it goes one step further than that - a sufficiently complex parser can end up being Turing-complete, and thus the data can program the parser to perform any calculation the data wishes. The data controls the program. It is impossible to write tests for a program you do not control.
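As a rough illustration of that gap, here is a minimal sketch. The `parse_line` function and the tab-separated "name, count" layout are made up for this example and are not taken from any real format. It passes the tests its author thought of, then breaks on a line nobody anticipated.

```python
def parse_line(line):
    """Parse a hypothetical 'name<TAB>count' record into (name, count)."""
    name, count = line.rstrip("\n").split("\t")
    return name, int(count)


# The tests the developer thought to write all pass.
assert parse_line("chr1\t42\n") == ("chr1", 42)
assert parse_line("chrX\t0") == ("chrX", 0)

# Real user data: someone added a third column that no test covers.
try:
    parse_line("chr2\t7\tlow quality")
    outcome = "parsed"
except ValueError as err:
    outcome = "failed: " + str(err)

print(outcome)
```

You can add a test for the three-column case once you have seen it, but only after the data has already surprised you.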
There are even more modern, even more highly recommended, and even more commonly not properly implemented options a developer could take. For example, LangSec is a movement dedicated to formally defining what makes a bad file format for the purposes of security, although security has only been chosen because it's the most practical reason not to write bad file formats. But the recommendations of LangSec apply to writing testable code too. This video seems to cover the major topics in the first 5 minutes:
So, if writing testable code for file parsers depends on the file format itself, then "how do we guarantee software reliability in bioinformatics?" is like asking how one holds ice without melting it. You cannot write a FASTA parser that is testable, because your definition of FASTA is not the same as someone else's. What does a tool do when it encounters a FASTA file with a ">" as the 121st character of the sequence? Split it onto a new line, as the spec says rows must be wrapped to 120 characters, resulting in a new entry? Report an error that ">" shouldn't be in the sequence, even though the spec says it should be ignored? Wrap the line to 80 characters (which is also valid FASTA)? Not perform any wrapping at all (as many consider linearised FASTA to be valid FASTA these days)?
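To make the ">" question concrete, here is a small sketch of two readers that each follow one of those interpretations. Both `read_strict` and `read_rewrapping` are hypothetical names written for this post, and neither is meant to be "the" correct FASTA behaviour. The point is only that the same bytes give a different number of records depending on which reading of the format you picked.

```python
def read_strict(text):
    """Interpretation A: '>' only starts a record at the beginning of a line."""
    records, name, seq = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq)))
            name, seq = line[1:], []
        elif name is not None:
            seq.append(line)
    if name is not None:
        records.append((name, "".join(seq)))
    return records


def read_rewrapping(text, width=120):
    """Interpretation B: re-wrap every line to `width` characters, then parse."""
    wrapped = []
    for line in text.splitlines():
        wrapped.extend(line[i:i + width] for i in range(0, len(line), width))
    return read_strict("\n".join(wrapped))


# One record whose 121st sequence character is '>'.
data = ">seq1\n" + "A" * 120 + ">CGT\n"

strict = read_strict(data)
rewrapped = read_rewrapping(data)
print(len(strict), len(rewrapped))  # prints "1 2"
```

A test suite written against either function would pass, and the two tools would still disagree about what the file contains.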
And FASTA is one of the simpler formats! Try writing a parser for BAM, VCF, etc. I'm not sure it's actually possible. ValidateSamFile tried, and it flags pretty much everything as invalid :P And my definition of invalid is probably different from yours, which is different from the specification's, etc.
Long story short, writing reliable software that has to parse unreliable data formats is fundamentally impossible. Unit tests will not save us :P
20 months ago by
John ♦ 12k