I'm currently using samtools.pl to generate consensus and quality from a resequencing effort. Like so many other Fastq parsers, I made the assumption of a four-line format, but samtools produces a fastq format where sequence and quality runs over multiple lines.
I could of course extend the parser to accomodate this format, but this seems pretty hard to get correct, since quality information may contain both + and @ as valid characters.
As I see it, the correct way to do it is to read sequence until a line with either a single '+' OR a '+' followed by the same read name as was used in the '@' line. And then read the same number of quality values.
I think this might work, but as there are multiple other Fastq parser implementation, I'm curious how these deal with this issue?