This error correction problem caught my eye. Is there an O(n) algorithm for this, where n is the number of reads? There's the relatively straightforward O(n2) method of checking all the pairs, but I'm wondering if there's something better out there.
k-mer based error correction is much faster than overlap based methods. The complexity is effectively O(N) where N is the total number of bases. See the quake paper for a representative approach. Many other programs use a similar approach.