I was thinking about the challenges of dealing with Illumina mate pairs. It is known that after the sequencing run, the final reads are a mixture of true mate pairs, paired-end reads, and seemingly single-end reads.
Let's assume a mate-pair library with a ~3 kb average sequenced insert. If we were to de novo assemble the genome, we could select all contigs above, say, 20 kb. We could then trim 3 kb from the left and right ends of each contig and map our mate-pair reads to those trimmed contigs. This would allow us to identify which read pairs are actually paired-end (based on the relative orientation of the reads and the distance between them) and exclude them from our sequencing file in the next de novo assembly.
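To make the classification step concrete, here is a rough sketch in Python. It assumes the reads were mapped as sequenced (not reverse-complemented first), so true mate pairs should face outward (RF) at roughly the library insert size, while paired-end contaminants face inward (FR) at a short distance. The function name, the read length, and all thresholds are placeholders I made up for illustration. They would need tuning against the observed insert-size distribution.

```python
def classify_pair(start1, reverse1, start2, reverse2, read_len=100,
                  pe_max_span=1000, mp_min_span=1500, mp_max_span=6000):
    """Classify one read pair mapped to the same trimmed contig.

    start1/start2 are leftmost mapping coordinates, reverse1/reverse2 are
    True when the read maps to the reverse strand. read_len and the three
    span thresholds are hypothetical values for a ~3 kb library.
    """
    # Order the two reads by their leftmost coordinate on the contig.
    (left_start, left_rev), (right_start, right_rev) = sorted(
        [(start1, reverse1), (start2, reverse2)]
    )
    # Outer distance from the start of the left read to the end of the
    # right read (assumes both reads align over their full length).
    span = right_start + read_len - left_start

    if not left_rev and right_rev:
        orientation = "FR"   # reads point toward each other
    elif left_rev and not right_rev:
        orientation = "RF"   # reads point away from each other
    else:
        orientation = "same_strand"

    if orientation == "FR" and span <= pe_max_span:
        return "paired_end"
    if orientation == "RF" and mp_min_span <= span <= mp_max_span:
        return "mate_pair"
    return "ambiguous"


# Inward-facing pair with a short span: looks like a paired-end contaminant.
print(classify_pair(1000, False, 1250, True))
# Outward-facing pair about 3 kb apart: looks like a true mate pair.
print(classify_pair(1000, True, 3900, False))
```

Pairs labelled "paired_end" would be the ones to drop before the next assembly. Anything "ambiguous" (same strand, or the right orientation at the wrong distance) is probably best inspected separately rather than removed blindly.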
I can already think of one way this can backfire, namely repetitive regions, where we will see high coverage after mapping. Maybe these regions can somehow be excluded/masked, and the focus can be on regions of average coverage.
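One possible way to restrict the analysis to regions of average coverage is sketched below, again in Python. It assumes per-base depth for a contig is already available as a list of numbers (for example, parsed from a depth report), splits it into fixed windows, and keeps only windows whose mean depth is close to the contig median. The function name, the window size, and the ratio cutoffs are arbitrary choices of mine, not established values.

```python
from statistics import median


def average_coverage_windows(depth, window=1000, min_ratio=0.5, max_ratio=2.0):
    """Return (start, end) windows whose mean depth is near the median.

    depth is a list of per-base coverage values for one contig. Windows
    with a mean depth outside [min_ratio, max_ratio] times the median of
    all window means are dropped as likely repeats or poorly covered
    regions. All default values are hypothetical.
    """
    if not depth:
        return []
    # Mean depth of each fixed-size window (the last one may be shorter).
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        means.append(sum(chunk) / len(chunk))
    typical = median(means)
    kept = []
    for i, mean_depth in enumerate(means):
        if min_ratio * typical <= mean_depth <= max_ratio * typical:
            kept.append((i * window, min((i + 1) * window, len(depth))))
    return kept


# Toy contig: 30x everywhere except a 1 kb stretch at 200x (a collapsed repeat).
toy_depth = [30] * 3000 + [200] * 1000 + [30] * 2000
print(average_coverage_windows(toy_depth))
```

Only read pairs where both reads fall inside the kept windows would then be used for the orientation and distance check.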