Recently I have been trying to improve a plant genome assembly. The genome was first assembled from 454 data and later assembled again from Illumina data.
I tried two strategies. The first is to start from the raw reads, mixing both read types in a single run of a de novo assembler such as Velvet or Ray. I call this the direct hybrid assembly. I also tried to further combine the assemblies produced by these two assemblers using a third assembler.
The second is to assemble the 454 reads using Newbler (i.e., GS De Novo Assembler) and, separately, the Illumina reads using Velvet. The two assemblies are then merged using a third assembler. I call this the stepwise hybrid assembly.
I found that the first strategy produced more misassemblies (assessed by comparing scaffolds to protein sequences) than the second one.
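To make the assessment concrete, here is a minimal sketch of one way a protein-based check could work. It is an illustration, not necessarily the exact procedure used here. It assumes the protein-to-scaffold alignments have already been reduced to tuples of `(protein_id, scaffold_id, prot_start, scaf_start, strand)`; that tuple layout and the function name `flag_suspect_scaffolds` are hypothetical. A scaffold is flagged when a single protein aligns to it on mixed strands, or when the aligned segments appear in a different order on the scaffold than in the protein.

```python
from collections import defaultdict


def flag_suspect_scaffolds(hits):
    """Return scaffold IDs where some protein aligns inconsistently.

    hits: iterable of (protein_id, scaffold_id, prot_start, scaf_start, strand)
    tuples, with strand given as "+" or "-" (an assumed input layout).
    """
    # Collect all aligned segments for each (protein, scaffold) pair.
    grouped = defaultdict(list)
    for protein, scaffold, prot_start, scaf_start, strand in hits:
        grouped[(protein, scaffold)].append((prot_start, scaf_start, strand))

    suspects = set()
    for (_protein, scaffold), segs in grouped.items():
        strands = {s for _, _, s in segs}
        if len(strands) > 1:
            # Segments of one protein on both strands: possible inversion.
            suspects.add(scaffold)
            continue
        # Walk the segments in protein order. Scaffold coordinates should
        # increase on "+" and decrease on "-".
        segs.sort()
        coords = [scaf for _, scaf, _ in segs]
        descending = next(iter(strands)) == "-"
        if coords != sorted(coords, reverse=descending):
            suspects.add(scaffold)
    return suspects
```

For example, a protein whose first segment maps at scaffold position 900 and whose second segment maps at position 200, both on the "+" strand, would cause that scaffold to be flagged. Tandem duplications and paralogs can trigger the same signal, so flagged scaffolds are candidates for inspection rather than confirmed misassemblies.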
I also found that when I further combined the two direct hybrid assemblies from strategy one (by my assessment, one assembler performed better than the other), the result contained even more misassemblies.
Could anyone suggest possible reasons for this?