I am doing a de novo assembly of a ~33 Mb genome using 454 and Illumina reads. I cannot use MIRA, since I have ~80M Illumina reads (which would require ~160 GB of memory). So far I have found that it is usually most efficient to first assemble the 454 reads with Newbler and the Illumina reads with Velvet, and then combine the two sets of contigs using a third assembly program. I have been using CAP3 for this last step, but I'm not satisfied with the results.
Statistics for the intermediate and final assemblies are shown below. The problem is that the CAP3 result is worse than either intermediate assembly: its contigs total only ~5.2 Mb, compared with ~32 Mb and ~35 Mb for the inputs, so CAP3 seems to throw most of the contigs away. Two questions:
- Should I use specific options for CAP3 when running the final assembly?
- Is there a ready-made pipeline for doing this kind of 'integration' more effectively?
Statistics for the CAP3, Newbler and Velvet outputs (sizes and lengths in bp):

| Statistic | CAP3 | Newbler | Velvet |
|---|---|---|---|
| Number of contigs | 826 | 1942 | 4939 |
| Total size of contigs | 5220088 | 32110351 | 34602711 |
| Longest contig | 37928 | 170575 | 134827 |
| Mean contig size | 6320 | 16535 | 7006 |
| Median contig size | 3734 | 8447 | 3463 |
| N50 contig length | 12593 | 37018 | 15446 |
| L50 contig count | 130 | 272 | 662 |
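
For anyone who wants to reproduce these numbers on their own contig files, here is a minimal sketch of how the statistics above are usually defined. It is not the tool that produced my figures. The function names `read_contig_lengths` and `assembly_stats` are hypothetical, and I am assuming the common convention that N50 is the length of the contig at which the cumulative size, counted from the longest contig down, first reaches half of the total assembly size, with L50 being the number of contigs needed to get there.

```python
from statistics import mean, median


def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file (hypothetical helper)."""
    lengths = []
    current = None  # None until the first header line is seen
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Compute basic contig statistics from a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Assumed convention: stop once half of the total size is covered.
        if 2 * running >= total:
            n50, l50 = length, count
            break
    return {
        "contigs": len(ordered),
        "total": total,
        "longest": ordered[0],
        "mean": mean(ordered),
        "median": median(ordered),
        "n50": n50,
        "l50": l50,
    }
```

Usage would be something like `assembly_stats(read_contig_lengths("contigs.fasta"))`, where the file name is just a placeholder. Running this on the input and output of the merging step makes it easy to see how much sequence is lost.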