Hey guys,
I need some help improving a de novo assembly of a prokaryotic genome built from Ion Torrent reads (mean length = 230 bp, mean quality Q29). The assembly was originally made with MIRA 4 and resulted in 519 contigs (~7.8 Mb). The DNA sample was contaminated with a symbiotic prokaryote, so of those 7.8 Mb, about 3 Mb belonged to my genome of interest and the rest to the contaminant genome. I was able to filter out the contaminant contigs by BLASTing all contigs against the fully sequenced genome of a closely related species. In the end, I got a "final" assembly of 165 contigs with a total length of 2.85 Mb (N50 = 27,895 bp).
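Since N50 is the number being tracked here, a minimal Python sketch of how it is computed from contig lengths may help when comparing re-assemblies. It uses the standard definition: sort contigs longest first and report the length at which the running total reaches half of the assembly size. The FASTA-length helper and the file name in the usage note are illustrative assumptions, not part of MIRA or SPAdes.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence in FASTA text (multi-line records allowed)."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length  # last record


def n50(lengths):
    """Return the N50: contigs at least this long hold >= 50% of the total length."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

Usage (hypothetical file name): `with open("contigs.fasta") as fh: print(n50(fasta_lengths(fh)))`. Taking any iterable of lines keeps the functions easy to test without touching the disk.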
I would like to improve the assembly by increasing the N50 or extending the contigs in some way, but I am stuck now. This is my (not so much) progress so far:
1. I mapped the Ion Torrent reads to the contaminant genome using bwa mem (all default parameters) and used samtools to keep only the unmapped reads (the reads that should belong to the genome I want to assemble).
2. Used SPAdes to reassemble the unmapped reads, supplying the 165 contigs as trusted contigs (--trusted-contigs option) for gap closure, repeat resolution and graph construction.
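Step 1 keeps the reads whose SAM FLAG has bit 0x4 ("segment unmapped") set. With samtools this is usually a one-liner such as `samtools fastq -f 4 mapped.bam > unmapped.fq`. A minimal pure-Python sketch of the same filter on SAM text is below, assuming single-end reads that carry base qualities (QUAL is not `*`). Like samtools, it reverse-complements any record flagged 0x10 so the FASTQ holds the original read orientation.

```python
UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
REVERSE = 0x10         # SAM FLAG bit: SEQ stored reverse-complemented
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_fastq(sam_lines):
    """Yield FASTQ records for unmapped primary reads found in SAM text.

    Assumes QUAL is present (not '*'), which holds for Ion Torrent reads.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once
        if not flag & UNMAPPED:
            continue  # read mapped to the contaminant reference
        if flag & REVERSE:
            # Restore the read's original orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

Wrapped in a small script that calls `sys.stdout.writelines(unmapped_fastq(sys.stdin))`, it could be used as `samtools view mapped.bam | python unmapped_fastq.py > unmapped.fq` (script and file names hypothetical).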
However, SPAdes keeps crashing when I try to assemble the unmapped reads. I suspect the read coverage is now too low to do anything useful. Assembling all the reads from the sequencing run with SPAdes produces 8,447 contigs, which is not really an improvement over the original MIRA assembly.
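Before blaming coverage, it is worth putting a number on it. The usual Lander-Waterman estimate is mean coverage C = N × L / G (N reads of mean length L over a genome of size G), which equals total read bases divided by genome size. A minimal sketch follows, taking G as the ~3 Mb target genome from above. The read count in the usage note is purely hypothetical.

```python
def expected_coverage(n_reads, mean_read_length, genome_size):
    """Lander-Waterman mean coverage: C = N * L / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size


def fastq_coverage(lines, genome_size):
    """Mean coverage from FASTQ text: total read bases / genome size.

    Assumes plain 4-line FASTQ records (header, sequence, '+', qualities).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    total_bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            total_bases += len(line.strip())
    return total_bases / genome_size
```

For example, a hypothetical 1,000,000 unmapped reads of 230 bp over a 3 Mb genome gives `expected_coverage(1_000_000, 230, 3_000_000)`, about 77x. If the real figure for the unmapped set comes out in the low single digits, low coverage is a plausible explanation for the failed assembly.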
So, after this long explanation of my problems, here comes the question:
What do you guys usually do to improve a primary assembly in a situation like this? I am looking for a tool that uses the reads to extend or scaffold the contigs, or any other strategy that would work in this case. I am trying the SPAdes error-correction tool to improve the quality of the reads, so that I can remap them to the contaminant genome, hopefully get more unmapped reads, and then redo the SPAdes assembly step. Would error correction be a good strategy here? Also, is BWA a good mapping tool for Ion Torrent reads?
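Whether error correction pays off can be checked directly: map the raw and the corrected reads to the contaminant genome with the same bwa mem settings and compare how many reads stay unmapped. A minimal sketch that tallies primary reads from SAM text is below. It skips secondary (0x100) and supplementary (0x800) records so each read is counted once, which matters because bwa mem emits supplementary alignments.

```python
def mapping_summary(sam_lines):
    """Return (mapped, unmapped) counts of primary reads in SAM text."""
    mapped = unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.split("\t", 2)
        if len(fields) < 2:
            continue  # not an alignment line
        flag = int(fields[1])
        if flag & 0x900:  # 0x100 secondary | 0x800 supplementary
            continue
        if flag & 0x4:    # segment unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

Running it on the raw and corrected SAM output (e.g. from `samtools view`) gives two (mapped, unmapped) pairs whose unmapped fractions can be compared directly.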
Sorry if the questions seem dumb; I am a newbie in the genome assembly world. At least now I don't have to pray to the PCR gods to make something work.