Hi all, I'm wrapping up a Nanopore-Illumina hybrid genome assembly. I've gone through all the Canu/Nanopolish/Pilon steps, and decided to pick at what is likely my mitochondrial genome. I want to trim it down to remove the duplicated sequence that comes from assembling circular DNA as a linear contig, and make sure my current contig indeed represents a circular sequence. The mito seq seems to be on the long end, in the neighborhood of 300 kb.
I've done two things:
- self-aligned with MUMmer to look for likely repeat/loop points
- run the contig through the tool Circlator (reassembles with Canu-corrected reads to make a circular contig against the reference from the original assembly)
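As a rough cross-check on the self-alignment idea, here is a minimal sketch that does not use MUMmer at all. It only tallies the distance between repeated exact k-mers in the contig: if the contig runs past the end of the circle, the overlapping part reappears at a constant offset, and that offset is a candidate length for the circular genome. The function name `candidate_circle_lengths` and the default `k=31` are my own choices for illustration, and exact matching will miss repeats that differ by polishing errors, so treat this as a sanity check rather than a replacement for a real aligner.

```python
from collections import Counter


def candidate_circle_lengths(seq, k=31, top=5):
    """Tally offsets between repeated exact k-mers in a linear contig.

    Returns up to `top` (offset, count) pairs, most frequent first.
    A dominant offset suggests the contig overlaps itself at that
    distance, i.e. a candidate length for the circular molecule.
    """
    seq = seq.upper()
    first_seen = {}      # k-mer -> position of its first occurrence
    offsets = Counter()  # distance to first occurrence -> how often seen

    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in first_seen:
            offsets[i - first_seen[kmer]] += 1
        else:
            first_seen[kmer] = i

    return offsets.most_common(top)
```

Run on the contig sequence as a plain string, one dominant offset would support a single wrap-around, while several competing offsets would point to internal repeats like the ones described further down.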
Both methods agree well, but now I'm having a little trouble properly selecting the final sequence I want to continue with. One could take the Circlator output, which is based on Canu-corrected reads, and run all the polishing again (Nanopolish, then three rounds of Pilon).
I'd rather not, so I've aligned the Circlator re-assembly to my original and aim to take the corresponding sequence of the original, polished assembly. But there are some funny gaps that make it a little tricky to determine where the best cutoffs are.
The easiest way to see it is with a picture.
- The original assembly is duplicated to give it wrap around for the alignment.
- The small gray bar annotations are a 2.5kb repeated region identified by MUMmer. By eye, I would simply extract the sequence from the beginning to the start of the second bar, and call it a day. These repeats are not so far from areas that blast to mitochondrial ribo SSU.
- Interestingly, the Circlator assembly tends to break itself up using bits from what I presume are the repeated regions spread over the contig. Much of the aligned region is peppered with little gaps (yellowish area of consensus).
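The "by eye" cut described above can be sketched in a few lines. This assumes the start coordinate of the second repeat copy is known and is 1-based, which is how MUMmer's coordinate output is usually read. The function names `trim_at_second_repeat` and `overhang_identity` are hypothetical helpers, not part of any tool. The second function is a crude check that whatever is thrown away after the cut really does duplicate sequence kept in the trimmed contig. It compares position by position with no indel handling, so a low value may just mean the two copies differ by a small insertion or deletion.

```python
def trim_at_second_repeat(seq, second_repeat_start):
    """Keep the contig from its beginning up to, but not including,
    the 1-based start coordinate of the second repeat copy."""
    if not 1 < second_repeat_start <= len(seq):
        raise ValueError("cut coordinate must fall inside the contig")
    return seq[:second_repeat_start - 1]


def overhang_identity(seq, second_repeat_start, first_repeat_start=1):
    """Fraction of discarded bases (everything from the cut onward) that
    match the sequence starting at the first repeat copy.

    Ungapped, position-by-position comparison; coordinates are 1-based.
    """
    overhang = seq[second_repeat_start - 1:]
    start = first_repeat_start - 1
    reference = seq[start:start + len(overhang)]
    if not overhang:
        return 0.0
    matches = sum(a == b for a, b in zip(overhang, reference))
    return matches / len(overhang)
```

If the identity comes out close to 1.0, the discarded tail is most likely just the wrap-around copy. If it is much lower, the cut point probably needs a closer look in the alignment before committing to it.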
Thoughts on how to best select a sequence to stick with? The lazy part of me is inclined to say take the first hunk of the original assembly between the first pair of repeats, and call Circlator a good piece of evidence that this is basically correct.