Question: Correct alignment region from Circlator output mapped to original assembly
Hi all, I'm wrapping up a Nanopore-Illumina hybrid genome assembly. I've gone through all the Canu/Nanopolish/Pilon steps, and decided to pick at what is likely my mitochondrial genome. I want to trim it down to avoid repeats due to the circular nature of the DNA, and make sure my current contig indeed represents a circular sequence. The mito seq seems to be on the long end, in the neighborhood of 300kb.

I've done two things:

  • self-aligned with MUMmer to look for likely repeat/loop points
  • run the contig through Circlator (which reassembles the Canu-corrected reads into a circular contig, using the original assembly as the reference)
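
For anyone who wants a quick sanity check of the self-alignment idea before firing up MUMmer, here is a minimal sketch in plain Python. It is not what MUMmer does internally: it only looks for exact copies of the contig's first k bases further along the contig, so it assumes the overlap is error-free near the contig start. The function names `find_loop_points` and `overlap_length` and the toy sequence are made up for illustration.

```python
def find_loop_points(seq, k=25):
    """Return every position > 0 where the contig's first k bases occur again.

    Exact matching only (an assumption): real overlaps with sequencing
    errors need a proper aligner such as MUMmer.
    """
    seed = seq[:k]
    hits = []
    pos = seq.find(seed, 1)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(seed, pos + 1)
    return hits


def overlap_length(seq, loop_point):
    """Count the exactly matching bases between the contig start and the copy at loop_point."""
    n = 0
    while loop_point + n < len(seq) and seq[n] == seq[loop_point + n]:
        n += 1
    return n


# Toy contig (hypothetical data): a 40-base "genome" followed by a repeat
# of its first 12 bases, mimicking an over-assembled circular molecule.
genome = "ACGTTGCAAGGCTTAACCGGATATCGCGTAGCTAGGATCC"
contig = genome + genome[:12]

loop_points = find_loop_points(contig, k=10)
for lp in loop_points:
    print("candidate loop point:", lp, "overlap length:", overlap_length(contig, lp))
```

A hit near the far end of the contig, with an overlap that runs to the last base, is what you would expect from a circular molecule that was assembled past its own start.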

Both methods agree well, but now I'm having a little trouble properly selecting the final sequence I want to continue with. One could take the Circlator output, which is based on Canu-corrected reads, and run all the polishing again (Nanopolish, then Pilon x3).

I'd rather not, so I've aligned the Circlator re-assembly to my original and aim to take the corresponding sequence from the original, polished assembly. But there are some funny gaps that make it a little tricky to determine where the best cutoffs are.

Easiest way to see it is a picture.

  • The original assembly is duplicated to give it wrap around for the alignment.
  • The small gray bar annotations are a 2.5kb repeated region identified by MUMmer. By eye, I would simply extract the sequence from the beginning to the start of the second bar, and call it a day. These repeats are not so far from areas that blast to mitochondrial ribo SSU.
  • Interestingly, the Circlator assembly tends to break itself up using bits from what I presume are the repeated regions spread over the contig. Much of the aligned region is peppered with little gaps (yellowish area of the consensus).
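
The "by eye" trim described above could be sketched roughly like this in Python. It assumes the first repeat copy sits at the very start of the contig and the second copy near the end, so cutting at the start of the second copy leaves exactly one copy of the repeat. All function names and the toy coordinates are hypothetical; real coordinates would come from the MUMmer output.

```python
def double_for_wraparound(seq):
    """Concatenate a sequence with itself so alignments can run past the end."""
    return seq + seq


def to_original_coord(pos, original_length):
    """Map a 0-based position on the doubled sequence back onto the original."""
    return pos % original_length


def trim_to_second_repeat(seq, second_repeat_start, start=0):
    """Keep bases [start, second_repeat_start), dropping the second repeat copy."""
    if not start < second_repeat_start <= len(seq):
        raise ValueError("second_repeat_start must lie after start and within seq")
    return seq[start:second_repeat_start]


def closes_circle(original, trimmed, cut, check=10):
    """True if the bases right after the cut match the start of the trimmed sequence.

    Exact comparison only (an assumption); polished-but-imperfect sequence
    may need a tolerant comparison or a real alignment instead.
    """
    tail = original[cut:cut + check]
    return len(tail) > 0 and trimmed.startswith(tail)


# Toy data (hypothetical): 40 unique bases plus a repeat of the first 12.
genome = "ACGTTGCAAGGCTTAACCGGATATCGCGTAGCTAGGATCC"
contig = genome + genome[:12]
second_repeat_start = 40  # would be read off the self-alignment in practice

trimmed = trim_to_second_repeat(contig, second_repeat_start)
print(len(trimmed), closes_circle(contig, trimmed, second_repeat_start))

# A position on the doubled contig maps back onto the original contig.
doubled = double_for_wraparound(contig)
print(to_original_coord(len(contig) + 8, len(contig)))
```

The `closes_circle` check is only a cheap confirmation that the cut point wraps back onto the start; it does not replace the Circlator or MAFFT evidence.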

LASTZ alignment Circlator (y) against original (x)

Thoughts on how to best select a sequence to stick with? The lazy part of me is inclined to say take the first hunk of the original assembly between the first pair of repeats, and call Circlator a good piece of evidence that this is basically correct.

Thank you!

I am also trying to align these sequences with MAFFT. I wonder if LASTZ is maybe a little too coarse or inappropriate for this, since my understanding is that it is mainly for rapidly aligning whole genomes. My sequences are large, but certainly not genome size. I'm still waiting on MAFFT to chug along though.

Google is asking for a password to see the image. Maybe you could upload it to ImgBB or another similar service?

Oh, thanks for telling me. Rather, here's an updated slide I used to share this with someone else - I think I'm fairly satisfied with the call I made. But am happy to hear opinions. This new one is based on a MAFFT alignment, rather than the genome aligner LASTZ.

MAFFT & final sequence selected. This link seems to be a bit wonky still ... try this:

