Question

bacterial genome assembly output from canu

0

3.3 years ago

rthapa ▴ 90

Hi,

I have done a de novo genome assembly of a bacterial strain using Canu, and I want to find structural variants relative to the reference genome. Since the bacterial genome is circular, it is hard to find the origin of replication and align the assembly with the reference genome. Does anyone have suggestions for finding structural variants in a bacterial genome relative to the reference genome?

Thanks

bacteria genome assembly canu • 1.3k views

ADD COMMENT • link updated 3.3 years ago by juanjo75es ▴ 130 • written 3.3 years ago by rthapa ▴ 90

0

This question has been asked a couple of different ways: de novo genome assembly of bacterial genome

rthapa : Have you tried to repeat the assembly? Perhaps you will get an assembly that will be co-linear with the reference.

ADD REPLY • link 3.3 years ago by GenoMax 141k

0

Yes, I repeated the assembly after removing reads shorter than 2000 bp. The result is similar. I think the issue is due to incorrect identification of the origin of replication during genome assembly.

ADD REPLY • link 3.3 years ago by rthapa ▴ 90

1

Where is the dnaA gene located in your assembly and where is it in the reference? Paper1 and Paper2.

ADD REPLY • link 3.3 years ago by GenoMax 141k

score 1 · Answer 1 · 2020-12-22

There are many ways to do this. I suggest the following:
1) Align your genome to the reference genome using pairwise megablast.
2) From the alignment, find the position in your genome that corresponds to the first position in the reference. Then rotate your genome sequence so that it starts at this position.
3) Align your genome to the reference genome again and look at the dotplot. Structural differences will be visible on the dotplot.
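
Step 2 can be sketched in a few lines of Python. This is only a minimal illustration, assuming the assembly is a single circular contig already loaded as a string and that you have read the 1-based start position off the megablast alignment yourself. The function names `rotate_circular` and `reverse_complement` are hypothetical helpers, not part of any tool mentioned here. If the alignment shows your assembly on the opposite strand from the reference, you would likely want to reverse-complement it first and re-align before picking the start position.

```python
# Hypothetical helpers for re-orienting a circular contig.
# Assumes a single, fully circular contig held in memory as a string.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_circular(seq: str, start: int) -> str:
    """Rotate a circular sequence so that it begins at 1-based position `start`."""
    if not 1 <= start <= len(seq):
        raise ValueError("start must be between 1 and the sequence length")
    offset = start - 1  # convert to a 0-based index
    return seq[offset:] + seq[:offset]


if __name__ == "__main__":
    contig = "ACGTTGCA"  # toy sequence for illustration only
    print(rotate_circular(contig, 4))
    print(reverse_complement("AACG"))
```

Rotation does not change the sequence content, only the coordinate at which the linear representation begins, so any differences that remain on the second dotplot are not caused by the choice of start position. Note that Canu contigs flagged as circular can carry an overlapping repeat at the ends, which this sketch does not trim.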

score 0 · Answer 2 · 2021-01-02

I think Quast is a good tool for that. It aligns the assembly to the reference regardless of where the circular genome was linearized. I also think it's useful to produce two assemblies with two different assemblers. Sometimes it's just the assembler that fails.

Here you have a likely real rearrangement verified by three different algorithms (SPAdes, rnaSPAdes and Contignant s-aligner):

Alignment of reads obtained with SPAdes, rnaSPAdes and Contignant s-aligner

Here you have a false rearrangement detected with SPAdes:

Alignment of reads obtained with SPAdes