bacterial genome assembly output from canu
11 months ago
rthapa ▴ 50

Hi,

I have done a de novo genome assembly of a bacterial strain using Canu, and I want to find structural variants relative to the reference genome. Because the bacterial genome is circular, the assembly does not necessarily start at the same position as the reference, and it is hard to find the origin of replication to align the two. Does anyone have suggestions on how to find structural variants in a bacterial genome by comparing it with the reference genome?

Thanks

bacteria genome assembly canu • 601 views

This question has been asked a couple of different ways: de novo genome assembly of bacterial genome

rthapa : Have you tried to repeat the assembly? Perhaps you will get an assembly that will be co-linear with the reference.


Yes, I repeated the assembly after removing reads shorter than 2000 bp. The result is similar. I think the issue is that the origin of replication was identified incorrectly during genome assembly.
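For anyone wanting to reproduce the read-length filter mentioned above, here is a minimal sketch in plain Python. It assumes uncompressed, strictly 4-line FASTQ records; the function name `filter_fastq_by_length` and its `min_len` parameter are made up for illustration and are not part of Canu or any other tool.

```python
def filter_fastq_by_length(lines, min_len=2000):
    """Yield 4-line FASTQ records whose sequence has at least min_len bases.

    Assumption: every record is exactly 4 lines (header, sequence, '+', qualities).
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # record[1] is the sequence line of the current record
            if len(record[1]) >= min_len:
                yield record
            record = []
```

In practice you would pass an open file handle as `lines` and write each kept record back out as four lines.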


Where is the dnaA gene located in your assembly and where is it in the reference? Paper1 and Paper2.

11 months ago
shelkmike ▴ 550

There are many ways to do this. I suggest the following:
1) Align your genome to the reference genome using pairwise megablast.
2) From the alignment, find the position in your genome that corresponds to the first position of the reference. Then rotate your genome sequence so that it starts at this position.
3) Align the rotated genome to the reference genome again and look at the dotplot. Structural differences will be visible on the dotplot.
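
Step 2 can be sketched in a few lines of Python. This is only an illustration and assumes the assembly is a single circular contig already loaded as a string; the names `rotate_circular` and `reverse_complement` are made up here. BLAST reports 1-based coordinates, so subtract 1 before passing a position as `start`. If the alignment is on the minus strand, reverse-complement the contig first and work out the start position from a new alignment of the reverse-complemented sequence.

```python
def rotate_circular(seq: str, start: int) -> str:
    """Rotate a circular sequence so that 0-based index `start` becomes the first base."""
    if not seq:
        return seq
    start %= len(seq)  # wrap around so any integer offset is valid
    return seq[start:] + seq[:start]


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T only; other symbols are left unchanged)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


# Toy example: the base at 0-based index 3 becomes the new first base.
print(rotate_circular("GGGATGCCC", 3))  # ATGCCCGGG
```

The rotated sequence can then be written back to FASTA and used for the second alignment in step 3.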

11 months ago
juanjo75es ▴ 130

I think QUAST is a good tool for that. It aligns the assembly to the reference regardless of any issue with circularity. I think it's also useful to generate two assemblies with two different assemblers. Sometimes it's just the assembler that fails.

Here you have a likely real rearrangement verified by three different algorithms (SPAdes, rnaSPAdes and Contignant s-aligner):

Here you have a false rearrangement detected with SPAdes