Question: Nanopore-only assembly errors
Posted 20 months ago by karthick.108720:

We found that our Nanopore-only assembly still contains more than 30% indels even after Nanopolish. Is there any way to reduce frameshift indels in a Nanopore-only assembly without a hybrid assembly using Illumina reads?


Hello karthick!

Which tools did you use to assess the percentage of indels before and after polishing? And how much coverage did you use for polishing with Nanopolish? Just for my own curiosity!



Reply written 20 months ago by Rox1.2k

This sounds too high.

  • Which genome? Prok/Euk? Size?
  • How many rounds of Nanopolish polishing?
  • Which assembler was used?
  • Are the indels in reading frames?
Reply written 20 months ago by colindaven2.3k
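As a rough way to put numbers on these questions, here is a minimal Python sketch that counts indels in one CIGAR string, for example from aligning the polished contigs to a closely related reference with an aligner such as minimap2. The function name `indel_summary` and the per-100-kb normalization are illustrative choices, not an established tool. Indels whose length is not a multiple of 3 are flagged as frameshift candidates; that label only matters for indels that actually fall inside a CDS, which this sketch does not check.

```python
import re

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_summary(cigar: str) -> dict:
    """Count indels in a CIGAR string (hypothetical helper, for illustration).

    Indels with a length that is not a multiple of 3 are reported as
    frameshift candidates; whether they truly shift a reading frame
    depends on them lying inside a coding sequence.
    """
    aligned_ref_bases = 0  # reference bases covered by matches/mismatches/deletions
    indel_lengths = []
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":
            aligned_ref_bases += length
        elif op == "D":
            aligned_ref_bases += length  # deletions consume reference bases
            indel_lengths.append(length)
        elif op == "I":
            indel_lengths.append(length)  # insertions consume query bases only
        # S, H, N, P are ignored for this summary
    frameshift = [n for n in indel_lengths if n % 3 != 0]
    per_100kb = (
        1e5 * len(indel_lengths) / aligned_ref_bases if aligned_ref_bases else 0.0
    )
    return {
        "indels": len(indel_lengths),
        "frameshift_candidates": len(frameshift),
        "indels_per_100kb": per_100kb,
    }


if __name__ == "__main__":
    # 1-bp insertion (frameshift) and 3-bp deletion (in-frame) over 200 ref bases
    print(indel_summary("100M1I50M3D47M"))
    # → {'indels': 2, 'frameshift_candidates': 1, 'indels_per_100kb': 1000.0}
```

Summing these counts over all primary alignments of the assembly would give a genome-wide indel rate to compare before and after each polishing round.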

Hi colindaven,

  1. This is a prokaryotic genome of around 7.0 Mb (Pseudomonas aeruginosa; GC content: 65%).
  2. After the Canu assembly, two rounds of polishing were performed.
  3. Canu 1.7 assembler.
  4. Yes, the indels are in reading frames.

Reply written 20 months ago (modified 20 months ago) by karthick.108720

Sounds like you have done everything right. A lot is known about P. aeruginosa, and heaps of phylogeny information is available. If you can't do additional Illumina sequencing to correct the indels (that number of indels, i.e. 30%, is the worst I've heard of), maybe you can use public Illumina reads from an almost identical P. aeruginosa strain. This will probably be OK, as the core genome is extremely highly conserved, with a very low mutation rate (I used to work in a PA lab). I wouldn't try this for other bacteria, though.

Also, you could try racon instead of nanopolish, but I wouldn't expect clearly better results.

Lastly, I guess this is not submission quality. Maybe just analyze the SVs and gene content instead of the base-level sequence? Or align the reads to a reference, rather than generating a de novo assembly, to detect SNVs.

Just some ideas

Reply written 20 months ago by colindaven2.3k
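To illustrate the read-alignment route for SNVs, here is a toy Python sketch of haploid SNV calling from a pileup, i.e. the read bases observed at each reference position after aligning reads to a close reference. The function `call_snvs`, the dictionary pileup representation, and the `min_depth`/`min_alt_frac` thresholds are all hypothetical; a real workflow would run a dedicated variant caller on a BAM file, and this simplified majority-allele rule ignores indels and base qualities entirely.

```python
from collections import Counter


def call_snvs(ref_seq, pileup, min_depth=10, min_alt_frac=0.8):
    """Toy haploid SNV caller (hypothetical, for illustration only).

    ref_seq: reference sequence as a string.
    pileup:  dict mapping 0-based reference position -> string of read bases
             observed at that position.
    Returns a list of (position, ref_base, alt_base, alt_fraction) tuples.
    """
    calls = []
    for pos in sorted(pileup):
        # Keep only A/C/G/T; other symbols (N, gaps) are ignored.
        bases = Counter(b for b in pileup[pos].upper() if b in "ACGT")
        depth = sum(bases.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        alt, count = bases.most_common(1)[0]
        ref_base = ref_seq[pos].upper()
        frac = count / depth
        # Bacterial genomes are haploid, so require the majority base to
        # differ from the reference and to dominate the pileup.
        if alt != ref_base and frac >= min_alt_frac:
            calls.append((pos, ref_base, alt, frac))
    return calls


if __name__ == "__main__":
    ref = "ACGTACGT"
    pileup = {2: "TTTTTTTTTG", 5: "CCCCCCCCCC", 6: "AAAAA"}
    print(call_snvs(ref, pileup))  # → [(2, 'G', 'T', 0.9)]
```

The high `min_alt_frac` reflects the haploid assumption: a true SNV in a clonal bacterial sample should be carried by nearly all reads, whereas random Nanopore errors rarely dominate a well-covered position.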

Hi all,

I have assembled multiple bacterial genomes sequenced on an Oxford Nanopore MinION sequencer (FLO-MIN106 flow cell).

I used the Pomoxis and Unicycler assemblers to perform the genome assembly. Upon annotating the resulting FASTA files with RAST and PATRIC, I observed an abnormally high CDS count (double in some cases) compared to existing assemblies.

The CDS ratio ranges from 0.44 to 0.60 (the normal CDS ratio prescribed by NCBI ranges between 0.8 and 1.2).

How can I overcome this abnormal CDS count? What is the way forward?

Thanking you all

Reply written 10 months ago by Optimist90
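A quick way to test whether an inflated CDS count reflects genes broken into shorter pieces (which frameshifting indels like those discussed in this thread can cause) is to compare CDS counts and median CDS lengths between the draft annotation and an annotation of a closely related reference. The minimal Python sketch below assumes both annotations are available as GFF3 files; the function names are hypothetical, and `count_ratio` is simply the draft CDS count divided by the reference CDS count, not necessarily how NCBI defines its CDS ratio.

```python
from statistics import median


def cds_lengths(gff_lines):
    """Return CDS lengths (bp) from GFF3 lines (hypothetical helper).

    GFF3 columns are tab-separated; column 3 is the feature type and
    columns 4-5 are 1-based inclusive start/end coordinates. Comment
    lines and any embedded FASTA section are skipped because they do
    not have nine tab-separated columns.
    """
    lengths = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        start, end = int(cols[3]), int(cols[4])
        lengths.append(end - start + 1)
    return lengths


def compare_annotations(draft_gff, reference_gff):
    """Summarize CDS count and median length for a draft vs. a reference."""
    draft = cds_lengths(draft_gff)
    ref = cds_lengths(reference_gff)
    return {
        "draft_cds": len(draft),
        "reference_cds": len(ref),
        "count_ratio": len(draft) / len(ref) if ref else float("nan"),
        "draft_median_len": median(draft) if draft else 0,
        "reference_median_len": median(ref) if ref else 0,
    }


if __name__ == "__main__":
    # "draft.gff" and "reference.gff" are placeholder file names.
    with open("draft.gff") as draft, open("reference.gff") as reference:
        print(compare_annotations(draft, reference))
```

A draft with many more CDSs and a median CDS length well below the reference's would point toward fragmented genes rather than genuinely extra genes. Each CDS line is counted separately, so multi-segment CDS features would be over-counted; that is rarely an issue for bacterial annotations.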
