Question: Improving a genome assembly using Illumina contigs and Nanopore reads
16 months ago, KG wrote:

Hi, I have Nanopore reads and a very fragmented genome assembly (~500 contigs for a 16-20 Mb genome), but not the Illumina reads it was built from. I used Canu to generate a de novo assembly (44 contigs) from the Nanopore reads (~30x coverage). Since I do not have Illumina reads, I could not polish this de novo assembly, so many of the ORFs could not be annotated because of base-level errors. Is there any way to use the contigs assembled from Illumina reads to improve the assembly quality, i.e. to correct the base-level errors? I have also tried LINKS and SMIS, which took the assembly from ~500 contigs down to ~200, but we need a better assembly for our downstream analysis. I would appreciate any suggestions. We might get some Illumina reads in a month or so, but I wanted to know if anything can be done with what we have now.
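Comparing assemblies (the ~500-contig Illumina assembly, the ~200-contig LINKS/SMIS result, the 44-contig Canu assembly) is easier with contig count and N50 side by side. A minimal sketch, assuming plain FASTA input; the function names are hypothetical, and dedicated tools such as QUAST report the same numbers plus much more.

```python
def fasta_lengths(lines):
    """Return the length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is None:
            raise ValueError("sequence line found before the first header")
        else:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Smallest contig length L such that contigs >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(lines):
    lengths = fasta_lengths(lines)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

For a real file, `assembly_stats(open("canu.contigs.fasta"))` (hypothetical file name) returns the three numbers; a higher N50 with a similar total length indicates a more contiguous assembly.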


(modified 16 months ago by h.mon)
16 months ago, h.mon wrote:

You will get the best polishing results with Illumina reads (or Illumina + Nanopore), but Nanopore-only polishing can still give a good improvement. Try Racon, Nanopolish, or the polisher that comes with the wtdbg2 assembler; there are other polishers, but I have never used them.
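Racon is usually run for a few rounds: map the reads to the current draft with minimap2, then let Racon build a consensus from those overlaps, and feed the result into the next round. The sketch below only plans (and optionally runs) those commands; it assumes `minimap2` and `racon` are on `PATH`, and the file names and the default of two rounds are hypothetical choices, not a recommendation from this thread. Nanopolish, which uses the raw signal, is not covered here.

```python
import subprocess


def racon_round(reads, draft, round_no, threads):
    """Commands for one polishing round; output names are hypothetical."""
    paf = f"round{round_no}.paf"
    polished = f"round{round_no}.fasta"
    # minimap2 writes PAF to stdout; map-ont is the Nanopore preset.
    map_cmd = ["minimap2", "-x", "map-ont", "-t", str(threads), draft, reads]
    # racon <reads> <overlaps> <target> writes the polished FASTA to stdout.
    racon_cmd = ["racon", "-t", str(threads), reads, paf, draft]
    return map_cmd, paf, racon_cmd, polished


def polish(reads, draft, rounds=2, threads=8, dry_run=True):
    """Plan `rounds` iterations, each polishing the previous round's output."""
    plan = []
    current = draft
    for i in range(1, rounds + 1):
        map_cmd, paf, racon_cmd, polished = racon_round(reads, current, i, threads)
        plan.append((map_cmd, paf, racon_cmd, polished))
        if not dry_run:
            with open(paf, "w") as out:
                subprocess.run(map_cmd, stdout=out, check=True)
            with open(polished, "w") as out:
                subprocess.run(racon_cmd, stdout=out, check=True)
        current = polished
    return plan
```

Calling `polish("reads.fastq", "canu.contigs.fasta")` returns the planned commands for inspection; pass `dry_run=False` to execute them. Checking annotation completeness (e.g. ORF lengths) after each round helps decide when extra rounds stop paying off.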

You can also try assembling with Flye, which has a built-in polishing step.
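A Flye run on raw Nanopore reads can be sketched as below. It assumes Flye's `--nano-raw`, `--out-dir`, `--threads`, `--iterations` (number of polishing iterations) and `--genome-size` options; whether `--genome-size` is required depends on the Flye version. The wrapper function and file names are hypothetical.

```python
import subprocess


def flye_command(reads, out_dir, genome_size=None, threads=8, polish_iterations=1):
    """Build a Flye command for raw (uncorrected) Nanopore reads."""
    cmd = [
        "flye",
        "--nano-raw", reads,
        "--out-dir", out_dir,
        "--threads", str(threads),
        # Flye's own polishing runs after assembly; more iterations cost more time.
        "--iterations", str(polish_iterations),
    ]
    if genome_size is not None:
        cmd += ["--genome-size", genome_size]  # e.g. "20m" for a ~20 Mb genome
    return cmd


def run_flye(cmd, dry_run=True):
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
```

With `run_flye(flye_command("reads.fastq", "flye_out", genome_size="20m"), dry_run=False)`, the final assembly ends up as `flye_out/assembly.fasta`, which can then go through extra Racon rounds if needed.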

How come you have an assembly made from Illumina reads but don't have the Illumina data?


Thanks for your suggestions. I'll give the Nanopore polishing tools a try.

We did not generate that Illumina assembly. It's available from NCBI, but the raw reads are not.

(reply written 16 months ago by KG)

I have not yet tried polishing a short-read assembly with long reads (and I would assume one shouldn't if other options are available).

My first suggestion would be to contact the authors of the paper and ask them for the Illumina reads.

If you really have to work with the short-read assembly plus Nanopore reads, then I guess your goal is not improving the quality of the existing sequences, but rather linking contigs and resolving repeats. I would not expect Racon or Nanopolish to be of much use here, but you could try, of course.
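To illustrate what "linking contigs" with long reads means: a read whose alignments touch the ends of two different contigs is evidence that those contigs are neighbours. The sketch below counts such reads per contig pair from minimap2 PAF output (reads mapped to the Illumina contigs). It is a simplified illustration, not how LINKS, SMIS or the gap closers actually work; the end tolerance `slop` and the MAPQ cutoff are hypothetical values, and orientation is ignored.

```python
from collections import Counter, defaultdict
from itertools import combinations


def parse_paf(lines):
    """Yield the PAF fields needed here (first 12 columns are mandatory)."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue
        yield {
            "read": f[0],
            "contig": f[5],
            "tlen": int(f[6]),
            "tstart": int(f[7]),
            "tend": int(f[8]),
            "mapq": int(f[11]),
        }


def touches_end(aln, slop):
    """True if the alignment reaches within `slop` bp of either contig end."""
    return aln["tstart"] <= slop or aln["tlen"] - aln["tend"] <= slop


def count_links(paf_lines, slop=500, min_mapq=20):
    """Count, for each contig pair, the reads that span the pair's ends."""
    contigs_per_read = defaultdict(set)
    for aln in parse_paf(paf_lines):
        if aln["mapq"] >= min_mapq and touches_end(aln, slop):
            contigs_per_read[aln["read"]].add(aln["contig"])
    links = Counter()
    for contigs in contigs_per_read.values():
        for a, b in combinations(sorted(contigs), 2):
            links[(a, b)] += 1
    return links
```

Pairs supported by several independent reads are the more trustworthy joins; a single supporting read may come from a chimera or a repeat.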

From a cursory search, Long Read Gapcloser and GMcloser seem to be built specifically for your task.
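The core idea behind long-read gap closing can be shown on a single read: take the read bases lying between its alignment to the end of contig A and its alignment to the start of contig B, and splice them in. This is a hypothetical, forward-strand-only sketch (coordinates are 0-based, half-open, as in PAF); it does not describe the internals of Long Read Gapcloser or GMcloser. Because the filled gap is raw Nanopore sequence, it will carry read errors and still needs polishing.

```python
def close_gap(seq_a, aln_a, seq_b, aln_b, read_seq):
    """Join contig A and contig B through the read that spans the gap.

    aln_a / aln_b: dicts with qstart, qend (read coords), tstart, tend
    (contig coords) and strand, taken from PAF-style alignments.
    Returns the joined sequence, or None if this simple case does not apply.
    """
    if aln_a["strand"] != "+" or aln_b["strand"] != "+":
        return None  # reverse-strand handling omitted in this sketch
    if aln_a["qend"] > aln_b["qstart"]:
        return None  # alignments overlap on the read: no gap to fill
    gap = read_seq[aln_a["qend"]:aln_b["qstart"]]
    # Keep contig A up to where the read stops matching it, and contig B
    # from where the read starts matching it.
    return seq_a[:aln_a["tend"]] + gap + seq_b[aln_b["tstart"]:]
```

Real tools combine many spanning reads into a consensus for each gap instead of trusting one read.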

(reply written 16 months ago by Tom; modified 16 months ago)