Question

Improvement of genome assembly using Illumina contigs and Nanopore reads

4.8 years ago

KG ▴ 10

Hi, I have Nanopore reads and a very fragmented genome assembly (~500 contigs for a 16-20 Mb genome), but not the Illumina reads it was built from. I used Canu to generate a de novo assembly (44 contigs) from the Nanopore reads (~30x coverage). Since I do not have Illumina reads, I could not polish this de novo assembly, and many of the ORFs could not be annotated because of base-level errors. Is there any way to use the contigs (assembled from Illumina reads) to improve the assembly quality, that is, to correct the base-level errors? I have also tried LINKS and SMIS and could improve the assembly from ~500 contigs to ~200 contigs, but we need a better assembly for our downstream analysis. I would appreciate any suggestions. We might get some Illumina reads in a month or so, but I would like to know whether anything can be done with what we have now.

Thanks!

genome assembly nanopore illumina scaffolding • 2.3k views

updated 4.8 years ago by h.mon 35k • written 4.8 years ago by KG ▴ 10

score 0 · Answer 1 · 2019-06-24

4.8 years ago

h.mon 35k

You will get the best polishing results with Illumina reads (or Illumina + Nanopore), but Nanopore-only polishing can still give a good improvement. Try Racon, Nanopolish, or the polisher that ships with the wtdbg2 assembler. There are other polishers, but I have never used them.
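As a rough sketch of what Nanopore-only polishing with Racon could look like: each round maps the reads back to the current assembly with minimap2, then hands reads, overlaps, and assembly to Racon. The file names, the two-round default, and the helper names `racon_round_commands` and `polish` are assumptions for illustration, not something from this thread. With `dry_run=True` the script only builds the command lines, so they can be inspected before anything is run.

```python
import subprocess


def racon_round_commands(reads, assembly, round_index, threads=8):
    """Build the minimap2 and racon command lines for one polishing round.

    File names are hypothetical; both tools write their result to stdout.
    """
    paf = f"round{round_index}.paf"
    polished = f"polished_round{round_index}.fasta"
    # Map the Nanopore reads onto the current assembly (PAF output).
    minimap2_cmd = ["minimap2", "-x", "map-ont", "-t", str(threads), assembly, reads]
    # racon <sequences> <overlaps> <target sequences>
    racon_cmd = ["racon", "-t", str(threads), reads, paf, assembly]
    return minimap2_cmd, racon_cmd, paf, polished


def polish(reads, assembly, rounds=2, threads=8, dry_run=True):
    """Plan (and optionally run) several Racon rounds, feeding each output forward."""
    current = assembly
    plan = []
    for i in range(1, rounds + 1):
        mm2, racon, paf, polished = racon_round_commands(reads, current, i, threads)
        plan.append((mm2, racon))
        if not dry_run:
            with open(paf, "w") as out:
                subprocess.run(mm2, stdout=out, check=True)
            with open(polished, "w") as out:
                subprocess.run(racon, stdout=out, check=True)
        current = polished  # the next round polishes this round's output
    return current, plan


if __name__ == "__main__":
    final, plan = polish("nanopore_reads.fastq", "canu.contigs.fasta")
    for mm2, racon in plan:
        print(" ".join(mm2))
        print(" ".join(racon))
    print("final assembly would be:", final)
```

Calling `polish(..., dry_run=False)` would actually execute the commands, assuming minimap2 and Racon are installed and on the PATH.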

You can also try assembling with Flye, which has a built-in polishing step.
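A minimal sketch of a Flye run on raw Nanopore reads, wrapped in Python so the command line is easy to inspect. The read file name, output directory, thread count, and the `20m` genome-size estimate (taken from the upper end of the 16-20 Mb range in the question) are assumptions; `flye_command` is a hypothetical helper. `--iterations` sets the number of built-in polishing rounds.

```python
import subprocess


def flye_command(reads, out_dir, genome_size="20m", threads=8, polish_iterations=2):
    """Build a Flye command line for raw Nanopore reads (all values are assumptions)."""
    return [
        "flye",
        "--nano-raw", reads,
        "--out-dir", out_dir,
        "--genome-size", genome_size,
        "--threads", str(threads),
        "--iterations", str(polish_iterations),
    ]


if __name__ == "__main__":
    cmd = flye_command("nanopore_reads.fastq", "flye_out")
    print(" ".join(cmd))
    # Uncomment to actually run Flye (requires flye on the PATH):
    # subprocess.run(cmd, check=True)
```

Flye writes its polished contigs to `assembly.fasta` inside the output directory, which can then be compared against the Canu assembly.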

How come you have an assembly made with Illumina reads, but you don't have Illumina data?

written 4.8 years ago by h.mon 35k

Thanks for your suggestions. I'll give the Nanopore polishing tools a try.

We did not generate the Illumina assembly ourselves. It is available from NCBI, but the raw reads are not.

written 4.8 years ago by KG ▴ 10

I have not yet tried polishing a short-read assembly with long reads (and I would assume one shouldn't if there are other options).

My first suggestion would be to contact the author of the paper and ask for the Illumina reads.

If you really have to work with the short-read assembly + Nanopore reads, then I guess your goal is not to improve the quality of the existing sequences, but rather to link contigs and resolve repeats. I would not expect Racon or Nanopolish to be of much use here, but you might try, of course.

From a cursory search: Long Read Gapcloser and GMcloser seem to be built specifically for your task.
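Whichever tool is used, it helps to check whether contiguity actually improved. A small sketch that computes contig count and N50 from FASTA text is below; the function names and the toy input are made up for illustration.

```python
def read_contig_lengths(fasta_text):
    """Return the length of each record in FASTA-formatted text."""
    lengths = []
    current = 0
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                lengths.append(current)
            current = 0
            seen_header = True
        elif line and seen_header:
            current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    toy = ">contig1\nACGTACGTAC\n>contig2\nACGTA\n>contig3\nACG\n"
    lengths = read_contig_lengths(toy)
    print("contigs:", len(lengths), "N50:", n50(lengths))
```

For a real assembly, read the file with `open(path).read()` and pass the text to `read_contig_lengths`, then compare the numbers before and after gap closing.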

written 4.8 years ago by Tom ▴ 540