Question: High CDS Count in my assembled Genome using Nanopore reads (ONT) data
12 months ago by
Optimist90 wrote:

Hi all,

I have assembled multiple bacterial genomes sequenced on an Oxford Nanopore MinION (FLO-MIN106 flow cell).

I used the Pomoxis and Unicycler assemblers to perform the genome assembly. After annotating the resulting FASTA files with RAST and PATRIC, I found the CDS count to be abnormally high (double in some cases) compared to existing assemblies.

The CDS ratio ranges from 0.44 to 0.60 (the normal CDS ratio prescribed by NCBI ranges between 0.8 and 1.2).

How can I overcome this abnormal CDS count? What is the way forward?

Thanking you all

assembly nanopore high cds wgs • 375 views
modified 11 months ago by h.mon30k • written 12 months ago by Optimist90
11 months ago by
h.mon30k wrote:

As this is a Nanopore-only assembly, there are many errors (mainly indels) which negatively affect gene prediction:

Nanopore only assembly errors

Mind the gaps – ignoring errors in long read assemblies critically affects protein prediction


If your consensus accuracy is 99.9%, you still have 1 error every 1000 bp. A typical bacterial gene is ~1000 bp long. That 1 error is usually an indel, which causes a frameshift in your CDS. If you use a gene finder like Prodigal (used in Prokka), you will get ~2 predicted CDS for every real CDS. You need to also sequence it with Illumina and polish the Nanopore assembly.
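The arithmetic above can be sketched in a few lines of Python. This is only a rough back-of-the-envelope model, not how any gene finder actually behaves: the function name `frameshift_model` is made up for illustration, and it assumes that errors are spread uniformly, that every error is a frameshifting indel, and that each such indel adds exactly one extra predicted CDS.

```python
import math


def frameshift_model(accuracy, gene_length_bp, indel_fraction=1.0):
    """Rough estimate of how consensus errors inflate the CDS count.

    Hypothetical helper for illustration only. Assumptions:
    - errors are uniformly distributed along the assembly,
    - a fraction `indel_fraction` of errors are frameshifting indels,
    - each frameshifting indel splits a real CDS into one extra predicted CDS.
    """
    error_rate = 1.0 - accuracy
    # Expected number of frameshifting indels inside one gene
    expected_indels = error_rate * gene_length_bp * indel_fraction
    # Poisson approximation: probability that a gene has at least one indel
    p_broken = 1.0 - math.exp(-expected_indels)
    # Under the splitting assumption, predicted CDS per real CDS
    predicted_per_real = 1.0 + expected_indels
    return expected_indels, p_broken, predicted_per_real


if __name__ == "__main__":
    # Compare a few consensus accuracies for a ~1000 bp gene
    for acc in (0.999, 0.9999, 0.99999):
        indels, p_broken, ratio = frameshift_model(acc, 1000)
        print(f"accuracy={acc}: indels/gene={indels:.3f}, "
              f"P(broken)={p_broken:.3f}, predicted/real={ratio:.3f}")
```

With 99.9% accuracy and a 1000 bp gene, this model gives about one indel per gene and therefore about two predicted CDS per real CDS, matching the estimate above. Real assemblies will deviate, since errors cluster (for example in homopolymers) and not every error is an indel.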

written 11 months ago by Torst960

One note about this: there's probably already Illumina data out there for your strains of interest. Check this rather nice program to locate and download SRA or ENA data more quickly:

written 11 months ago by colindaven2.3k

There is no Illumina data available for the isolates under study.

written 11 months ago by Optimist90
