Nanopore sequencing of bacteria and problems with NCBI PGAP annotation
17 months ago
GGG_Alex ▴ 20

Dear community,

We recovered and published a few draft genomes from nanopore sequencing. All were uploaded and annotated with the NCBI PGAP pipeline.

For our latest genome, NCBI PGAP reports 30% frameshifted genes. However, gene prediction with Prodigal and annotation with eggNOG did not show a substantial change in neighboring genes with the same annotation (spanning the range of the PGAP-predicted genes). NCBI concludes that the sequence is not correct, but we have not changed our procedure and coverage is fine (>60x).
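The neighboring-gene check described above could be sketched roughly as follows. This is a minimal sketch, not the script actually used: the `Gene` fields, the `candidate_split_genes` name and the `max_gap` threshold are hypothetical, and it assumes a single contig with one annotation label per gene (for example an eggNOG ortholog ID).

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    gene_id: str
    start: int
    end: int
    strand: str
    annotation: str  # hypothetical label, e.g. an eggNOG ortholog ID


def candidate_split_genes(genes: List[Gene], max_gap: int = 200) -> List[Tuple[str, str]]:
    """Return pairs of adjacent, same-strand genes that share an annotation.

    Such pairs are candidates for one gene split in two by a frameshift.
    max_gap (bp between the two genes) is an arbitrary illustrative cutoff.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        same_label = bool(left.annotation) and left.annotation == right.annotation
        close = right.start - left.end <= max_gap
        if same_label and left.strand == right.strand and close:
            pairs.append((left.gene_id, right.gene_id))
    return pairs


example = [
    Gene("g1", 1, 300, "+", "COG0001"),
    Gene("g2", 310, 600, "+", "COG0001"),
    Gene("g3", 700, 1000, "+", "COG0002"),
    Gene("g4", 1010, 1300, "-", "COG0002"),
]
print(candidate_split_genes(example))
```

Comparing the number of such pairs between this genome and the earlier ones would give a rough, PGAP-independent estimate of how many genes are split.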

Has anyone observed this before?

Any ideas on how to fix this, or any other analysis that might be useful?

Thank you!

annotation ncbi PGAP nanopore gene • 797 views
ADD COMMENT

I would trust NCBI PGAP because, in my personal experience, Prodigal will still call frameshifted genes.

ADD REPLY

I think we need to know the following before we can help:

  • Which nanopore flow cell or kit was used, 9.4.1 or 10.4?
  • Which assembler was used? (Flye is good)
  • Which long-read polishing pipeline was used? (medaka, racon?)
  • Are Illumina reads available for short-read polishing? (hypo, pilon, etc.)

If you haven't put a lot of effort into polishing then yes, maybe 30% of genes are frameshifted, because the dominant error mode in ONT data is indels.
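To illustrate why indels are so damaging for gene calls, here is a small toy example. The ORF is made up for illustration, and `translate` is a hypothetical helper using the standard genetic code; it only shows the mechanism, not how PGAP detects frameshifts.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate from position 0 in frame and stop at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up ORF containing a 6-base homopolymer (AAAAAA).
orf = "ATGGCTAAAAAAGGTCTGACCGATTAA"
# Simulate a typical homopolymer error: one A is dropped.
mutant = orf[:6] + orf[7:]

print(translate(orf))
print(translate(mutant))
```

A single missing base shifts the reading frame for everything downstream, and in this toy case an early stop codon appears, so the gene looks truncated.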

ADD REPLY

Thank you for your reply,

We used 9.4.1, more specifically a Flongle, and the bacterial species is an Arthrobacter. I did super-accuracy basecalling using Guppy v6.x.

I used Flye (also the best in my experience) and it gave a single closed chromosome. I did not polish with another tool, just trusting the Flye polishing.

No Illumina reads are available so far.

I am wondering because the 6 other genomes we generated with the same pipeline were all fine so far (~2-10% pseudogenes from PGAP). I also tried using only a fraction of the reads with better average quality, but I did not get a fully closed genome (which I would prefer), and coverage was lower.
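For reference, selecting reads by average quality can be sketched like this. It is only an illustration and not the exact filtering I ran: the function names and the `min_q`/`min_len` cutoffs are hypothetical, and it assumes a well-formed four-line FASTQ with Phred+33 qualities. Note that the mean is taken over error probabilities rather than over the Phred scores themselves, which is why it comes out lower than a naive average.

```python
import math
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaging error probabilities (not Phred scores)."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse four-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def filter_reads(records: Iterable[Record], min_q: float = 10.0,
                 min_len: int = 1000) -> Iterator[Record]:
    """Keep reads that pass both the length and the mean-quality cutoff."""
    for name, seq, qual in records:
        if len(seq) >= min_len and mean_phred(qual) >= min_q:
            yield name, seq, qual


demo = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACGT\n", "+\n", "!!!!\n"]
kept = [name for name, _, _ in filter_reads(read_fastq(demo), min_q=10, min_len=4)]
print(kept)
```

With a real file, `read_fastq(open("reads.fastq"))` would replace the `demo` list (the file name here is a placeholder).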

ADD REPLY

Ah, sounds good.

I would definitely polish, though. This very minimal polishing pipeline might help (the paths to the programs will need editing, but it gives you an idea of how to run long-read-only polishing).

https://github.com/Colorstorm/assembly_polishing_racon_medaka

I would polish all assemblies before submission. People used to do 2-3 rounds of racon, followed by medaka, and then Illumina polishing if short reads are available. See if you can improve the base quality.
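As a rough idea of what the racon rounds plus medaka look like, here is a minimal sketch that only builds the command lines. It is not the linked pipeline: the function name, file names and output layout are placeholders, and the medaka model `r941_min_sup_g507` is an assumption for R9.4.1 super-accuracy Guppy data, so check it against `medaka tools list_models` for your medaka version.

```python
from typing import List, Optional, Tuple

# (argv, file that should capture stdout, or None if the tool writes its own output)
Command = Tuple[List[str], Optional[str]]


def build_polish_commands(draft: str, reads: str, racon_rounds: int = 2,
                          threads: int = 8,
                          medaka_model: str = "r941_min_sup_g507",  # assumption, verify
                          outdir: str = "polish") -> List[Command]:
    """Build minimap2/racon rounds followed by one medaka_consensus run."""
    cmds: List[Command] = []
    current = draft
    for i in range(1, racon_rounds + 1):
        paf = f"{outdir}/round{i}.paf"
        polished = f"{outdir}/racon{i}.fasta"
        # Map the reads to the current assembly; minimap2 writes PAF to stdout.
        cmds.append((["minimap2", "-x", "map-ont", "-t", str(threads),
                      current, reads], paf))
        # racon <reads> <overlaps> <target>; the polished FASTA goes to stdout.
        cmds.append((["racon", "-t", str(threads), reads, paf, current], polished))
        current = polished
    # medaka polishes the last racon output and writes into its own directory.
    cmds.append((["medaka_consensus", "-i", reads, "-d", current,
                  "-o", f"{outdir}/medaka", "-t", str(threads),
                  "-m", medaka_model], None))
    return cmds


def run(cmds: List[Command], outdir: str = "polish") -> None:
    """Execute the commands in order (requires the tools to be on PATH)."""
    import os
    import subprocess
    os.makedirs(outdir, exist_ok=True)
    for argv, stdout_path in cmds:
        if stdout_path:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


# Print the plan without running anything.
for argv, stdout_path in build_polish_commands("draft.fasta", "reads.fastq"):
    print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
```

Each racon round remaps the reads to the previous round's output, so the alignments always reflect the current consensus. Resubmitting the medaka output to PGAP would then show whether the frameshift fraction drops.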

ADD REPLY
