Dear community,
We recovered and published a few draft genomes from nanopore sequencing. All were uploaded and annotated via the NCBI PGAP pipeline.
For our latest genome, NCBI PGAP reports 30% frameshifted genes. However, prediction and annotation with prodigal and eggNOG did not show a substantial number of neighboring genes with the same annotation (spanning the range of the PGAP-predicted genes). NCBI concludes that the sequence is not correct, but we have not changed our procedure and coverage is OK (>60x).
Has anyone observed this before?
Any ideas on how to fix this, or any other analysis that might be useful?
Thank you!
I would trust NCBI PGAP because, in my personal experience, prodigal will still call frameshifted genes.
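One way to check this on the prodigal/eggNOG output is to look for neighboring genes on the same strand that carry the same annotation, since a frameshift often splits one gene into two shorter calls. The following is only a minimal sketch: the `Gene` record, the `find_split_candidates` helper, the 200 bp gap threshold, and the example coordinates are all made up for illustration, so you would need to fill the list from your own annotation table.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    # Hypothetical minimal record; fill it from your own GFF or annotation table.
    start: int
    end: int
    strand: str
    annotation: str


def find_split_candidates(genes, max_gap=200):
    """Return pairs of adjacent genes that share strand and annotation.

    Such pairs are only candidates for a frameshift-split gene;
    true paralogs in tandem will be reported as well.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        same_label = left.annotation == right.annotation
        same_strand = left.strand == right.strand
        close = right.start - left.end <= max_gap  # arbitrary threshold
        if same_label and same_strand and close:
            pairs.append((left, right))
    return pairs


# Made-up example coordinates, not real data.
genes = [
    Gene(100, 700, "+", "DNA gyrase subunit A"),
    Gene(705, 1500, "+", "DNA gyrase subunit A"),
    Gene(1600, 2400, "+", "ABC transporter permease"),
    Gene(2500, 3100, "-", "ABC transporter permease"),
]

for left, right in find_split_candidates(genes):
    print(f"{left.annotation}: {left.start}-{left.end} / {right.start}-{right.end}")
```

If the fraction of genes flagged this way is far below 30%, it is still worth comparing protein lengths against close reference genomes, because a truncated gene does not always leave a second annotated fragment behind.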
I think we need to know the following before we can help
If you haven't put a lot of effort into polishing then yes, maybe 30% of genes are frameshifted, because the dominant error type in ONT data is indels.
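To illustrate why a single indel matters so much, here is a small toy example. It assumes the standard codon assignments and uses an invented 24 nt ORF with a homopolymer run; `translate` is a hypothetical helper written for this post, not part of any tool mentioned here.

```python
from itertools import product

# Standard codon table built in TCAG order (amino acid assignments are
# the same in the bacterial table 11).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from position 0 until the first stop codon or the end."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented toy ORF containing a run of A's (a typical ONT trouble spot).
orf = "ATGGCAAAAAAAGGTCTGGAATAA"

# Simulate a one-base deletion inside the homopolymer.
shifted = orf[:6] + orf[7:]

print(translate(orf))      # intact reading frame
print(translate(shifted))  # everything downstream of the deletion changes
```

Every codon downstream of the deletion is read in a different frame, so the protein either changes completely or ends at a premature stop, which is what PGAP flags as a frameshifted gene.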
Thank you for your reply,
We used R9.4.1 chemistry, more specifically a Flongle, and a bacterial species (Arthrobacter). I did super-accuracy basecalling using guppy v6.x.
I used flye (also the best in my experience) and it gave a single closed chromosome. I did not polish with another tool, just trusted the flye polishing.
No Illumina reads are available so far.
I am wondering because the 6 other genomes we generated with the same pipeline were all fine so far (~2-10% pseudogenes from PGAP). I also tried using only a fraction of the reads with better average quality, but I did not get a fully closed genome (which I would prefer) and the coverage was lower.
Ah, sounds good.
I would definitely polish though. This very minimal polishing pipeline might help (paths to the programs will need editing, but it gives you an idea of how to run long-read-only polishing).
https://github.com/Colorstorm/assembly_polishing_racon_medaka
I would polish all assemblies before submission. People used to do 2-3 rounds of racon, plus medaka, then Illumina polishing. See if you can improve the base quality.
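As a rough idea of what such a long-read-only polishing run looks like, the sketch below only builds and prints the shell commands for N rounds of racon followed by medaka. It is not the linked repository's script. The file names, thread count, and the helper `polishing_commands` are placeholders, and the medaka model name `r941_min_sup_g507` is my assumption for R9.4.1 with guppy 6 super-accuracy basecalling, so please check it against `medaka tools list_models` for your installed version.

```python
import shlex


def polishing_commands(reads, draft, racon_rounds=2, threads=8,
                       medaka_model="r941_min_sup_g507"):
    """Build shell commands for racon rounds followed by one medaka run.

    File names for intermediate results are arbitrary placeholders.
    The medaka model name is an assumption; verify it for your setup.
    """
    reads_q = shlex.quote(reads)
    commands = []
    current = draft
    for i in range(1, racon_rounds + 1):
        paf = f"round{i}.paf"
        polished = f"racon_round{i}.fasta"
        current_q = shlex.quote(current)
        # Map the reads back to the current assembly.
        commands.append(
            f"minimap2 -x map-ont -t {threads} {current_q} {reads_q} > {paf}"
        )
        # racon takes: reads, overlaps, target sequences.
        commands.append(
            f"racon -t {threads} {reads_q} {paf} {current_q} > {polished}"
        )
        current = polished
    # Final consensus step with medaka on the last racon output.
    commands.append(
        f"medaka_consensus -i {reads_q} -d {shlex.quote(current)} "
        f"-o medaka_out -t {threads} -m {medaka_model}"
    )
    return commands


for cmd in polishing_commands("reads.fastq", "assembly.fasta"):
    print(cmd)
```

Printing the commands first makes it easy to inspect them before running anything. After polishing, re-running the annotation and comparing the pseudogene fraction tells you whether the base quality actually improved.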