Issue with pseudogenes in a bacterial genome
2.0 years ago
hjafar ▴ 10

I submitted a bacterial genome to NCBI and received the following message about an excess of pseudogenes:

Before we can assign your accession number, there are a few issues that require your attention.

We have annotated your genome and found that the number of pseudogenes is greater than 10% of the called gene features. This suggests that there may be a problem with your sequence and should be investigated. Please check it to determine whether you need to submit a new assembly. Let us know if you would like to proceed with the existing information.

Note that the vast majority of the pseudogenes are due to frameshifts, which suggests that there are insertions and deletions in your sequence that are causing the excess pseudogenes. You should take this into account when looking into any problems with the genome.

Your genome has 775 pseudogenes out of 5806 CDS genes. 386 of these are frameshifts. 441 are incomplete.

We have uploaded a .sqn file called 'Current.sqn' to your portal submissions so you can see the annotation. We also set the FIX button(s) so that you can upload revised assemblies if necessary.

How can I solve this problem? Is there a procedure I should follow to fix it?

Thanks in advance,

genome
Is this a PacBio or MinION assembly? Indel errors in long-read assemblies cause frameshifts, which the annotation pipeline then reports as spurious pseudogenes. The usual fix is to run Illumina-based error correction on the assembly (see Mick Watson's blog).
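
To put the numbers from the NCBI message in context, here is a minimal Python sketch of the arithmetic. The function name and dictionary keys are made up for illustration, and it assumes the 10% threshold is measured against the 5806 CDS genes quoted in the message.

```python
def pseudogene_summary(pseudogenes, cds_genes, frameshifted, incomplete):
    """Summarise pseudogene counts taken from an annotation report."""
    # Share of CDS genes that were called as pseudogenes, in percent.
    percent = round(100 * pseudogenes / cds_genes, 1)
    # If the two categories add up to more than the total, at least this
    # many pseudogenes must carry both flags.
    min_both = max(0, frameshifted + incomplete - pseudogenes)
    return {
        "percent_pseudogenes": percent,
        "exceeds_10_percent": pseudogenes > 0.10 * cds_genes,
        "min_both_flags": min_both,
    }


# Counts quoted in the NCBI message above.
summary = pseudogene_summary(775, 5806, 386, 441)
print(summary)
```

With the quoted counts this gives about 13.3% pseudogenes, which is over the 10% threshold. Since 386 + 441 is more than 775, at least 52 pseudogenes must be flagged as both frameshifted and incomplete. Re-running the same check on the re-annotated assembly after error correction is a quick way to see whether the fix worked.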

See @h.mon's answer in this thread: Pseudogenes in bacteria genome


