NCBI's and KEGG's genome content information differs even for the very same genome. Why?
21 months ago
bpvalderrama ▴ 30

THE CONTEXT: I have a list of compounds of interest that are present in the KEGG database. Using the annotated genomes available in KEGG, I produced a list of microorganisms that have at least one KO for catabolizing any of those compounds. All the microorganisms in my analysis came from KEGG, and since each one has a cross-reference to its assembled genome in NCBI, I downloaded two genomes to double-check my results.

THE PROBLEM: Some genes in the annotated genomes available in KEGG are simply not present in the annotated assembly in NCBI, even though KEGG shows that assembly as the cross-reference for the same genome. I don't know why this happens. I guess that both databases have different methods to annotate the genomes and that may explain the discrepancies. What do you think?

WHAT I DID: To get the list of genes for all the microorganisms of interest in KEGG, I used the KEGG API. With it, I linked one genome (in this case Lactobacillus plantarum JDM1) to the list of all KOs in KEGG, which gave me the KOs present in that genome. For the assembled genome in NCBI, I went to the KEGG entry for the strain and followed the link labeled Assembly to the NCBI website. I then downloaded the assembly and noticed that some KOs in the list returned by the KEGG API are not present in the list of NCBI genes (converted to KOs).
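A minimal sketch of the comparison described above, in Python. It assumes the KEGG REST `link` operation (`https://rest.kegg.jp/link/ko/<org>`), which returns tab-separated gene/KO pairs, and it assumes `lpj` is the KEGG organism code for Lactobacillus plantarum JDM1. The function names, the sample lines, and the `ncbi_kos` set are hypothetical; how the NCBI genes were converted to KOs is not stated in the post, so that set is simply taken as input.

```python
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # public KEGG REST endpoint


def fetch_kegg_links(org_code):
    """Download gene-to-KO links for one KEGG organism code (needs network)."""
    with urlopen(f"{KEGG_REST}/link/ko/{org_code}") as response:
        return response.read().decode("utf-8")


def parse_kegg_links(text):
    """Map each KO accession to the set of KEGG gene IDs linked to it.

    Each input line is expected to look like 'lpj:GENE_ID<TAB>ko:K00001'.
    """
    ko_to_genes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene, ko = line.split("\t", 1)
        gene_id = gene.split(":", 1)[1]       # drop the organism prefix
        ko_id = ko.strip().split(":", 1)[1]   # drop the 'ko:' prefix
        ko_to_genes.setdefault(ko_id, set()).add(gene_id)
    return ko_to_genes


def kos_missing_from_ncbi(ko_to_genes, ncbi_kos):
    """Return KOs reported by KEGG but absent from the NCBI-derived KO set,
    together with the KEGG gene IDs that carry them."""
    return {
        ko: sorted(genes)
        for ko, genes in sorted(ko_to_genes.items())
        if ko not in ncbi_kos
    }


if __name__ == "__main__":
    # Illustrative lines in the endpoint's format (not real query output).
    # For live data, use: text = fetch_kegg_links("lpj")
    text = "lpj:JDM1_0001\tko:K02313\nlpj:JDM1_0002\tko:K02338\n"
    ncbi_kos = {"K02313"}  # hypothetical KO set derived from the NCBI annotation
    print(kos_missing_from_ncbi(parse_kegg_links(text), ncbi_kos))
```

Keeping the KEGG gene IDs next to each missing KO makes it possible to look up the same locus in the NCBI annotation and check whether the gene is absent or only annotated differently.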

Annotation Gene KO KEGG NCBI • 413 views

I guess that both databases have different methods to annotate the genomes and that may explain the discrepancies.

The genome sequence should be identical, but each annotation provider may follow its own annotation pipeline. NCBI uses its prokaryotic genome annotation pipeline: https://www.ncbi.nlm.nih.gov/genome/annotation_prok/

Can you provide an example of a KO and its associated gene name? The gene, at least, is hopefully present in both annotations.
