Question

What is the difference between eggNOG, COG and KEGG?

2


6.4 years ago

anderson.nb6 ▴ 30

I cannot understand the difference between these databases. Also, when should I use each one?

Thanks ...

sequence next-gen genome database • 14k views


score 3 · Answer 1 · 2017-11-28

I'll try to explain it in a few words right here. See the details below.

KEGG is an independent resource, the "Kyoto Encyclopedia of Genes and Genomes". See the link below for its content.

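eggNOG is based on the COG idea. The important difference is that eggNOG is non-supervised: its orthologous groups are constructed automatically, whereas the COG annotations are manually curated.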

COG = Clusters of Orthologous Groups (of proteins). The motivation, in the words of the article: "Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally." See the abstract and a link to the detailed article at the bottom.

There are more details about all three databases below.

https://en.wikipedia.org/wiki/KEGG

"KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development."

https://en.wikipedia.org/wiki/EggNOG_(database)

"The eggNOG database is a database of biological information hosted by the EMBL. It is based on the original idea of COGs (clusters of orthologous groups)[2][3] and expands that idea to non-supervised orthologous groups constructed from numerous organisms.[4] The database was created in 2007[5] and updated to version 4.5 in 2015.[1] eggNOG stands for evolutionary genealogy of genes: Non-supervised Orthologous Groups."

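To make "non-supervised orthologous groups" a bit more concrete, here is a toy sketch of one common automated approach: find reciprocal best hits between genomes and merge them into groups. This is only an illustration under my own assumptions, not eggNOG's actual pipeline. The function names, the `genome|protein` ID format and the scores are all made up for the example.

```python
from collections import defaultdict

def reciprocal_best_hits(scores):
    """Return pairs of proteins that are each other's best hit across genomes.

    `scores` maps (query, subject) -> similarity score. Protein IDs are
    assumed to look like 'genome|protein' (a convention invented here).
    """
    best = {}  # (query, subject_genome) -> (best subject, its score)
    for (query, subject), score in scores.items():
        q_genome = query.split("|")[0]
        s_genome = subject.split("|")[0]
        if q_genome == s_genome:
            continue  # only compare proteins from different genomes
        key = (query, s_genome)
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)

    pairs = set()
    for (query, _s_genome), (subject, _score) in best.items():
        q_genome = query.split("|")[0]
        back = best.get((subject, q_genome))
        if back is not None and back[0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs

def group_orthologs(pairs):
    """Merge reciprocal-best-hit pairs into groups with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())

# Toy similarity scores between proteins of three hypothetical genomes A, B, C.
toy_scores = {
    ("A|a1", "B|b1"): 90, ("B|b1", "A|a1"): 88,
    ("A|a1", "C|c1"): 85, ("C|c1", "A|a1"): 80,
    ("B|b1", "C|c1"): 70, ("C|c1", "B|b1"): 75,
    ("A|a2", "B|b2"): 60, ("B|b2", "A|a2"): 65,
    ("A|a2", "B|b1"): 40, ("B|b1", "A|a2"): 30,  # weaker, non-reciprocal hits
}

toy_groups = group_orthologs(reciprocal_best_hits(toy_scores))
```

No human looks at the groups here, which is the point of "non-supervised". In COG, by contrast, the annotations of the groups are curated by hand.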
An article about COG: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383993/

"Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics."
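Point (ii) of the abstract, using the function of the characterized members to annotate the whole orthologous group, can be sketched roughly like this. It is a minimal illustration only: the group IDs, protein names and the `transfer_annotations` helper are hypothetical, and real COG annotation also involves the manual curation described in point (iii).

```python
def transfer_annotations(groups, known_functions):
    """Assign each protein the functions of the characterized members of its group.

    groups: dict mapping a group ID to a list of member protein IDs.
    known_functions: dict mapping experimentally characterized proteins
        to a list of their functions.
    """
    annotations = {}
    for group_id, members in groups.items():
        # Collect every function seen among characterized members. When there
        # is more than one, the whole range of potential functions is kept.
        functions = sorted(
            {f for member in members for f in known_functions.get(member, [])}
        )
        for member in members:
            if member in known_functions:
                evidence = "experimental"
            elif functions:
                evidence = "inferred from orthologs"
            else:
                evidence = "uncharacterized"
            annotations[member] = {
                "group": group_id,
                "functions": functions,
                "evidence": evidence,
            }
    return annotations

# Hypothetical toy input: two groups, only p1 and p2 studied experimentally.
toy_groups = {"GROUP_1": ["p1", "p2", "p3"], "GROUP_2": ["p4", "p5"]}
toy_known = {"p1": ["tRNA modification"], "p2": ["rRNA maturation"]}

toy_annotations = transfer_annotations(toy_groups, toy_known)
```

In this toy input, p3 has never been studied but still receives a tentative annotation from its group, while p4 and p5 stay uncharacterized because nobody in their group has a known function.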