I have been doing some annotation work for Reactome. Compared with KEGG, Reactome stores a lot more information:
- the Gene Ontology IDs for the metabolic process of each reaction, plus GO terms for the localization of each component of a reaction
- a text description of what happens in each reaction, plus a reference
- the names of the authors of each annotated pathway
- the SBML and BioPAX versions of each pathway
The problem with KEGG, in my opinion, is that it provides neither references nor descriptions for its reactions. A pathway may be annotated in a certain way, and if something about a reaction is unclear, there is no way to find the references used to justify it, and it is difficult to contact the original authors of the pathway. KEGG entries are 'authored by experts in the field', but it is impossible to know who these experts are or why they made certain choices.
Moreover, KEGG draws a somewhat artificial distinction between metabolic and protein-protein interaction pathways; in Reactome, you can use the GO IDs to distinguish the types of reactions without artificially splitting the reactome.
Finally, there are other pathway databases:
- The Nature Pathways database
- UniProt pathways is nice, but in my opinion it has a very bad web interface, which makes it difficult to access its contents.
- BioCarta, which has nice charts and figures
- SignaLink, about signalling pathways. I can't tell you much about it since I do not know much about signalling pathways, but it uses a manual curation process.
- The Edinburgh Metabolic Pathways database - you have to register to use it.
The creators of MetaCyc have compared it with other pathway databases:
- The MetaCyc database paper includes a section "Comparison of MetaCyc and KEGG"
- The MetaCyc user guide also has a section "Comparison of MetaCyc to other Pathway Databases"
I don't know about independent comparisons, but searching PubMed for "metabolic pathway databases" throws up about 80 review articles.
In general, it's difficult to compare different resources objectively. They often have slightly different aims, design philosophies and of course, data access tools and formats. Most people choose one based on "look and feel" and how well it suits their particular project needs.
I am in total agreement with Giovanni: KEGG is definitely losing its edge, for several reasons:
- KEGG's strict licensing policy. In contrast, Reactome is released under a Creative Commons license, which gives you a lot of freedom, especially if you are a commercial user.
- People use KEGG for its maps, but its approach seems to lack interoperability with other tools and formats. In terms of interoperability, Reactome is ahead of the others: it supports BioPAX and SBML, and the latest release includes SBGN.
As a matter of fact, none of the pathway databases is as comprehensive as claimed; see the following reports
EcoCyc (and MetaCyc) seems to have a 'know as much about a pathway as possible' philosophy, whereas KEGG seems to take a 'know as many pathways as possible' approach. I would typically look in EcoCyc first as my gold standard before going to another database, but perhaps I am biased, as I used to work for a group that curated EcoCyc.
I guess it just depends on the question you are asking, the species and the area you are working on, as well as your definition of 'pathway' (e.g. is a protein-protein interaction part of a pathway?).
For example, BioGRID is excellent for some of the pathways it focuses on, especially if you want to know all possible connections and are willing to look at the evidence for each part of the pathway/interaction. For instance, they have put a lot of work into Arabidopsis and are currently working on the ubiquitin 'pathway'.
Also, as a previous post mentioned, Reactome is worth looking at: a new version has just been released.
I compared five pathway databases that describe the human metabolic network, and the differences are quite large. For example, only 510 of the 3858 genes in their combined set could be found in all five databases. For further detail see: http://www.biomedcentral.com/1752-0509/5/165.
There is no easy answer to which one is "best", as this really depends also on the analyses you are using it for.
Pathway databases are curated by companies or by different research groups, and these resources have a lot of inconsistencies. I have noticed that different databases list different numbers of genes for the same pathway, and that they use modified or database-specific pathway names. So a direct comparison of pathway databases will be difficult.
I would recommend letting your biological question guide you: select the appropriate data resources, or take the union or intersection of different resources. When selecting resources, do consider the curation strategy and the experimental methods used to associate genes with each pathway.
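Taking the union or intersection of resources, as suggested above, is just set arithmetic once each database's gene list for a pathway has been mapped to a common identifier. The sketch below is illustrative only: the database names and gene IDs (`db_a`, `G1`, ...) are hypothetical placeholders, not real annotations, and the `at_least` helper (genes supported by at least k resources) is my own generalization; with k equal to the number of resources it reduces to the "found in all databases" count mentioned earlier.

```python
# Hypothetical gene memberships for the "same" pathway in three resources.
# Placeholder identifiers only; real data would come from each database's export,
# mapped to a shared ID namespace first.
pathway_genes = {
    "db_a": {"G1", "G2", "G3", "G4"},
    "db_b": {"G2", "G3", "G5"},
    "db_c": {"G2", "G3", "G4", "G6"},
}


def union_of(resources):
    """Genes annotated to the pathway by at least one resource (broad, noisier)."""
    return set().union(*resources.values())


def intersection_of(resources):
    """Genes annotated to the pathway by every resource (strict, smaller)."""
    return set.intersection(*resources.values())


def at_least(resources, k):
    """Genes supported by at least k resources (hypothetical middle ground)."""
    counts = {}
    for genes in resources.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n >= k}


if __name__ == "__main__":
    print("union:", sorted(union_of(pathway_genes)))
    print("intersection:", sorted(intersection_of(pathway_genes)))
    print("in >= 2 resources:", sorted(at_least(pathway_genes, 2)))
```

The union maximizes coverage at the cost of including weakly supported genes, while the intersection keeps only consensus genes; which one fits depends on the question being asked.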
Look at what the top research groups in your field are using: check their publications or ask them directly by email, by telephone or at a conference. This way, you will be using the "accepted" database. If you still consider another pathway database superior, then use both and offer a comparison; doing so will add to the conversation about comparing the two.
I work for ProteinLounge, which has a commercial database of biological pathways (http://www.proteinlounge.com/pathways/). The graphics make it easier to visualize than other databases such as KEGG. As others have said, each pathway database has its own look and feel. However, if someone wants a database that is more visually eye-catching, then the ProteinLounge pathway database is definitely one that they should look into.