Is KEGG still the reigning champion? Are there any comparisons with MetaCyc? It looks fairly complete, for E. coli at least.
I have been doing some annotation work for Reactome. Compared with KEGG, Reactome stores a lot more information:
The problem with KEGG, in my opinion, is that they provide neither references nor descriptions for the reactions. A pathway may be annotated in a certain way, and if something about a reaction is unclear, there is no way to find the references used to justify it, and it is difficult to contact the original authors of the pathway. The entries in KEGG are 'authored by experts in the field', but it is impossible to know who these experts are or why they made certain choices.
Moreover, KEGG makes a somewhat artificial distinction between metabolic and protein-protein interaction pathways; in Reactome, you can use the GO IDs to distinguish the types of reactions without artificially splitting the reactome.
Finally, there are other databases for pathways:
The UniProt pathways are maintained via http://www.grenoble.prabi.fr/obiwarehouse/unipathway
The creators of MetaCyc have compared it with other pathway databases:
I don't know about independent comparisons, but searching PubMed for "metabolic pathway databases" throws up about 80 review articles.
In general, it's difficult to compare different resources objectively. They often have slightly different aims, design philosophies and of course, data access tools and formats. Most people choose one based on "look and feel" and how well it suits their particular project needs.
I am in total agreement with Giovanni: KEGG is definitely losing its edge, for several reasons.
As a matter of fact, none of the pathway databases are as comprehensive as they claim; see the following reports:
Pathway analysis software: Annotation errors and solutions
Consistency, comprehensiveness, and compatibility of pathway databases.
EcoCyc (and MetaCyc) seems to have the philosophy of 'know as much about a pathway as possible', whereas KEGG seems to take a 'know as many pathways as possible' approach. I would always look in EcoCyc first as my gold standard before going to another database, but perhaps I am biased, as I used to work for a group that curated EcoCyc.
I guess it just depends on the question you are asking, the species and the area you are working on, as well as your definition of 'pathway' (e.g. is a protein-protein interaction part of a pathway?).
For example, BioGRID is excellent for some of the pathways they focus on, especially if you want to know all possible connections and are willing to look at the evidence for each part of the pathway or interaction. For example, they have put a lot of work into Arabidopsis and are currently working on the ubiquitin 'pathway'.
Also, as a previous post mentioned, Reactome is worth looking at: a new version has just been released.
I compared five pathway databases that describe the human metabolic network and the differences are quite large. For example, only 510 of the 3858 genes they have combined could be found in all five databases. For further detail see: http://www.biomedcentral.com/1752-0509/5/165.
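The headline comparison above is a set calculation: genes present in every database versus genes present in any of them. A minimal sketch of that calculation, assuming each database has already been reduced to a set of gene identifiers; the database names and gene sets below are hypothetical placeholders, not data from the cited paper. For the figures quoted above, the same ratio is 510/3858, about 13%.

```python
# Minimal sketch: genes shared by every database versus the combined list.
# Database names and gene sets are hypothetical placeholders, not data
# from the paper cited above.

def overlap_summary(gene_sets):
    """Return (genes in all sets, genes in any set, fraction shared).

    Assumes gene_sets maps at least one database name to a set of gene IDs.
    """
    sets = list(gene_sets.values())
    in_all = set.intersection(*sets)   # genes found in every database
    combined = set.union(*sets)        # genes found in at least one database
    fraction = len(in_all) / len(combined) if combined else 0.0
    return in_all, combined, fraction


if __name__ == "__main__":
    toy = {
        "db_a": {"HK1", "GCK", "PFKM", "ALDOA"},
        "db_b": {"HK1", "GCK", "PFKM"},
        "db_c": {"HK1", "PFKM", "ENO1"},
    }
    shared, combined, frac = overlap_summary(toy)
    print(sorted(shared), len(combined), round(frac, 3))
```

With the toy sets, only HK1 and PFKM are in all three, out of five genes combined, so the shared fraction is 0.4.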
There is no easy answer to which one is "best", as this really depends also on the analyses you are using it for.
Pathway databases are curated by companies or by different research groups, and these resources have a lot of inconsistencies. I have noticed that different databases have different numbers of genes in the same pathway, and that they use modified or database-specific pathway names. So a direct comparison of pathway databases will be difficult.
I would recommend letting your biological question guide you: select the appropriate data resources, or take the union or intersection of different resources. When selecting resources, do consider the curation strategy and the experimental methods used to associate genes with pathways.
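Taking the union or the intersection of resources for one pathway can be sketched as below, assuming the pathway has already been matched by name across resources (which, as noted above, is itself nontrivial). The resource names and gene memberships are hypothetical and only illustrate that the "same" pathway can have different gene counts.

```python
# Minimal sketch of combining one pathway's gene lists from several
# resources. Resource names and gene memberships are hypothetical; in
# practice they would come from each database's download files.

def combine_pathway(gene_lists, mode="union"):
    """Merge one pathway's gene lists from several resources.

    mode="union"        keeps genes listed by any resource (broader coverage)
    mode="intersection" keeps genes listed by every resource (stricter)
    """
    sets = [set(genes) for genes in gene_lists.values()]
    if not sets:
        return set()
    if mode == "union":
        return set().union(*sets)
    if mode == "intersection":
        return set.intersection(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


glycolysis = {
    "resource_1": ["HK1", "GCK", "PFKM", "ALDOA", "GAPDH"],
    "resource_2": ["HK1", "PFKM", "GAPDH"],
    "resource_3": ["HK1", "GCK", "PFKM", "GAPDH", "ENO1", "PKM"],
}

if __name__ == "__main__":
    # Gene counts for the "same" pathway differ between resources.
    for name, genes in glycolysis.items():
        print(name, len(genes))
    print(sorted(combine_pathway(glycolysis, "union")))
    print(sorted(combine_pathway(glycolysis, "intersection")))
```

The union favours coverage (seven genes here), while the intersection keeps only the three genes every resource agrees on; which one suits you depends on the biological question.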
In addition to the resources listed here, I recommend taking a look at WikiPathways and Pathway Commons, which provide unified access to different pathway databases.
Look at what the top research groups in your field are using: check their publications, or ask them directly by email, by telephone or at conferences. This way, you will be using the "accepted" database. If you still consider another pathway database superior, then use both and offer a comparison; doing so will add to the conversation about comparing the two.
I strongly disagree with simply going along with what top labs are using. Just because a reputed lab uses some database does not mean one should use it blindly. Although seeing what top labs prefer may be a quick way to find a potentially good source, as scientists we need to be skeptical of what each database actually represents: how they get their data, how up to date it is, how their information is validated and, in this case, how they define biological pathways. It is known that this is not uniform across databases, which introduces bias.
If you are looking for cancer and immune pathways, you might want to check http://netpath.org. The pathways they have are quite comprehensive and can be downloaded in BioPAX format.
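BioPAX files are usually distributed as RDF/XML, so a quick look at what a download contains needs only an XML parser. The sketch below assumes BioPAX Level 3 and assumes pathways are written as `bp:Pathway` elements with a `bp:displayName` child; files that declare types through `rdf:type`, or other BioPAX levels, would need a full RDF library such as rdflib. The sample document and its URI are made up for illustration and are not taken from NetPath.

```python
# Minimal sketch: list pathways in a BioPAX Level 3 RDF/XML document using
# only the standard library. Assumes pathways appear as <bp:Pathway>
# elements with a <bp:displayName> child; other RDF/XML layouts need a
# full RDF parser.
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_pathways(xml_text):
    """Return (identifier, display name) pairs for each bp:Pathway element."""
    root = ET.fromstring(xml_text)
    pathways = []
    for node in root.iter(f"{{{BP}}}Pathway"):
        # Resources may be identified by rdf:about or rdf:ID.
        ident = node.get(f"{{{RDF}}}about") or node.get(f"{{{RDF}}}ID")
        name = node.findtext(f"{{{BP}}}displayName", default="")
        pathways.append((ident, name.strip()))
    return pathways


# Hypothetical sample document, not a real NetPath export.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Pathway rdf:about="http://example.org/pathway1">
    <bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Example signaling pathway</bp:displayName>
  </bp:Pathway>
</rdf:RDF>"""

if __name__ == "__main__":
    print(list_pathways(SAMPLE))
```

For a downloaded file, reading it with `open(path, encoding="utf-8").read()` and passing the text to `list_pathways` should work under the same layout assumption.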
I work for ProteinLounge, which has a commercial database of biological pathways (http://www.proteinlounge.com/pathways/). Its graphics make pathways easier to visualize than in other databases such as KEGG. As others have said, each pathway database has its own look and feel. However, if someone wants a database that is more visually eye-catching, then the ProteinLounge pathway database is definitely one to look into.
The BioCyc team has created a more detailed comparison of BioCyc and KEGG covering both their data content and their informatics tools.
The comparison is here:
https://bioinformatics.ai.sri.com/biocyc/kegg-biocyc-comparison.pdf
I tried using SEED for a pipeline and got frustrated with inconsistencies in different database files: subsystems, subsystems2role, etc.
Thanks for the input. I am being lazy, as there are so many DBs and so many references, but I feel like I can start with these answers. Ideally, some combination of accuracy and coverage is best. After 10 years of working in bioinformatics, I find the 'subjective' tag applies in far more cases than I had expected.