Question: KEGG pathway gene lists differ between sources
aedan.r (University of Technology Sydney, Australia) wrote 7 weeks ago:


I'm trying to get lists of genes belonging to KEGG pathways. I had been using MSigDB, but some of the KEGG pathways aren't in MSigDB, so for those I was getting the lists directly from KEGG.
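MSigDB distributes its collections as GMT files, with one gene set per tab-separated line: the set name, a description (usually a URL), then the member genes. A minimal Python reader might look like this; the function name `read_gmt` is hypothetical, not part of any MSigDB tool.

```python
def read_gmt(lines):
    """Parse MSigDB-style GMT lines into {gene set name: set of genes}.

    Each line is tab-separated: set name, description (often a URL),
    then one field per member gene.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}  # ignore empty trailing fields
    return gene_sets
```

Assuming the KEGG collection has been downloaded to a local path, `with open(path) as fh: msig = read_gmt(fh)` followed by `msig["KEGG_APOPTOSIS"]` should give the MSigDB apoptosis list.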

I've realised that for at least one pathway, the lists of genes differ a lot between KEGG and MSigDB. For example, for apoptosis:

There are 87 genes in the MSigDB list and 136 in the KEGG list, and only 57 gene symbols appear in both lists.
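To put numbers on the mismatch, here is a small Python sketch (helper names are hypothetical) that summarises the overlap of two gene lists. With the counts above, the union is 87 + 136 - 57 = 166 genes, so the Jaccard index is 57/166, about 0.34: roughly 66% of the MSigDB genes are in the KEGG list, but only about 42% of the KEGG genes are in the MSigDB list.

```python
def overlap_stats(n_a, n_b, n_both):
    """Summarise the overlap of two non-empty gene sets from their sizes."""
    union = n_a + n_b - n_both  # inclusion-exclusion
    return {
        "union": union,
        "jaccard": n_both / union,
        "frac_a_in_b": n_both / n_a,  # share of set A also found in B
        "frac_b_in_a": n_both / n_b,  # share of set B also found in A
    }


def compare_gene_sets(a, b):
    """Same summary, computed directly from two sets of gene identifiers."""
    return overlap_stats(len(a), len(b), len(a & b))


# Counts reported for apoptosis: 87 (MSigDB), 136 (KEGG), 57 shared.
stats = overlap_stats(87, 136, 57)
print(stats["union"], round(stats["jaccard"], 2))  # prints: 166 0.34
```

In practice you would call `compare_gene_sets` on the two actual sets, making sure both use the same identifier type (symbols or Entrez IDs).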

I've also tried the msigdbr and gage R packages, and the lists they give generally agree with MSigDB, using either gene symbols or Entrez IDs. It seems unlikely that this is caused by one or more of the lists being outdated when there are so many differences between them, and anyway, MSigDB was last updated a couple of months ago. It also seems unlikely that three different secondary sources all agree with each other and are all wrong.

So the question is, which, if any, of these sources should I trust? Any suggestions would be appreciated!

Tags: kegg, msigdb
igor (United States) wrote 7 weeks ago:

The best source for KEGG pathways should be KEGG itself. Other sources may be outdated or subject to licensing restrictions.
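One way to get the current list straight from KEGG is its REST API. This Python sketch assumes the `link` operation at https://rest.kegg.jp/link/<organism>/<pathway> and that human apoptosis is pathway hsa04210; for human, the returned `hsa:` gene IDs are NCBI Gene (Entrez) IDs. The function names are hypothetical, and KEGG's terms of use are worth checking before scripting many requests.

```python
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # assumed base URL of the KEGG REST API


def parse_kegg_link(text):
    """Turn KEGG 'link' output ('path:hsa04210<TAB>hsa:842' per line) into gene IDs."""
    genes = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _pathway, gene = line.split("\t")
        genes.add(gene.split(":", 1)[1])  # 'hsa:842' -> '842'
    return genes


def fetch_kegg_pathway_genes(organism="hsa", pathway="04210"):
    """Download the current gene list for one pathway (network access required)."""
    url = f"{KEGG_REST}/link/{organism}/{organism}{pathway}"
    with urlopen(url) as resp:
        return parse_kegg_link(resp.read().decode("utf-8"))
```

The parsing is kept separate from the download so it can be checked offline; the result is a set of Entrez IDs, which can be compared directly with an Entrez-based MSigDB list.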

The MSigDB support forum had a nice explanation:

For the Reactome database, the information you found in the release notes is correct - it is based on Reactome version 44. We are working with them to get an updated version, but I don't have a time frame for when that will be done. For KEGG and Biocarta, we will not be making any further updates. Biocarta is no longer maintained, and they last updated their contents many years ago. I think MSigDB is consistent with their last release. We have not been able to update KEGG since they changed their licensing terms. The MSigDB KEGG collection is based on their last public version (58.0 April 2011).

This was more than 2 years ago and the situation has changed since then, but it gives you an idea of the issues that may lead to inconsistencies.


Thanks. I hadn't come across that site, but that gives a pretty clear resolution.

I thought it was strange that two other sources, MSigDB and the gage package, both had nearly the same lists and both differed so much from KEGG, but I guess they must just be similarly out of date. It looks like gage's function sources its data from the kegg.db package, which is no longer updated.

(Reply from aedan.r, 7 weeks ago)

You can try KEGGgraph, which downloads directly from KEGG, so you end up with the current version.
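KEGGgraph itself is an R/Bioconductor package that works from KEGG's KGML files. As a rough Python analogue, the sketch below pulls a pathway's KGML (assumed to be served at https://rest.kegg.jp/get/<pathway>/kgml) and collects the IDs from entries of type "gene". The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def genes_from_kgml(kgml):
    """Collect gene IDs from KGML <entry type="gene"> elements.

    An entry's 'name' attribute can hold several space-separated IDs,
    e.g. 'hsa:842 hsa:843'; the organism prefix is stripped.
    """
    root = ET.fromstring(kgml)
    genes = set()
    for entry in root.iter("entry"):
        if entry.get("type") != "gene":
            continue  # skip compounds, linked maps, groups, etc.
        for kegg_id in entry.get("name", "").split():
            genes.add(kegg_id.split(":", 1)[1])
    return genes


def fetch_kgml(pathway_id="hsa04210"):
    """Download the KGML file for a pathway (network access required)."""
    with urlopen(f"https://rest.kegg.jp/get/{pathway_id}/kgml") as resp:
        return resp.read()  # bytes; ET.fromstring accepts bytes or str
```

Usage would be `genes_from_kgml(fetch_kgml("hsa04210"))`, giving a set of IDs that can be compared with the other lists.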

(Reply from igor, 7 weeks ago)
V wrote 7 weeks ago:

I would personally trust KEGG over MSigDB, as in my experience it tends to be better curated.

A different solution would be to look directly at GO. You can look at curated gene sets and filter them on various parameters, including showing only genes that have experimental evidence supporting their inclusion in that category.
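If you work from a downloaded GO annotation file (GAF format, as distributed by the GO Consortium) rather than a web browser, the evidence filter can be scripted. This Python sketch assumes GAF 2.x columns (symbol in column 3, qualifier in column 4, GO ID in column 5, evidence code in column 7), treats EXP, IDA, IPI, IMP, IGI, IEP and their high-throughput counterparts HTP, HDA, HMP, HGI, HEP as experimental, and uses GO:0006915 (apoptotic process) as the example term. It only picks up direct annotations; genes annotated to child terms would need the ontology to be propagated first. The function name is hypothetical.

```python
# Experimental evidence codes, plus their high-throughput counterparts.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
                      "HTP", "HDA", "HMP", "HGI", "HEP"}


def genes_for_go_term(gaf_lines, go_id, codes=EXPERIMENTAL_CODES):
    """Return symbols directly annotated to go_id with an accepted evidence code."""
    genes = set()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\r\n").split("\t")
        symbol, qualifier, term, evidence = cols[2], cols[3], cols[4], cols[6]
        if term != go_id or evidence not in codes:
            continue
        if "NOT" in qualifier.split("|"):
            continue  # negative annotation: the gene is NOT involved
        genes.add(symbol)
    return genes
```

For example, with a local human GAF file (the path is up to you), `with open(path) as fh: genes = genes_for_go_term(fh, "GO:0006915")`.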
