KEGG pathway gene lists differ between sources
3.9 years ago
aedan.r • 0

Hi,

I'm trying to get lists of genes belonging to KEGG pathways. I had been using MSigDB, but some of the KEGG pathways aren't in MSigDB, so for others I was getting them directly from KEGG.

I've realised that for at least one pathway, the lists of genes differ a lot between KEGG and MSigDB. For example, for apoptosis:

https://www.gsea-msigdb.org/gsea/msigdb/cards/KEGG_APOPTOSIS.html

https://www.genome.jp/dbget-bin/www_bget?pathway+hsa04210

There are 87 genes in the MSigDB list and 136 in the KEGG list, and only 57 gene symbols appear in both lists.
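To make the comparison concrete, here is a minimal Python sketch of how two gene lists could be compared. The function name `compare_gene_sets` and the example symbols are illustrative assumptions, not something taken from either database.

```python
def compare_gene_sets(list_a, list_b):
    """Summarise the overlap between two collections of gene identifiers.

    Identifiers are compared as exact strings after trimming whitespace,
    so aliases or retired symbols will show up as differences.
    """
    a = {g.strip() for g in list_a if g.strip()}
    b = {g.strip() for g in list_b if g.strip()}
    shared = a & b
    union = a | b
    return {
        "n_a": len(a),
        "n_b": len(b),
        "n_shared": len(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: shared / union (0.0 when both sets are empty)
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy example with a handful of symbols (not the real pathway lists)
summary = compare_gene_sets(
    ["BAX", "BCL2", "CASP3"],
    ["BCL2", "CASP3", "CASP8", "FAS"],
)
```

With the counts above (87 and 136 genes, 57 shared), the union is 87 + 136 - 57 = 166 genes, so the Jaccard index is 57/166, or roughly 0.34. Comparing on Entrez IDs rather than symbols helps rule out alias mismatches as the cause.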

I've also tried the msigdbr and gage R packages, and the lists they give generally agree with MSigDB, using either gene symbols or Entrez IDs. With so many differences, the discrepancy seems unlikely to be explained by one of the lists simply being outdated, and in any case MSigDB was last updated a couple of months ago. It also seems unlikely that three different secondary sources would all agree with each other and all be wrong.

So the question is, which, if any, of these sources should I trust? Any suggestions would be appreciated!

KEGG MSigDB • 1.5k views
3.9 years ago
igor 13k

The best source for KEGG pathways should be KEGG itself. Other sources may be outdated or have some restrictions.

The MSigDB support forum had a nice explanation:

For the Reactome database, the information you found in the release notes is correct - it is based on Reactome version 44. We are working with them to get an updated version, but I don't have a time frame for when that will be done. For KEGG and Biocarta, we will not be making any further updates. Biocarta is no longer maintained, and they last updated their contents many years ago. I think MSigDB is consistent with their last release. We have not been able to update KEGG since they changed their licensing terms. The MSigDB KEGG collection is based on their last public version (58.0 April 2011).

This was more than 2 years ago and the situation has changed since then, but it gives you an idea of the issues that may lead to inconsistencies.

ADD COMMENT
0
Entering edit mode

Thanks. I hadn't come across that site, but that gives a pretty clear resolution.

I thought it was strange that two other sources, MSigDB and the gage package (https://bioconductor.org/packages/release/bioc/html/gage.html), both had nearly the same lists, and both very different from KEGG's, but I guess they must just be similarly out of date. It looks like gage's kegg.gs gene sets are sourced from the KEGG.db package (https://bioconductor.org/packages/release/data/annotation/html/KEGG.db.html), which is no longer updated.

ADD REPLY
1
Entering edit mode

You can try KEGGgraph, which downloads directly from KEGG, so you end up with the current version.
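If you would rather not go through an R package, the current gene membership can also be pulled from the KEGG REST service. Below is a minimal Python sketch, assuming the `link` operation at `https://rest.kegg.jp/link/hsa/<pathway>` returns tab-separated lines of the form `path:hsa04210<TAB>hsa:<gene id>`. The helper names `parse_kegg_link` and `fetch_pathway_genes` are made up for this example, and the KEGG terms of use still apply to whatever you download.

```python
from urllib.request import urlopen

# Assumed URL layout of the KEGG REST "link" operation
KEGG_LINK_URL = "https://rest.kegg.jp/link/hsa/{pathway}"


def parse_kegg_link(text):
    """Extract gene IDs from lines like 'path:hsa04210<TAB>hsa:10000'."""
    genes = set()
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        # Second column is '<organism>:<gene id>'; keep the part after ':'
        _, _, gene_id = parts[1].partition(":")
        if gene_id:
            genes.add(gene_id)
    return genes


def fetch_pathway_genes(pathway="hsa04210"):
    """Download the gene IDs currently linked to one KEGG pathway."""
    url = KEGG_LINK_URL.format(pathway=pathway)
    with urlopen(url, timeout=30) as response:
        return parse_kegg_link(response.read().decode("utf-8"))
```

Calling `fetch_pathway_genes("hsa04210")` should return a set of KEGG gene IDs for the apoptosis pathway. For human these are expected to correspond to Entrez Gene IDs, so they can be compared directly with the Entrez-based lists from the other sources.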

3.9 years ago
V ▴ 380

I would personally trust KEGG over MSigDB, as it tends to be better curated in my experience.

A different solution would be to look directly at GO (http://geneontology.org/). You can look at curated gene sets and filter them on various parameters, including showing only genes that have experimental evidence supporting their listing in that category.
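As a rough illustration of the evidence filter, here is a Python sketch that keeps only experimentally supported annotations from a GO annotation (GAF) file. It assumes the standard GAF 2.x column order (symbol in column 3, qualifier in column 4, GO ID in column 5, evidence code in column 7). The function name `experimental_genes` is hypothetical, and the sketch matches one exact GO ID without walking down to its child terms.

```python
# Experimental evidence codes as defined by the GO Consortium
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def experimental_genes(gaf_lines, go_id):
    """Return gene symbols annotated to go_id with experimental evidence."""
    genes = set()
    for line in gaf_lines:
        if line.startswith("!"):
            continue  # header/comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # malformed or truncated line
        symbol, qualifier, term, evidence = cols[2], cols[3], cols[4], cols[6]
        if "NOT" in qualifier.split("|"):
            continue  # negated annotation: gene is NOT in this category
        if term == go_id and evidence in EXPERIMENTAL_CODES:
            genes.add(symbol)
    return genes
```

For example, passing the lines of a human GAF file with `go_id="GO:0006915"` (apoptotic process) would give the genes with direct experimental support for that term.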
