Comparing pathway names between online databases
2.9 years ago
s.may-wilson ▴ 40

This question is fairly basic, but I would be somewhat surprised if there is a simple answer. Here's hoping!

Essentially, I am working on a piece of software that performs a type of pathway analysis based on online databases. I am currently using three databases for my information on biological pathways: KEGG, Reactome and WikiPathways.

The current iteration of my pipeline searches one of the three specified databases for a named pathway and returns all of the genes in that pathway for downstream analysis. My issue is that the job would be much easier (and, importantly, much easier to automate) if there were a way to search for the same pathway in more than one database at once. Unfortunately, each database uses its own pathway IDs and even its own pathway names, e.g. in KEGG: "B cell receptor signaling pathway", in Reactome: "Signaling by the B Cell Receptor (BCR)" and in WikiPathways: "B Cell Receptor Signaling Pathway".

What would be great is a commonly used ID for the B-cell receptor signalling pathway that is shared by all three, or some kind of resource that specifically compares pathways between the databases. Otherwise I am limited to searching for one specific named version of the pathway within the corresponding database.

I guess that, in theory, the databases might also have different criteria for which genes belong to which pathway, and so on, so comparing them might not even be feasible. But if something like this did exist, it would make automating this part of the software that much simpler!

Reactome Wikipathways KEGG Pathways • 687 views

I am not aware of any tool to do this. If there were one, it would need to be updated frequently to keep up with curation in the different resources.

The closest thing I can think of is the GO cross-references, in which external pathways are mapped to GO terms. You would probably have to deal with the GO structure to relate the pathways, but if you go down this road, you could just as well use GO directly.

Alternatively, you could build some sort of correspondence table by comparing gene overlap between pathways and treating pathways that share more than x% of their genes as the same. The problem lies in finding a suitable threshold, because the databases differ in granularity and in how pathways relate to each other: one database can put into a single pathway what another splits into multiple related pathways. That is also without considering cross-talk.

As you surmise, different databases have different focuses and different notions of what constitutes a pathway. I usually advise against a mix-and-match approach to reference resources, because each resource corresponds to a particular "view" of biology, and mixing references means mixing these views, which can make the outcome harder to interpret. So, when it matters, I'd rather have results presented "according to KEGG" or "according to Reactome" than as an unclear mashup of databases.
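To make the gene-overlap idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names (`jaccard`, `overlap_coefficient`, `correspondence_table`) are hypothetical, the gene sets are small made-up toy sets rather than real database content, and the 0.5 threshold is arbitrary. It also assumes the gene identifiers from each database have already been mapped to one common namespace before comparison.

```python
from itertools import product


def jaccard(a, b):
    """Shared genes divided by all genes in either pathway."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller pathway.

    Less sensitive to granularity differences than Jaccard: a small
    pathway fully contained in a large one scores 1.0.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def correspondence_table(db_a, db_b, threshold=0.5, score=jaccard):
    """Return (name_a, name_b, score) for every pathway pair whose
    score is at least `threshold`, best matches first."""
    rows = []
    for (name_a, genes_a), (name_b, genes_b) in product(db_a.items(), db_b.items()):
        s = score(set(genes_a), set(genes_b))
        if s >= threshold:
            rows.append((name_a, name_b, s))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy gene sets, invented for illustration only (NOT real database content).
kegg = {
    "B cell receptor signaling pathway": {"CD79A", "CD79B", "SYK", "BTK", "LYN", "PLCG2"},
    "Toy unrelated pathway A": {"GENE1", "GENE2"},
}
reactome = {
    "Signaling by the B Cell Receptor (BCR)": {"CD79A", "CD79B", "SYK", "BTK", "BLNK"},
    "Toy unrelated pathway B": {"GENE3", "GENE4"},
}

for name_a, name_b, s in correspondence_table(kegg, reactome, threshold=0.5):
    print(f"{name_a}  <->  {name_b}  (score {s:.2f})")
```

Passing `score=overlap_coefficient` instead of the default changes how the "one big pathway versus several small ones" case is treated, which is exactly where picking a threshold gets difficult. Either way, the output is only a list of candidate matches to review by hand, not an authoritative mapping.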
