Question: Is there an over-representation of cancer processes in pathway databases (e.g. IPA), and can this reasonably be adjusted for?
Asked 3.5 years ago by dr.michael.r.barnes:

After many years of pathway analysis with a range of public and proprietary tools (IPA, GeneGo, GSEA, KEGG, GO, etc.), I have come to suspect that pathway enrichment statistics, and the results built on them, may be biased by the heavy representation of cancer-related genes and pathways in pathway databases. My question is twofold: (1) Do others share the view that pathway databases have an inherent bias towards cancer? (2) Can (or should) this be corrected for, and if so, how?
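For context, many enrichment tools score a gene list with a one-sided hypergeometric (Fisher's exact) over-representation test. The minimal sketch below is illustrative only; it is not IPA's or any other specific tool's implementation, and the function name and toy numbers are hypothetical. Its null hypothesis treats the hit list as a random draw from the gene universe, independent of how heavily each gene is annotated. If well-studied (e.g. cancer) genes are both more densely annotated and more likely to show up in hit lists, that assumption breaks and the p-values will favour those pathways.

```python
from math import comb

def hypergeom_enrichment_p(hits_in_pathway, pathway_size, hits_total, universe_size):
    """One-sided over-representation p-value, P(X >= hits_in_pathway),
    where X ~ Hypergeometric(universe_size, pathway_size, hits_total).
    Hypothetical helper for illustration only."""
    upper = min(pathway_size, hits_total)
    denom = comb(universe_size, hits_total)  # ways to draw the hit list from the universe
    # Sum the probability mass of observing at least `hits_in_pathway` pathway genes
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / denom

# Toy numbers (hypothetical): 20,000-gene universe, 200-gene hit list,
# a 100-gene pathway, 5 of whose genes are in the hit list (about 1 expected by chance).
p = hypergeom_enrichment_p(5, 100, 200, 20000)
print(p)
```

Nothing in this test knows why a gene is in the pathway, so any systematic link between annotation density and hit-list membership passes straight through into the enrichment score.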

I've reviewed the literature on this and found nothing, but it is something I have heard anecdotally on many occasions.

Thanks in advance for your thoughts.

Tags: cancer, go, gsea, enrichment, pathway
  1. I completely share this view, though my personal favorites are "Parkinson's related" and "Alzheimer's related", which almost invariably just mean there's a change in metabolic stress.
  2. It depends on the conclusions you're trying to draw. If you're trying to determine whether pathway X is more perturbed than pathway Y, then this is a big issue (I have no clue how to correct for it). If you're just trying to find some perturbed pathways so you can do follow-up experiments, then it's probably less of an issue.
Reply written 3.5 years ago by Devon Ryan

Thanks Devon, yes, I also recognise the points you make! Regarding conclusions, it's probably both scenarios, and I completely agree with your assessment. The issue that remains is this: to take your example, when you report enrichment of PD- and AD-related genes, many readers (e.g. reviewers) will take it at face value, but demonstrating that it reflects metabolic stress requires enrichment in a well-annotated metabolic stress pathway to support the statement. Sometimes such a pathway exists, sometimes not. I guess this underlines the importance of a strong gene ontology against which to compare disease results.

Reply written 3.5 years ago by dr.michael.r.barnes
Answer by DG, 3.5 years ago:

Yeah, it is more accurate to say that disease processes in general are over-represented. This isn't necessarily a bad thing: if those genes truly function together in pathways, then those are still relevant biological pathways. As with most things in biology, though, that is simply how they were first discovered. If these pathways are perturbed in your experiments, all you need to do is change how you conceptualize them: What are these pathways doing in normal, healthy cells? What are they doing under different stress conditions? What are they doing in your experiment? The name of the pathway isn't necessarily that important, and in fact, once you see that these pathways are affected, you have a huge body of literature to draw on to put your results in their biological context.


Thanks Dan, all good points. I think the salient point that emerges here is that disease processes and biological processes both need to be considered and cross-compared, to determine which biological processes are driving disease and, to a lesser extent, vice versa.

Reply written 3.5 years ago by dr.michael.r.barnes


Powered by Biostar version 2.3.0