Question

Pathway Over-Representation Across Multiple Gene Lists

Asked 12.2 years ago by Obi Griffith

I am looking for a method of finding over-represented pathways across multiple lists of differentially expressed genes. The simple scenario is that I have RNA-seq expression data for cell lines with derived resistance to a drug. For each resistance model (cell line) I can compare the resistant line with its parental line to get a list of differentially expressed genes (DEGs), resistant vs. parental, and I can identify over-represented pathways in that list (e.g., using DAVID).

Now, suppose that I have six different cell lines with derived resistance, each with its own list of differentially expressed genes (relative to its own parental line). I can perform pathway analysis on each DEG list and then see which pathways come up as significant in multiple lists.

My question is: how do I find pathways that may not be significantly enriched in any one list but are consistently represented by one or more genes in all (or many) of the lists? I thought I would start by simply mapping each gene in each DEG list to all of its associated pathways (e.g., using KEGG) and then looking for pathways that come up in all models. But how do I define which of those are significant? I think I would want a test that considers the amount of pathway over-representation both within each list and across lists. Has anyone seen a method for that?

Sample data (completely fictional): How do I identify pathways (e.g., DNARepair) which are enriched across multiple lists (but not necessarily within lists)?

GeneListA
TP53 DNARepair
BIRC5 Apoptosis
MYC Transcription factor
...

GeneListB
EZH2 Chromatin
BRCA1 DNARepair
...

GeneListC
RB1 DNARepair
etc...
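A minimal Python sketch of the mapping-and-counting step described above, run on the fictional sample data. To attach a significance value to the count, it assumes one possible null model that is not proposed anywhere in this thread: each list independently contains at least one gene from a pathway with the hypergeometric probability implied by the list size, the pathway size and the number of genes assayed, so the number of lists hitting a pathway follows a Poisson-binomial distribution. `N_BACKGROUND`, `LIST_SIZES` and `PATHWAY_SIZES` are hypothetical placeholder values, and all function names are illustrative.

```python
from collections import defaultdict
from math import comb

# Fictional sample data from the question: one {gene: set of pathways} dict per DEG list.
# Sets are used because a gene can map to several pathways (e.g., in KEGG).
deg_lists = {
    "GeneListA": {"TP53": {"DNARepair"}, "BIRC5": {"Apoptosis"}, "MYC": {"Transcription factor"}},
    "GeneListB": {"EZH2": {"Chromatin"}, "BRCA1": {"DNARepair"}},
    "GeneListC": {"RB1": {"DNARepair"}},
}

def lists_per_pathway(deg_lists):
    """Map each pathway to the set of lists containing at least one of its genes."""
    hits = defaultdict(set)
    for list_name, gene_map in deg_lists.items():
        for pathways in gene_map.values():
            for pathway in pathways:
                hits[pathway].add(list_name)
    return hits

def p_any_hit(n_background, pathway_size, list_size):
    """Hypergeometric P(a random list of list_size genes contains >= 1 pathway gene)."""
    return 1 - comb(n_background - pathway_size, list_size) / comb(n_background, list_size)

def poisson_binomial_sf(probs, k):
    """P(at least k successes) for independent Bernoulli trials with the given probabilities."""
    dist = [1.0]  # dist[j] = probability of exactly j successes among the trials seen so far
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1 - p)
            new[j + 1] += q * p
        dist = new
    return sum(dist[k:])

# HYPOTHETICAL sizes for illustration only: genes assayed, full DEG list sizes, pathway sizes.
N_BACKGROUND = 20000
LIST_SIZES = {"GeneListA": 300, "GeneListB": 250, "GeneListC": 400}
PATHWAY_SIZES = {"DNARepair": 100, "Apoptosis": 150, "Chromatin": 120, "Transcription factor": 1500}

hits = lists_per_pathway(deg_lists)
for pathway, lists in sorted(hits.items(), key=lambda kv: -len(kv[1])):
    # Chance that each list would touch this pathway at random, then the tail of the count.
    probs = [p_any_hit(N_BACKGROUND, PATHWAY_SIZES[pathway], LIST_SIZES[name]) for name in deg_lists]
    p = poisson_binomial_sf(probs, len(lists))
    print(f"{pathway}: hit in {len(lists)}/{len(deg_lists)} lists, P = {p:.3g}")
```

With these placeholder sizes, DNARepair is hit in all three lists yet gets P of roughly 0.5, because a 100-gene pathway is likely to be touched by a list of a few hundred genes by chance; the raw count alone cannot define significance. This null only asks whether each list contains any pathway gene, so it ignores how many genes each list contributes (the within-list component the question mentions), assumes the lists are independent, and would still need a multiple-testing correction across pathways.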

Tags: pathway, kegg, meta

Written 12.2 years ago by Obi Griffith; last updated 7.7 years ago by EagleEye.

score 2 · Answer 1 · 2012-02-14

There are a couple of techniques for approaching this.

1) Demonstrate enrichment of specific pathways in your dataset using Fisher's exact test (equivalently, a one-sided hypergeometric test). After you have annotated your differentially expressed genes with their pathway membership(s), count how many of these genes fall in each pathway. Compare this number with the number expected by chance, i.e., from the background of all genes assayed (or, empirically, from randomly generated gene lists of the same size, which serve as a control dataset). A p-value below 0.05, after correcting for the number of pathways tested, indicates significant enrichment of that pathway among your differentially expressed genes.

2) WGCNA (weighted gene co-expression network analysis) is an R package that may also help, if you are interested in showing which genes group together into co-expressed modules.
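The Fisher's exact test from point 1 can be sketched in Python with only the standard library. For the 2x2 table of (in DEG list / not in list) by (in pathway / not in pathway), the one-sided test for enrichment is the upper tail of the hypergeometric distribution. The background is assumed here to be all genes assayed. Because one test is run per pathway, the sketch also applies the Benjamini-Hochberg false discovery rate correction, which is one common choice rather than something the answer specifies. The function names, list size, background size and counts in the example are hypothetical.

```python
from math import comb

def enrichment_p(k, list_size, pathway_size, n_background):
    """One-sided Fisher's exact (hypergeometric) p-value: P(X >= k), where X is the number
    of pathway genes in a random list of list_size genes drawn from n_background genes."""
    total = comb(n_background, list_size)
    # math.comb returns 0 when the lower argument exceeds the upper, so impossible terms vanish.
    tail = sum(comb(pathway_size, x) * comb(n_background - pathway_size, list_size - x)
               for x in range(k, min(list_size, pathway_size) + 1))
    return tail / total

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# HYPOTHETICAL example: a 300-gene DEG list, 20,000 genes assayed, and made-up counts
# of DEGs falling in three pathways of the given sizes.
observed = [("DNARepair", 6, 100), ("Apoptosis", 4, 150), ("Chromatin", 2, 120)]
pvals = [enrichment_p(k, 300, size, 20000) for _, k, size in observed]
for (name, k, size), p, q in zip(observed, pvals, benjamini_hochberg(pvals)):
    print(f"{name}: {k} of {size} pathway genes in the list, p = {p:.3g}, BH-adjusted = {q:.3g}")
```

If SciPy is available, the same one-sided p-value should be given by `scipy.stats.hypergeom.sf(k - 1, n_background, pathway_size, list_size)`, using SciPy's (M, n, N) parameter order.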

I would try both, and I'm sure others will recommend additional approaches. Good luck!

score 2 · Answer 2 · 2012-02-14

Answered 12.2 years ago by Giovanni M Dall'Olio

The Reactome website has a few tools to analyze expression data.

The Pathway Analysis tool "Allows you to analyse a list of protein, gene, expression data or compound identifiers and determine how they are likely to affect pathways."

Or, the Expression Data tool "Takes gene expression data (and also numerical proteomics data) and shows how expression levels affect reactions and pathways in living organisms."

Thanks Giovanni. These tools (e.g., the Pathway Analysis tool) seem to identify over-representation of pathways within a single gene list, and there are many such methods. My question is a little different: I want to find pathways that are over-represented across multiple gene lists even though they are not necessarily significantly enriched within any individual list.

Reply by Obi Griffith, 9.0 years ago