Question: Pathway Over-Representation Across Multiple Gene Lists
7.1 years ago by
Obi Griffith17k
Washington University, St Louis, USA
Obi Griffith17k wrote:

I am looking for a method of finding over-represented pathways across multiple lists of differentially expressed genes. The simple scenario is that I have RNA-seq expression data for cell lines with derived resistance to a drug. For each resistance model (cell line) I can compare between the resistant line and the parental to get a list of differentially expressed genes (DEG) in resistant vs parental. And, I can identify over-represented pathways in that list of genes (e.g., using DAVID).

Now, consider that I have six different cell lines with derived resistance and each has its own list of differentially expressed genes (when compared to each parental). I can perform pathway analysis on each DEG list and then see which pathways come up as significant in multiple lists.

My question is: how do I get at those pathways that are perhaps not significantly enriched in any one list but are consistently represented by one or more genes in all (or many) of the different lists? I thought I would start by mapping each gene in each DEG list to all its associated pathways (e.g., using KEGG) and then looking for pathways that come up in all models. But how do I define which are significant? I think I would want a test that considers both the amount of pathway over-representation within each list and across lists. Has anyone seen a method for that?
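To make the starting point concrete, here is a minimal Python sketch of the mapping step described above: annotate every gene in every DEG list with its pathways and record which models hit each pathway. The gene-to-pathway table, the model names, and the helper `pathway_recurrence` are all made up for illustration; a real annotation would come from a resource such as KEGG.

```python
from collections import defaultdict

# Hypothetical gene-to-pathway annotation (fictional, for illustration only).
GENE_TO_PATHWAYS = {
    "TP53": {"DNARepair"},
    "BIRC5": {"Apoptosis"},
    "MYC": {"Transcription factor"},
    "EZH2": {"Chromatin"},
    "RB1": {"DNARepair"},
}

# Hypothetical DEG lists, one per resistance model (resistant vs parental).
DEG_LISTS = {
    "model_A": ["TP53", "BIRC5", "MYC"],
    "model_B": ["EZH2", "TP53"],
    "model_C": ["RB1"],
}


def pathway_recurrence(deg_lists, gene_to_pathways):
    """Return {pathway: set of models with at least one DEG in that pathway}."""
    hits = defaultdict(set)
    for model, genes in deg_lists.items():
        for gene in genes:
            # Genes without any annotation simply contribute nothing.
            for pathway in gene_to_pathways.get(gene, ()):
                hits[pathway].add(model)
    return dict(hits)


recurrence = pathway_recurrence(DEG_LISTS, GENE_TO_PATHWAYS)

# List pathways from most to least recurrent across models.
for pathway, models in sorted(recurrence.items(), key=lambda kv: -len(kv[1])):
    print(pathway, len(models), sorted(models))
```

This only counts recurrence; it says nothing yet about whether a given count is more than expected by chance, which is the open part of the question.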

Sample data (completely fictional): How do I identify pathways (e.g., DNARepair) which are enriched across multiple lists (but not necessarily within lists)?

TP53 DNARepair
BIRC5 Apoptosis
MYC Transcription factor

EZH2 Chromatin

RB1 DNARepair

pathway meta kegg • 5.2k views
modified 2.7 years ago by EagleEye6.2k • written 7.1 years ago by Obi Griffith17k
7.1 years ago by
Rochester, NY USA
Alex Paciorkowski3.3k wrote:

There are a couple of techniques for approaching this.

1) Demonstrate enrichment of specific pathways in your dataset using Fisher's exact test. After you have annotated your differentially expressed genes with their pathway membership(s), take the gene list and count how many of these genes are in each pathway. Compare this number with the number of genes in a randomly generated gene list (in other words, a control dataset drawn from the same background) that are present in the same pathway. A value of p<0.05 indicates a significant enrichment for a given pathway among your differentially expressed genes, although when many pathways are tested the p-values should be corrected for multiple testing.

2) WGCNA (weighted gene co-expression network analysis) is an R package that may be able to help you as well, if you are interested in demonstrating which groups of genes are co-expressed.

I would try both, and I'm sure others will recommend additional approaches. Good luck!
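For the single-list enrichment test in point 1, a one-sided Fisher's exact test on the 2x2 table (in pathway or not, in DEG list or not) is equivalent to the upper tail of a hypergeometric distribution. The sketch below computes that tail with the Python standard library only; the function name `enrichment_p_value` and the example numbers are illustrative assumptions, not values from this thread.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of genes in the background
    K: number of background genes annotated to the pathway
    n: number of differentially expressed genes (all within the background)
    k: number of differentially expressed genes annotated to the pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of seeing k, k+1, ..., upper pathway genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Made-up example: 20000 background genes, 100 in the pathway,
# 200 DEGs of which 5 fall in the pathway.
print(enrichment_p_value(k=5, n=200, K=100, N=20000))
```

With many pathways tested, the resulting p-values would still need a multiple-testing correction before applying a cutoff.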


Thanks Alex. It seems like WGCNA is very much on topic but does not do exactly what I am looking for. The random permutation test is a good suggestion and may be the way to go. Given 3 lists of genes and their corresponding pathway annotations, I can define some measure of over-representation of any pathway both within and across the 3 lists. Then, with randomly generated gene lists of the same sizes (drawn from the same background list), I can see how often that measure of over-representation is observed by chance.
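A rough Python sketch of that permutation idea follows. It assumes one particular statistic, the number of lists containing at least one gene from the pathway, which is only one of many possible measures. The function names, the background gene identifiers, and the example lists are hypothetical.

```python
import random


def lists_hit(gene_lists, pathway_genes):
    """Count how many gene lists contain at least one pathway gene."""
    return sum(1 for genes in gene_lists if pathway_genes.intersection(genes))


def permutation_p_value(gene_lists, pathway_genes, background, n_perm=2000, seed=0):
    """Empirical p-value for the observed number of lists hitting a pathway.

    Random lists of the same sizes are drawn without replacement from the
    background, and the statistic is recomputed for each permutation.
    """
    rng = random.Random(seed)
    pathway_genes = set(pathway_genes)
    background = sorted(background)  # fixed order keeps the run reproducible
    observed = lists_hit(gene_lists, pathway_genes)
    sizes = [len(genes) for genes in gene_lists]
    as_extreme = 0
    for _ in range(n_perm):
        random_lists = [rng.sample(background, size) for size in sizes]
        if lists_hit(random_lists, pathway_genes) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Made-up example: 1000 background genes, a 5-gene pathway, and three
# short DEG lists that each contain exactly one pathway gene.
background = [f"G{i}" for i in range(1000)]
pathway = {"G0", "G1", "G2", "G3", "G4"}
deg_lists = [["G0", "G500", "G501"], ["G1", "G600"], ["G2"]]

observed, p_value = permutation_p_value(deg_lists, pathway, background)
print(observed, p_value)
```

Swapping `lists_hit` for a statistic that also weights the number of pathway genes within each list would address the "within and across" part of the measure; the resampling loop stays the same.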

Reply written 7.1 years ago by Obi Griffith17k
7.1 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

The Reactome website has a few tools to analyze expression data.

The Pathway Analysis tool "Allows you to analyse a list of protein, gene, expression data or compound identifiers and determine how they are likely to affect pathways."

Or, the Expression Data tool "Takes gene expression data (and also numerical proteomics data) and shows how expression levels affect reactions and pathways in living organisms."


Thanks Giovanni. These tools (e.g., the Pathway Analysis tool) seem to identify over-representation of pathways within a single gene list. There are many such methods. My question is a little different. I want to find over-representation of pathways across multiple gene lists where those pathways are not necessarily significantly enriched within any individual gene list.

Reply modified 4.0 years ago • written 7.1 years ago by Obi Griffith17k