I am looking for a method of finding over-represented pathways across multiple lists of differentially expressed genes. The simple scenario is that I have RNA-seq expression data for cell lines with derived resistance to a drug. For each resistance model (cell line), I can compare the resistant line with its parental line to get a list of differentially expressed genes (DEGs) in resistant vs. parental. I can then identify over-represented pathways in that gene list (e.g., using DAVID).
Now, consider that I have six different cell lines with derived resistance, each with its own list of differentially expressed genes (relative to its own parental line). I can perform pathway analysis on each DEG list and then see which pathways come up as significant in multiple lists.
My question is: how do I get at those pathways that may not be significantly enriched in any one list but are consistently represented by one or more genes in all (or many) of the lists? I thought I would start by mapping each gene in each DEG list to all of its associated pathways (e.g., using KEGG) and then looking for pathways that come up in all models. But how do I define which are significant? I think I would want a test that considers both the amount of pathway over-representation within each list and the consistency across lists. Has anyone seen a method for that?
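To make the idea concrete, here is a rough sketch of one test of this kind. It is only an illustration and not an established method for this problem. It computes a hypergeometric over-representation p-value for one pathway within each list, then pools the per-list p-values with Fisher's combined probability test. All numbers are hypothetical (a background of 20,000 genes, a 150-gene pathway, six lists of 300 DEGs), the function names are my own, and Fisher's method assumes the six comparisons are independent, which may not hold for related cell lines.

```python
from math import comb, exp, log

def hypergeom_sf(k, N, K, n):
    """P(X >= k): probability of seeing k or more pathway genes among n DEGs,
    given K pathway genes in a background of N genes."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def fisher_combined(pvalues):
    """Fisher's method: -2 * sum(ln p) follows a chi-square distribution with
    2m degrees of freedom under the null (m independent p-values).
    For even degrees of freedom the upper tail has a closed form.
    Note: a p-value of exactly 0 would make log() fail."""
    half_stat = -sum(log(p) for p in pvalues)
    term, tail = 1.0, 0.0
    for i in range(len(pvalues)):
        tail += term                 # adds half_stat**i / i!
        term *= half_stat / (i + 1)
    return exp(-half_stat) * tail

# Hypothetical numbers, for illustration only.
N_BACKGROUND = 20000               # genes that could have been called DE
PATHWAY_SIZE = 150                 # genes annotated to the pathway
DEG_LIST_SIZE = 300                # DEGs per resistance model
overlaps = [5, 4, 5, 4, 5, 4]      # pathway genes found in each of six lists

per_list_p = [hypergeom_sf(k, N_BACKGROUND, PATHWAY_SIZE, DEG_LIST_SIZE)
              for k in overlaps]
combined_p = fisher_combined(per_list_p)

for k, p in zip(overlaps, per_list_p):
    print(f"overlap={k}  per-list p={p:.3f}")
print(f"combined p across lists = {combined_p:.4f}")
```

With these made-up counts, no single list reaches p < 0.05 for the pathway, but the combined p-value does, which is the behaviour I am after. In a real analysis the combined p-values would still need multiple-testing correction across all pathways tested.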
Sample data (completely fictional): How would I identify pathways (e.g., DNARepair) that are enriched across multiple lists (but not necessarily within any single list)?
GeneListA
TP53 DNARepair
BIRC5 Apoptosis
MYC Transcription factor
...
GeneListB
EZH2 Chromatin
BRCA1 DNARepair
...
GeneListC
RB1 DNARepair
etc...
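As a starting point, the mapping-and-tallying step could look like the sketch below. It uses only the fictional entries shown above (the elided rows are left out) and assumes one pathway per gene, whereas a real KEGG mapping would give several pathways for many genes. The variable names are mine.

```python
from collections import Counter

# Only the fictional gene -> pathway pairs listed in the sample data.
gene_lists = {
    "GeneListA": {"TP53": "DNARepair", "BIRC5": "Apoptosis",
                  "MYC": "Transcription factor"},
    "GeneListB": {"EZH2": "Chromatin", "BRCA1": "DNARepair"},
    "GeneListC": {"RB1": "DNARepair"},
}

# Count the number of lists in which each pathway is hit by at least one gene.
lists_per_pathway = Counter()
for list_name, gene_to_pathway in gene_lists.items():
    for pathway in set(gene_to_pathway.values()):
        lists_per_pathway[pathway] += 1

for pathway, n_lists in lists_per_pathway.most_common():
    print(f"{pathway}: present in {n_lists} of {len(gene_lists)} lists")
```

This only tells me that DNARepair shows up in every list; it says nothing about whether that is more than expected by chance, which is the part I am asking about.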