Question: Comparing 2 pathway over-representation analyses
5.6 years ago, weeweedelivery123 (United Kingdom) wrote:

Good day, 

I have 2 lists of genes:

1) Gene list A consists of ~500 differentially expressed genes (DEGs) from a microarray analysis 

2) Gene list B is a list of 200 susceptibility genes drawn from various GWAS

I have performed pathway over-representation analysis (ORA) on each list using InnateDB and found ~150 and ~115 over-represented pathways for gene lists A and B, respectively. I now wish to compare the 2 lists of over-represented pathways for possible overlap.

Using Venny, I have found that ~50 of them overlap but I understand that I also need to perform a statistical test to ensure that this overlap is not just due to chance. However, I am unsure of what statistical test I should perform and how I can go about doing so.

As part of the analysis of these gene lists, I have previously done a hypergeometric distribution test to check for overlap between the genes themselves, but I do not know if this same test can be applied to pathways and if so how should I go about doing it (i.e. I am unsure of how to define the parameters of the hypergeometric distribution test in this situation). I have also read some studies that use Fisher's method to combine the p-values and was wondering if this is the method I should be using.

Unfortunately I have no experience with R. Regardless, thank you for all your help!


TL;DR: I have compared 2 pathway lists and found that they overlap, but I do not know which statistical test to use to show that this overlap is not due to chance.


Edit: I have also read this thread (Pathway Over-Representation Across Multiple Gene Lists), which relates to my question and states that a suitable test may be a Fisher's F test. However, I don't understand the procedure I need to follow to get my desired p-value.

modified 5.6 years ago • written 5.6 years ago by weeweedelivery123

Look at this. It computes a hypergeometric test (and you don't need to use R). I think you will need the total number of pathways tested from InnateDB.

modified 5.6 years ago • written 5.6 years ago by komal.rathi

Hello, thanks for your reply! 

Just to clarify, I have previously used this test to obtain the p-value for the overlap between the genes themselves in this way:

1) gene list A which is 500 DEGs derived from a platform of 17000 probes
2) gene list B comprising 200 genes derived from a collection of GWAS
However only 75% - i.e. 150 of these GWAS genes - are represented on the platform and the overlap of genelist A and B is 15 genes, so my hypergeometric distribution test should be looking for:

The probability of getting 15 or more white balls in a sample of size 150 from an urn with 500 white balls and 16500 black balls, assuming H0.
Therefore, I used R with the input 1-phyper(14, 500, 16500, 150). I got this formula from another forum post.
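
For readers without R, the same upper-tail probability can be sketched in plain Python. This is only an illustration, not the poster's method: the function name `hypergeom_upper_tail` and its argument names are assumptions, and it is meant to correspond to the R call 1-phyper(14, 500, 16500, 150) quoted above.

```python
from math import comb


def hypergeom_upper_tail(k, white, black, draws):
    """P(X >= k) when `draws` balls are taken without replacement
    from an urn holding `white` white and `black` black balls."""
    total = comb(white + black, draws)
    hi = min(white, draws)
    # Count every way of drawing at least k white balls (exact integers).
    favourable = sum(
        comb(white, i) * comb(black, draws - i)
        for i in range(max(k, 0), hi + 1)
    )
    return favourable / total


# Numbers from the post: 15 shared genes, 500 DEGs, 16500 other probes,
# and 150 GWAS genes that are present on the platform.
p_genes = hypergeom_upper_tail(15, 500, 16500, 150)
print(p_genes)
```

The sum starts at k because the question asks for "15 or more", which is why the R version subtracts the cumulative probability up to 14 rather than 15.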

However, at the moment I am more interested in a test that gives a p-value for the overlap between pathways identified via pathway ORA. I was wondering if you were suggesting that the same test could be applied to pathways in the sense that:
1) gene list A has 150 over-represented pathways derived from a database of x possible pathways 
2) gene list B has 115 over-represented pathways derived from the same database

Since I have identified 50 overlapping pathways, my test would be finding the probability of getting 50 or more "white balls" in a sample of size 115 from an urn with 150 white balls and (x-150) black balls, assuming H0?

Would the hypergeometric distribution test be valid/the best test in this situation?
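
The urn set-up described above can be sketched as follows. The total number of pathways x is not given anywhere in the thread, so the values tried here (300, 500, 1000) are made-up placeholders, not InnateDB figures, and the helper name `pathway_overlap_pvalue` is hypothetical. The sketch mainly illustrates how strongly the p-value depends on x.

```python
from math import comb


def pathway_overlap_pvalue(overlap, n_a, n_b, x):
    """P(at least `overlap` shared pathways) under H0: list B is a random
    sample of n_b pathways out of x, of which n_a belong to list A."""
    if x < n_a + n_b - overlap:
        raise ValueError("x is smaller than the union of the two lists")
    hi = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(x - n_a, n_b - i)
        for i in range(overlap, hi + 1)
    )
    return favourable / comb(x, n_b)


# 150 and 115 pathways with 50 shared, as in the post.
# The x values are placeholders chosen only for illustration.
for x in (300, 500, 1000):
    print(x, pathway_overlap_pvalue(50, 150, 115, x))
```

With a small x an overlap of 50 is unremarkable, while with a large x it becomes very unlikely under H0, so the result is only meaningful once x is the actual number of pathways that were tested.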

EDIT: Rethinking this problem, would a Fisher's exact test work? I assume the 2x2 table would be:

                                  # significant DEG pathways   # non-significant DEG pathways   Total
# significant GWAS pathways       50                           115 - 50                         115
# non-significant GWAS pathways   150 - 50                     x - 215                          x - 115
Total                             150                          x - 150                          x

where x = total number of pathways possible in the innatedb database
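
One way to sanity-check the table above is to build it from the four inputs and confirm the margins. The sketch below does that and computes a one-sided Fisher's exact p-value, which is the hypergeometric upper tail with all margins held fixed. Note that x = 1000 is a placeholder, because the real database total is not stated, and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def contingency_table(overlap, n_deg, n_gwas, x):
    """Rows: GWAS significant / not. Columns: DEG significant / not."""
    return [
        [overlap, n_gwas - overlap],
        [n_deg - overlap, x - n_deg - n_gwas + overlap],
    ]


def fisher_one_sided(table):
    """Exact P(top-left cell >= observed) with all margins held fixed."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    favourable = sum(
        comb(col1, i) * comb(n - col1, row1 - i)
        for i in range(a, min(row1, col1) + 1)
    )
    return Fraction(favourable, comb(n, row1))


table = contingency_table(50, 150, 115, x=1000)  # x is hypothetical
p = fisher_one_sided(table)
# Transposing the table swaps the roles of the two pathway lists.
p_swapped = fisher_one_sided([list(col) for col in zip(*table)])
print(table, float(p), p == p_swapped)
```

Exact fractions are used so that the comparison with the transposed table is not affected by rounding; the one-sided p-value does not depend on which list is treated as the "urn" and which as the "sample".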



modified 5.6 years ago • written 5.6 years ago by weeweedelivery123

Yes, a Fisher's exact test would give you an exact p-value based on your contingency table. There is no difference between your first problem (overlap of input list of genes and genes of a particular pathway) and second problem (overlap of GWAS and DEG pathways).

Using a chi-squared test or a hypergeometric test is appropriate too, but the input format expected by the software you are using might be different.
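
For comparison, Pearson's chi-squared statistic for a 2x2 table can be computed by hand. This is a rough sketch, assuming no continuity correction and using a table built with the placeholder total x = 1000 (the real total is not given in the thread); the function name `chi2_2x2` is an assumption. The p-value it returns is two-sided, unlike the one-sided hypergeometric tail.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the survival function is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Table for overlap 50, lists of 150 and 115, hypothetical x = 1000.
stat, p = chi2_2x2([[50, 65], [100, 785]])
print(stat, p)
```

The chi-squared p-value is an approximation that becomes unreliable when expected cell counts are small, which is the usual reason to prefer the exact test.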

written 5.6 years ago by mehran.karimzade

Yes, I was actually referring to using the same test (hypergeometric) for pathways as well.

No, I would say a Fisher's test is not applicable to your data. I seriously doubt you can create a contingency table like the one you have shown. Technically it should be (but not possible with your data):


Groups (Down) / Outcomes (Right)   Significant         Non-Significant        Total
GWAS                                                                          GWAS_Total
DEG                                                                           DEG_Total
Total                              Total_Significant   Total_NonSignificant


You cannot have one group as column & the other as row (like you have shown). 

modified 5.6 years ago • written 5.6 years ago by komal.rathi

@weeweedelivery123 have you figured out this problem? I have the same problem.

I saw that mehran.karimzade has confirmed your approach, but komal.rathi has not.

I would appreciate a reply.

written 5.5 years ago by Na Sed