Comparing 2 pathway over-representation analyses
8.9 years ago

Good day,

I have 2 lists of genes:

  1. Gene list A consists of ~500 differentially expressed genes (DEGs) from a microarray analysis
  2. Gene list B is a list of 200 susceptibility genes drawn from various GWAS

I have performed pathway over-representation analysis (ORA) on each list using innateDB and found that there were ~150 and ~115 over-represented pathways involving genes from gene list A and B respectively. Now I wish to compare the 2 lists of over-represented pathways for possible overlap.

Using Venny, I have found that ~50 of them overlap but I understand that I also need to perform a statistical test to ensure that this overlap is not just due to chance. However, I am unsure of what statistical test I should perform and how I can go about doing so.

As part of the analysis of these gene lists, I have previously done a hypergeometric distribution test to check for overlap between the genes themselves, but I do not know if this same test can be applied to pathways and if so how should I go about doing it (i.e. I am unsure of how to define the parameters of the hypergeometric distribution test in this situation). I have also read some studies that use Fisher's method to combine the p-values and was wondering if this is the method I should be using.

Unfortunately I have no experience with R. Regardless, thank you for all your help!

Tl;dr: I have compared 2 pathway lists and found that they overlap, but I do not know which statistical test to use to show that the overlap is not due to chance.

Edit: I have also read this thread, which relates to my question and states that a suitable test may be a Fisher's F test. However, I don't understand the procedure I need to follow to get my desired p-value.

Fishers-exact pathway-analysis statistics • 5.0k views

Look at this. It computes a hypergeometric distribution (and you don't need to use R). I think you will need the total number of pathways tested from innatedb.


Hello, thanks for your reply!

Just to clarify, I have previously used this test to obtain the p-value for the overlap between the genes themselves in this way:

  1. gene list A which is 500 DEGs derived from a platform of 17000 probes
  2. gene list B comprising 200 genes derived from a collection of GWAS

However, only 75% of these GWAS genes (i.e. 150) are represented on the platform, and the overlap of gene lists A and B is 15 genes, so my hypergeometric distribution test should be looking for:

The probability of getting 15 or more white balls in a sample of size 150 from an urn with 500 white balls and 16500 black balls, assuming H0. Therefore, I used R with the input: 1-phyper(14, 500, 16500, 150) -- I got this formula from another forum post
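For readers without R, the same upper-tail probability can be sketched in plain Python. This is only an illustration under the numbers quoted above; the function name `hypergeom_upper_tail` is made up for this example and is not part of any library.

```python
from math import comb

def hypergeom_upper_tail(k, white, black, draws):
    """P(X >= k) when `draws` balls are taken without replacement
    from an urn holding `white` white and `black` black balls.
    Intended to mirror R's 1 - phyper(k - 1, white, black, draws)."""
    total = comb(white + black, draws)      # all possible samples
    upper = min(white, draws)               # largest possible overlap
    favourable = sum(
        comb(white, i) * comb(black, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total

# Gene-level overlap from the post: 15 shared genes, 500 DEGs,
# 16500 non-DEG probes, 150 GWAS genes present on the platform.
p_genes = hypergeom_upper_tail(15, 500, 16500, 150)
print(f"P(overlap >= 15) = {p_genes:.3g}")
```

The sum is done with exact integers, so nothing is lost to rounding before the final division.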

However, at the moment I am actually more interested in a test that gives the p-value for the overlap between pathways identified via pathway ORA. I was wondering if you were suggesting that the same test could be applied to pathways in the sense that:

  1. gene list A has 150 over-represented pathways derived from a database of x possible pathways
  2. gene list B has 115 over-represented pathways derived from the same database

Since I have identified that there are 50 overlapping pathways, my test would be finding the probability of getting 50 or more 'white balls' in a sample of size 115 from an urn with 150 white balls and (x-150) black balls, assuming H0?

Would the hypergeometric distribution test be valid/the best test in this situation?
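The pathway-level version of that urn can be sketched the same way. The thread never states x, so the values of `x` below are purely hypothetical placeholders, and `pathway_overlap_pvalue` is a name invented for this example; the real x should be the number of pathways actually tested in the database.

```python
from math import comb

def pathway_overlap_pvalue(overlap, n_a, n_b, x):
    """P(overlap or more shared pathways) when n_b pathways are drawn
    from x pathways, n_a of which are 'white' (significant for list A)."""
    total = comb(x, n_b)
    favourable = sum(
        comb(n_a, i) * comb(x - n_a, n_b - i)
        for i in range(overlap, min(n_a, n_b) + 1)
    )
    return favourable / total

# 50 shared pathways, 150 for list A, 115 for list B.
# The x values are hypothetical; substitute the real database size.
results = {x: pathway_overlap_pvalue(50, 150, 115, x) for x in (500, 1000, 3000)}
for x, p in results.items():
    print(f"x = {x}: p = {p:.3g}")
```

Because the expected overlap by chance is 150 * 115 / x, the p-value depends strongly on x, which is why the size of the pathway universe has to be pinned down before the test means anything.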

EDIT: Rethinking this problem, would a Fisher's exact test work? I assume the 2x2 table would be:

                                 # significant     # non-significant     Total
                                 DEG pathways      DEG pathways
# significant GWAS pathways      50                115 - 50              115
# non-significant GWAS pathways  150 - 50          x - 215               x - 115
Total                            150               x - 150               x

where x = total number of pathways possible in the innatedb database
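As a rough sketch, a one-sided Fisher's exact test on a 2x2 table of this shape can be computed from the table margins alone. The value `x = 1000` is a hypothetical placeholder (the thread does not give the real database size), and `fisher_exact_greater` is a name made up for this example.

```python
from math import comb, isclose

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]]:
    probability of a top-left cell >= a with all margins held fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    favourable = sum(
        comb(col1, i) * comb(n - col1, row1 - i)
        for i in range(a, min(row1, col1) + 1)
    )
    return favourable / total

x = 1000                      # hypothetical total number of pathways
a = 50                        # significant in both GWAS and DEG
b = 115 - 50                  # significant in GWAS only
c = 150 - 50                  # significant in DEG only
d = x - 215                   # significant in neither
p_fisher = fisher_exact_greater(a, b, c, d)
p_transposed = fisher_exact_greater(a, c, b, d)   # rows and columns swapped
print(f"one-sided p = {p_fisher:.3g}")
print("same after transposing:", isclose(p_fisher, p_transposed, rel_tol=1e-9))
```

With the margins fixed, this one-sided p-value is the hypergeometric upper tail for the top-left cell, so it does not change when the roles of the two lists are swapped.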


Yes, a Fisher's exact test would give you an exact p-value based on your contingency table. There is no difference between your first problem (overlap of input list of genes and genes of a particular pathway) and second problem (overlap of GWAS and DEG pathways).

A chi-squared test or a hypergeometric test is appropriate too, but the input format expected by the software you are using might be different.
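For comparison, a Pearson chi-squared test on the same kind of 2x2 table could be sketched as below, without continuity correction. It is a two-sided, large-sample approximation rather than an exact test. As before, `x = 1000` is a hypothetical database size and `chi2_2x2` is a name invented for this example.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and its
    p-value with 1 degree of freedom for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

x = 1000                                  # hypothetical total number of pathways
stat, p_chi2 = chi2_2x2(50, 115 - 50, 150 - 50, x - 215)
print(f"chi-squared = {stat:.2f}, p = {p_chi2:.3g}")
```

The exact test is usually the safer choice when any expected cell count is small.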


Yes, I was actually referring to using the same test (hypergeometric) for pathways as well.

No, I would say a Fisher's test is not applicable to your data. I seriously doubt you can create a contingency table like the one you have shown. Technically it should be (but not possible with your data):

Groups (Down) / Outcomes     Significant          Non-Significant         Total
GWAS                                                                      GWAS_Total
DEG                                                                       DEG_Total
                             Total_Significant     Total_NonSignificant

You cannot have one group as column & the other as row (like you have shown).


@weeweedelivery123 have you figured out this problem? I have the same problem.

I saw that mehran.karimzade has confirmed your answer, but komal.rathi has not.

I would appreciate it if you could reply.

