Question: Comparing 2 gene lists
Asked 5.6 years ago by weeweedelivery123 (United Kingdom):

Good day, 

I have 2 lists of genes that I wish to compare in terms of:

1) overlap between the genes themselves

2) overlap between their respective pathways after over-representation analysis (ORA) using InnateDB

Gene list A consists of approximately 500 differentially expressed genes (DEGs) from a microarray analysis with over 17000 probes and Gene list B is a list of 200 susceptibility genes drawn from various GWAS. I have converted all genes in both lists to their Ensembl ID so that I can compare like-with-like.

Firstly, to find the overlap between the genes themselves, I used Venny and found that 75% of the GWAS genes are represented on the microarray platform. Venny also showed me that 15 genes overlap between the DEG and GWAS gene lists, but I understand that I also need to perform a statistical test to check that this overlap is not just due to chance. However, I am unsure which test I should perform and which program I should use.
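The overlap step described above can also be done with plain set operations instead of a Venn tool. The following is a minimal sketch in Python, not the poster's actual workflow: the function name `overlap_on_platform` and all gene identifiers are invented for illustration, and the only idea taken from the post is that both lists are restricted to genes present on the platform before they are compared.

```python
def overlap_on_platform(deg_ids, gwas_ids, platform_ids):
    """Intersect two ID lists after restricting both to the platform universe."""
    platform = set(platform_ids)
    deg = set(deg_ids) & platform
    gwas = set(gwas_ids) & platform  # GWAS genes that could have been detected
    return {
        "universe": len(platform),
        "deg": len(deg),
        "gwas_on_platform": len(gwas),
        "shared": sorted(deg & gwas),
    }


# Toy identifiers, invented for illustration only (not real Ensembl IDs).
platform_ids = [f"GENE{i:03d}" for i in range(1, 21)]  # 20 genes on the array
deg_ids = ["GENE001", "GENE002", "GENE003", "GENE004", "GENE005"]
gwas_ids = ["GENE004", "GENE005", "GENE006", "GENE900"]  # GENE900 is off-platform

summary = overlap_on_platform(deg_ids, gwas_ids, platform_ids)
print(summary)
```

The four counts it returns (universe size, DEGs, GWAS genes on the platform, shared genes) are the numbers a chance-overlap test needs as input.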

Unfortunately, I only have slight experience with MINITAB and SPSS and no experience with R.

Secondly, I have done a pathway over-representation analysis using InnateDB on both the DEG and GWAS gene lists. The DEG list gave ~150 over-represented pathways while the GWAS list gave ~100. Using the unique pathway identifier, I again compared the 2 pathway lists (1 from each gene list) and found that ~50 overlapped. Again, a statistical test should be done, but would this be the same test as in the first instance?


Thank you for all your help!

Tl;dr: I have overlapped 2 gene lists and found that they overlap in terms of genes and pathways but do not know what statistical test should be used to give evidence that these results are not due to chance.

gene • 3.8k views
modified 5.5 years ago by Biostar • written 5.6 years ago by weeweedelivery123

A hypergeometric distribution test can be used to test for the overlap. See here: Probability Of Gene List Overlap and here:

written 5.6 years ago by Ashutosh Pandey

Hello and thank you for your reply! 
I read your recommendations about hypergeometric distribution testing and found them extremely helpful! 

Is it therefore correct to say that my situation is as follows?
1) Gene list A is 500 DEGs derived from a platform of 17000 probes.
2) Gene list B comprises 200 genes derived from a collection of GWAS.
However, only 75% of these GWAS genes (i.e. 150) are represented on the platform, and the overlap of gene lists A and B is 15 genes, so my hypergeometric distribution test should be looking for:

The probability of getting 15 or more white balls in a sample of size 150 from an urn with 500 white balls and 16500 black balls, assuming H0. 
Therefore, if I used R it would want my input to be: 

1-phyper(14, 500, 16500, 150)
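For readers without R, the same upper-tail probability can be sketched in Python using only the standard library. This is an illustrative equivalent, not a replacement for the R call: the function name `hypergeom_upper_tail` is made up here, and the universe is taken to be the 17000 probes exactly as in the urn description above (treating probes as genes is an assumption carried over from the post). In R's `phyper(q, m, n, k)`, `m` is the number of white balls, `n` the number of black balls and `k` the number drawn, so `1 - phyper(14, 500, 16500, 150)` is P(X >= 15).

```python
from math import comb


def hypergeom_upper_tail(overlap, white, black, drawn):
    """P(X >= overlap) when `drawn` balls are taken without replacement
    from an urn holding `white` white balls and `black` black balls."""
    total = white + black
    upper = min(white, drawn)  # the overlap can never exceed either group
    favourable = sum(
        comb(white, k) * comb(black, drawn - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic; only the final ratio becomes a float.
    return favourable / comb(total, drawn)


# Numbers from the post: 500 DEGs, 16500 other probes,
# 150 GWAS genes on the platform, 15 shared genes.
p_value = hypergeom_upper_tail(15, 500, 16500, 150)
print(f"P(overlap >= 15) = {p_value:.3g}")
```

For scale, the overlap expected by chance under these numbers is 150 × 500 / 17000 ≈ 4.4 genes, so an observed overlap of 15 sits well above it.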

Also, I was wondering whether I could use the hypergeometric distribution test for the overlap between the 2 lists of pathways derived from the pathway over-representation analysis (ORA) of the 2 gene lists. If so, what would the N be? Would it be the total number of possible pathways you could get from the ORA?

Thank you for your time!

modified 5.6 years ago • written 5.6 years ago by weeweedelivery123

Yes, you have formulated it correctly. I have never used R's phyper, so I am not sure which parameter corresponds to what, but I am sure you have done it correctly. Theoretically, you can apply the same test to pathways, but there may be some concerns. For example, only a small number of genes from a group may be over-represented in the pathways. Let's assume only 20 of the 150 GWAS genes are part of the pathways; they may not represent the whole group. I am not sure whether you can still apply such a test directly.
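To make the question about N concrete, here is a hedged Python sketch of how strongly the pathway-level result would depend on the chosen universe. The thread does not say how many pathways the ORA could return, so the three values of `n_pathways` below are purely hypothetical; only the counts ~150, ~100 and ~50 come from the post, and `overlap_p_value` is an illustrative name. It also ignores the concern raised above that pathways share genes and may be driven by a few of them.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, shared):
    """Hypergeometric P(X >= shared) for two lists of sizes `size_a` and
    `size_b` drawn from `universe` items in total."""
    others = universe - size_a
    favourable = sum(
        comb(size_a, k) * comb(others, size_b - k)
        for k in range(shared, min(size_a, size_b) + 1)
    )
    return favourable / comb(universe, size_b)


# ~150 DEG pathways, ~100 GWAS pathways, ~50 shared (figures from the post).
# The universe sizes are hypothetical placeholders, not InnateDB figures.
results = {}
for n_pathways in (300, 1000, 3000):
    results[n_pathways] = overlap_p_value(n_pathways, 150, 100, 50)
    print(n_pathways, results[n_pathways])
```

The overlap expected by chance is 150 × 100 / N, which is already 50 when N = 300, so the same observed overlap can look unremarkable or extreme depending on what is counted as the set of testable pathways.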

modified 5.6 years ago • written 5.6 years ago by Ashutosh Pandey