Question: Is it fair to compare the significance of overlap of 2 different sized gene sets with a third gene set using Fisher's exact test?
3.2 years ago by
ebrudermanver80 wrote:

Say I have 3 sets A, B and C of genes chosen from an entire set of 20K genes, and I want to see whether set A has a more significant gene overlap with set C than set B does. That is, I want to compute a p-value p associated with the overlap of A and C, and another p-value q associated with the overlap of B and C, and check whether p < q. I know that the p-values p and q in this problem can be computed using Fisher's exact test. My question is as follows: if set A is bigger than set B (e.g. 150 genes in A vs. 30 genes in B), is it still a fair comparison when I compare the overlaps of each set with C based on the p-values computed using Fisher's exact test?

statistics • 1.4k views
modified 3.2 years ago by Nicolas Rosewick9.2k • written 3.2 years ago by ebrudermanver80

My initial thought is that you might have to compare proportional overlaps and not the raw numbers themselves. And even then I'm not sure this makes a lot of sense in edge cases, e.g., comparing two sets that have, let's say, 10 and 10000 members respectively.

written 3.2 years ago by mforde841.3k
3.2 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche23k wrote:

The size of the overlap between two sets drawn from a common universe follows a hypergeometric distribution. A hypergeometric test takes the sizes of both sets and of the universe into account, and because Fisher's exact test is based on the hypergeometric distribution, it also accounts for set size. Note that for this test (and the chi-squared test), you need to use counts, not proportions.
So if you want to know whether the overlap A∩C is more significant than the overlap B∩C, you can compare the p-values.
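
As a rough illustration of the hypergeometric test described above, here is a minimal Python sketch that computes the upper-tail p-value for each overlap using only the standard library. The universe size (20K) and the sizes of A and B come from the question; the size of C (200) and the two overlap counts (10 and 4) are made-up numbers for illustration, and `overlap_pvalue` is a hypothetical helper name, not a function from any package.

```python
from math import comb


def overlap_pvalue(universe, size_x, size_c, overlap):
    """Upper-tail hypergeometric p-value: P(|X ∩ C| >= overlap).

    Assumes X is a random draw of size_x genes from `universe` genes,
    of which size_c belong to C. This is the same quantity as a
    one-sided Fisher's exact test on the corresponding 2x2 table.
    """
    max_k = min(size_x, size_c)
    total = comb(universe, size_x)
    tail = sum(
        comb(size_c, k) * comb(universe - size_c, size_x - k)
        for k in range(overlap, max_k + 1)
    )
    # Exact integer arithmetic up to the final division.
    return tail / total


N = 20_000          # universe size, from the question
SIZE_A = 150        # from the question
SIZE_B = 30         # from the question
SIZE_C = 200        # assumed for illustration
OVERLAP_AC = 10     # assumed for illustration
OVERLAP_BC = 4      # assumed for illustration

p = overlap_pvalue(N, SIZE_A, SIZE_C, OVERLAP_AC)
q = overlap_pvalue(N, SIZE_B, SIZE_C, OVERLAP_BC)
print(f"p (A vs C) = {p:.3g}")
print(f"q (B vs C) = {q:.3g}")
```

Each call uses the counts for its own set, so the different sizes of A and B enter the calculation directly rather than through proportions.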

3.2 years ago by
Belgium, Brussels
Nicolas Rosewick9.2k wrote:

You can use the SuperExactTest R package for this.
