Say I have 3 sets $A$, $B$ and $C$ of genes chosen from an entire set of 20K genes, and I want to see whether set $A$ has a more significant gene overlap with set $C$ than set $B$ does. That is, I want to compute a p-value $p$ associated with the overlap of $A$ and $C$, and another p-value $q$ associated with the overlap of $B$ and $C$, and then check whether $p < q$. I know that the p-values $p$ and $q$ in this problem can be computed using Fisher's exact test. My question is as follows: if set $A$ is bigger than set $B$ (e.g. 150 genes in $A$ vs. 30 genes in $B$), is it still a fair comparison to compare the overlaps of each set with $C$ based on the p-values computed using Fisher's exact test?
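A minimal sketch of how $p$ and $q$ could be computed, assuming the one-sided (enrichment) version of Fisher's exact test, which for a gene-set overlap equals the hypergeometric upper tail $P(X \ge k)$. The helper name `overlap_pvalue`, the size $|C| = 500$ and the overlaps of 15 and 3 genes (10% of each set) are hypothetical; only the 20K universe and the 150/30 set sizes come from the question. `scipy.stats.fisher_exact(table, alternative='greater')` on the corresponding 2x2 table should give the same value.

```python
from math import comb


def overlap_pvalue(n_total: int, n_set: int, n_ref: int, n_overlap: int) -> float:
    """Hypothetical helper: one-sided (enrichment) Fisher's exact test p-value.

    Computes P(X >= n_overlap) where X is hypergeometric: drawing n_set genes
    from n_total genes, of which n_ref belong to the reference set C.
    """
    denom = comb(n_total, n_set)
    upper = min(n_set, n_ref)  # the overlap can't exceed either set's size
    tail = sum(
        comb(n_ref, k) * comb(n_total - n_ref, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom  # exact integer ratio, converted to float at the end


N_TOTAL = 20_000  # gene universe (from the question)
N_C = 500         # |C|, hypothetical

# Both sets overlap C in 10% of their genes (hypothetical overlaps).
p = overlap_pvalue(N_TOTAL, 150, N_C, 15)  # A: 150 genes, 15 of them in C
q = overlap_pvalue(N_TOTAL, 30, N_C, 3)    # B: 30 genes, 3 of them in C
print(f"p = {p:.3g}, q = {q:.3g}, p < q: {p < q}")
```

In this illustration both sets have the same overlap fraction, yet $p$ is far smaller than $q$: 15 observed against 3.75 expected is stronger evidence than 3 observed against 0.75 expected, so set size alone shifts the p-value.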
My initial thought is that you might have to compare proportional overlaps rather than the raw numbers themselves. And even then, I'm not sure this makes a lot of sense in edge cases, e.g., comparing two sets that have, let's say, 10 and 10,000 members respectively.
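The proportional view can be sketched as the overlap fraction plus fold enrichment (observed overlap divided by the overlap expected by chance). This is a sketch under assumptions: the helper `overlap_summary`, $|C| = 500$ and the overlaps of 1 and 300 genes are hypothetical; the 10 and 10,000 set sizes are the edge case from the question.

```python
def overlap_summary(n_total: int, n_set: int, n_ref: int, n_overlap: int) -> dict:
    """Hypothetical helper: size-normalized views of a set's overlap with C."""
    expected = n_set * n_ref / n_total  # overlap expected by chance
    return {
        "fraction": n_overlap / n_set,            # share of the set found in C
        "fold_enrichment": n_overlap / expected,  # observed / expected overlap
        "step": 1 / n_set,                        # fraction change per extra gene
    }


# Edge case from the question: sets with 10 and 10,000 members.
# |C| = 500 and the overlaps (1 and 300 genes) are hypothetical.
small = overlap_summary(20_000, 10, 500, 1)
large = overlap_summary(20_000, 10_000, 500, 300)
print(small)
print(large)
```

Here a single overlapping gene gives the 10-gene set a 4-fold enrichment, and each extra gene moves its fraction by 10 percentage points, while the 10,000-gene set shows only 1.2-fold enrichment with a step of 0.01 percentage points. Proportions alone hide how coarse and noisy the small-set estimate is.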