
Say I have 3 sets `A`, `B` and `C` of genes chosen from an entire set of 20K genes, and I want to see whether set `A` has a more significant gene overlap with set `C` than set `B` does. That is, I want to compute a p-value `p` associated with the overlap of `A` and `C`, and another p-value `q` associated with the overlap of `B` and `C`, and I want to check whether `p < q`. I know that the p-values `p` and `q` in this problem can be computed using Fisher's exact test. My question is as follows: if set `A` is bigger than set `B` (e.g. 150 genes in `A` vs. 30 genes in `B`), is it still a fair comparison when I compare the overlaps of each set with `C` based on the p-values computed using Fisher's exact test?
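To make the concern concrete, here is a minimal sketch in Python. It relies on the fact that the one-sided (enrichment) Fisher's exact test for a set overlap is the upper tail of a hypergeometric distribution. The size of `C` (500 genes) and the overlap counts (15 and 3) are made-up values chosen only for illustration, and `overlap_pvalue` is a hypothetical helper name, not a library function.

```python
from math import comb


def overlap_pvalue(universe: int, size_x: int, size_c: int, overlap: int) -> float:
    """One-sided p-value P(X >= overlap) for a hypergeometric X.

    X is the number of genes shared with C when size_x genes are drawn at
    random, without replacement, from `universe` genes of which size_c are in C.
    """
    max_overlap = min(size_x, size_c)
    # Number of ways to pick size_x genes with at least `overlap` of them in C.
    favourable = sum(
        comb(size_c, k) * comb(universe - size_c, size_x - k)
        for k in range(overlap, max_overlap + 1)
    )
    # Exact integer counts divided by the total number of possible draws.
    return favourable / comb(universe, size_x)


UNIVERSE = 20_000  # total number of genes, from the question
SIZE_C = 500       # assumed size of C (not given in the question)

# Assumed overlaps: both A and B share exactly 10% of their genes with C.
p = overlap_pvalue(UNIVERSE, 150, SIZE_C, 15)  # A: 150 genes, 15 in C
q = overlap_pvalue(UNIVERSE, 30, SIZE_C, 3)    # B: 30 genes, 3 in C

print(f"p (A vs C) = {p:.3g}")
print(f"q (B vs C) = {q:.3g}")
```

With these assumed numbers, both sets have the same proportional overlap with `C`, yet `p` comes out smaller than `q` simply because `A` is larger and the test has more power. So `p < q` on its own would not show that `A` overlaps `C` more strongly than `B` does.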

**5.4k**• written 9 weeks ago by ebrudermanver •


My initial thought is that you might have to compare proportional overlaps rather than the raw numbers themselves. And even then I'm not sure this makes a lot of sense in edge cases, e.g., comparing two sets that have, let's say, 10 and 10000 members respectively.
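As a rough sketch of what "proportional overlap" could mean here, the Python snippet below computes two size-adjusted summaries: the fraction of a set that falls in `C`, and the fold enrichment over the overlap expected by chance. The size of `C` and the overlap counts are assumed values for illustration only, and `overlap_summary` is a hypothetical helper.

```python
def overlap_summary(universe: int, size_x: int, size_c: int, overlap: int) -> dict:
    """Size-adjusted summaries of the overlap between a gene set and C."""
    # Overlap expected if the set were drawn at random from the universe.
    expected = size_x * size_c / universe
    return {
        "fraction_of_set": overlap / size_x,     # share of the set found in C
        "expected_overlap": expected,
        "fold_enrichment": overlap / expected,   # observed / expected
    }


UNIVERSE = 20_000  # total number of genes, from the question
SIZE_C = 500       # assumed size of C (not given in the question)

summary_a = overlap_summary(UNIVERSE, 150, SIZE_C, 15)  # assumed overlap of 15
summary_b = overlap_summary(UNIVERSE, 30, SIZE_C, 3)    # assumed overlap of 3

print("A:", summary_a)
print("B:", summary_b)
```

These summaries do not depend on set size in the way a p-value does, but they get noisy for very small sets: in a 10-gene set, a single extra shared gene shifts the fraction by 0.1, which is the kind of edge case mentioned above.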
