Hypergeometric test of gene list overlap
2.9 years ago
wstla27 ▴ 20

I have several lists of genes, say A, B, and C. I want to test whether the overlaps between A and B, and between A and C, are significant. I am planning on doing a hypergeometric test using R.

phyper(q, m, n, k, lower.tail = FALSE, log.p = FALSE)


q: (# of genes shared between A and B, or between A and C) - 1

m: # of genes in A

n: total # of genes in sample - # of genes in A

k: # of genes in B/C

Also, what if there is no overlap? Do I still do the same thing but with q=0 or q=-1?

Thanks a lot for the help!

gene R overlap hypergeometric test • 4.6k views

Is there any way to do this test for multiple lists? Let's say that I want to test whether the gene lists from seven tests are more similar than expected by chance. I know that I can do 6+5+...+1 = 21 pairwise tests, but it would seem more elegant to do a global test for overlap.
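For reference, the pairwise route mentioned here could be sketched roughly as follows. This is not a global test. The seven lists, the 200-gene universe, and the choice of Benjamini-Hochberg correction are all made up for illustration; only the phyper() set-up comes from the question.

```r
# Hypothetical data: seven random gene lists drawn from a 200-gene universe.
set.seed(1)
universe <- paste0("gene", 1:200)
lists <- replicate(7, sample(universe, 30), simplify = FALSE)
names(lists) <- paste0("list", 1:7)

# All unordered pairs of lists: choose(7, 2) = 21 columns.
pairs <- combn(names(lists), 2)

# Upper-tail hypergeometric p-value, P(X >= observed overlap), for each pair.
pvals <- apply(pairs, 2, function(p) {
  a <- lists[[p[1]]]
  b <- lists[[p[2]]]
  phyper(length(intersect(a, b)) - 1,
         length(a),
         length(universe) - length(a),
         length(b),
         lower.tail = FALSE)
})
names(pvals) <- paste(pairs[1, ], pairs[2, ], sep = "_vs_")

# Adjust for multiple testing (BH is an assumption; other methods may suit better).
padj <- p.adjust(pvals, method = "BH")
```

Because the lists here are random draws, most adjusted p-values should be unremarkable; with real lists the same loop applies unchanged.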


What you're asking is unclear. What do you mean by "to test whether gene lists for seven tests are more similar than expected by chance"?
The test is for deciding how likely or unlikely the overlap between two sets is; there's no notion of similarity involved. Also, consider posting this as a new question that makes the link to bioinformatics clear; otherwise, this is probably a purely statistical question best addressed on Cross Validated.

2.9 years ago
benformatics ★ 2.9k

So, if I get this straight, you want to see whether there is an enrichment of genes from A in either B or C. Overall, your setup looks good.

Your universe is all the genes across your samples; n in phyper is that total minus the number of genes in A, as you have it.

The value you pass as q depends on the question you want to ask. With lower.tail = FALSE, phyper returns P(X > q). In your case, with the observed overlap minus one, you are asking:

P(observing at least as many overlaps as you saw), using overlap - 1

Another option, however, would be to ask:

P(observing strictly more overlaps than you saw), using the overlap itself

As to your last question: with no overlap you would use q = -1, but the probability of observing 0 or more overlaps is 1, so doing that test wouldn't be very informative.
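As a rough illustration, here is a minimal sketch in R of the call from the question, with lower.tail = FALSE as posted. The gene vectors and the 20-gene universe are made up for the example; only phyper() and the meaning of its arguments come from the thread.

```r
# Hypothetical gene lists; the universe is every gene that could have been drawn.
universe <- paste0("gene", 1:20)
A <- universe[1:8]
B <- universe[5:12]

overlap <- length(intersect(A, B))  # observed overlap between A and B
m <- length(A)                      # genes in A
n <- length(universe) - m           # genes in the universe that are not in A
k <- length(B)                      # genes in B

# Upper tail: P(X >= overlap), hence overlap - 1 with lower.tail = FALSE.
p_enrich <- phyper(overlap - 1, m, n, k, lower.tail = FALSE)

# No-overlap case: q = -1 gives P(X >= 0), which is always 1.
p_none <- phyper(-1, m, n, k, lower.tail = FALSE)

print(c(overlap = overlap, p_enrich = p_enrich, p_none = p_none))
```

The same p-value can be cross-checked by summing the point probabilities, sum(dhyper(overlap:k, m, n, k)), which makes the "overlap or more" reading explicit.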