I have 2 sets of peaks (BED files) from ChIP-seq, A and B, and a certain number of peaks overlap. Now I want to put some kind of enrichment score / p-value on that overlap: if only 2 peaks overlap and A and B each contain 10,000 peaks, it's not really significant, while if 90 peaks overlap, with 10,000 peaks in A and 100 in B, it's much more significant.
The classical way to do this would be a hypergeometric test. All the variables of the test are easily filled in (number of draws, number of successes, number of 'failures') except one: the total population size. In this scenario it would be something like 'the total number of peaks one could potentially draw from the genome', which is impossible to estimate and could be biased (regions that don't ChIP well, unmappable regions, etc.).
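To make the dependence on the population size concrete, here is a minimal sketch of the hypergeometric test described above, using only the Python standard library. The function name `hypergeom_sf` and the candidate population sizes are purely illustrative assumptions, not real estimates of how many peaks the genome could yield. The model also assumes that each B peak either overlaps an A peak or does not, which glosses over partial and multiple overlaps.

```python
from math import comb


def hypergeom_sf(overlap, n_a, n_b, population):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    Model (an assumption): `population` candidate regions exist, `n_a` of
    them are peaks in A, and the `n_b` peaks of B are drawn from them at
    random without replacement.
    """
    if not 0 <= n_a <= population or not 0 <= n_b <= population:
        raise ValueError("n_a and n_b must lie between 0 and population")
    max_overlap = min(n_a, n_b)
    total = comb(population, n_b)  # all ways to place B's peaks
    # Ways to place B's peaks so that exactly k fall in A, for k >= overlap.
    tail = sum(
        comb(n_a, k) * comb(population - n_a, n_b - k)
        for k in range(max(overlap, 0), max_overlap + 1)
    )
    return tail / total


# 90 overlapping peaks, |A| = 10,000, |B| = 100 (numbers from the question).
# The population sizes are made-up guesses, which is exactly the problem.
for population in (11_000, 20_000, 100_000, 1_000_000):
    p = hypergeom_sf(90, 10_000, 100, population)
    print(f"population={population:>9,}  p={p:.3g}")
```

The same overlap goes from unremarkable to extremely significant purely as a function of the guessed population size. The exact integer arithmetic here gets slow when both sets are large; if SciPy is available, `scipy.stats.hypergeom.sf(overlap - 1, population, n_a, n_b)` should compute the same tail probability.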
What's the best way to put a 'score' on an overlap between A and B then? The ultimate goal is to display some kind of heatmap, as I'm doing the comparison between many A's and many B's.