First, for a simple hypergeometric test (or its large-sample approximation, the chi-square test), it is not necessary to write a Python program; there are good online servers for this, e.g. http://www.langsrud.com/fisher.htm. That server performs Fisher's exact test, which is itself based on the hypergeometric distribution and is preferable to the chi-square test when the counts are small.
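
If you do want to check the number locally, the upper tail of the hypergeometric distribution can be computed with the Python standard library alone. This is a minimal sketch that assumes the fixed, countable pool of regions discussed in the next paragraph: N is the size of that background pool, K and n are the sizes of your two sets, and k is the number of regions they share. The function name `hypergeom_sf` and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: size of the background pool (all regions that could have been found)
    K: number of regions in set A
    n: number of regions in set B
    k: observed number of regions shared by A and B
    This equals the one-sided (enrichment) p-value of Fisher's exact test
    on the corresponding 2x2 table.
    """
    total = comb(N, n)  # number of ways to draw set B from the pool
    # Sum the counts for k, k+1, ..., min(K, n) shared regions;
    # math.comb returns 0 for impossible draws, so no extra bounds are needed.
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), min(K, n) + 1))
    return hits / total


# Hypothetical example: 100 possible regions, 20 in set A, 10 in set B, 6 shared.
print(hypergeom_sf(6, 100, 20, 10))
```

If SciPy is installed, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same value (SciPy's argument order is pool size, marked items, number of draws).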

The more crucial question is the background. If I understand the previous answer(s) correctly, the respondents assume that your 'genome regions' are sampled from a fixed, countable pool of genome regions. If this is not the case, the task requires further consideration. To estimate the background, you have to think about the maximum number of 'genome regions' that could have been found: is it determined by the number of features on an array, or by the number of markers that bracket the intervals?
Then you have to decide what degree of overlap you consider to be 'enough overlap', and what the probability is of finding such an overlap among randomly picked 'genomic regions'.
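
As an illustration of the second question, here is a sketch of the exact chance that one randomly placed region overlaps one fixed region. It assumes a single contiguous sequence, uniform placement of the random region, 0-based half-open coordinates, and that 'enough overlap' means at least 1 bp; a stricter overlap definition would change the bounds. The function name is hypothetical.

```python
def overlap_probability(genome_len: int, query_len: int,
                        target_start: int, target_len: int) -> float:
    """Probability that a region of length query_len, placed uniformly at random
    on one sequence of length genome_len, shares >= 1 bp with the fixed
    0-based, half-open target [target_start, target_start + target_len).
    Assumes 'overlap' means at least 1 shared base pair.
    """
    if not 0 < query_len <= genome_len:
        raise ValueError("query_len must be between 1 and genome_len")
    n_starts = genome_len - query_len + 1  # number of possible start positions
    # A query starting at x covers [x, x + query_len); it overlaps the target
    # exactly when target_start - query_len < x < target_start + target_len.
    lo = max(target_start - query_len + 1, 0)
    hi = min(target_start + target_len - 1, genome_len - query_len)
    return max(0, hi - lo + 1) / n_starts


# A 1 bp region on a 10 bp sequence hits the 2 bp target [3, 5) at 2 of 10 starts.
print(overlap_probability(10, 1, 3, 2))  # → 0.2
```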

These numbers are straightforward only if there is a defined number of possible 'genomic region' outcomes (not a continuum) and if 'overlap' between your sets means that the genomic regions are identical (not just partially overlapping). If these conditions are not met, the problem can still be addressed, but how to do it depends on the exact nature of your data.
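
For the continuum / partial-overlap case, one common approach (a permutation test, named here as my own choice rather than something the question prescribes) is to shuffle one set randomly along the genome many times and count how often chance alone gives at least as many overlapping regions as observed. The sketch assumes a single contiguous sequence of length `genome_len`, uniform placement, region lengths kept fixed, and 'overlap' meaning at least 1 bp; real analyses usually also restrict shuffling to assayed or mappable regions and treat chromosomes separately. All names are hypothetical.

```python
import random
from typing import List, Tuple

Region = Tuple[int, int]  # 0-based, half-open [start, end) on one sequence


def count_overlapping(a: List[Region], b: List[Region]) -> int:
    """Number of regions in b that share >= 1 bp with at least one region in a.

    Naive O(len(a) * len(b)) scan; fine for a sketch, too slow for genome-wide sets.
    """
    return sum(
        any(b_start < a_end and a_start < b_end for a_start, a_end in a)
        for b_start, b_end in b
    )


def permutation_pvalue(a: List[Region], b: List[Region], genome_len: int,
                       n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Shuffle the regions of b uniformly (keeping their lengths) and return the
    observed overlap count plus an empirical one-sided p-value."""
    rng = random.Random(seed)
    observed = count_overlapping(a, b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in b:
            length = end - start
            new_start = rng.randint(0, genome_len - length)  # inclusive upper bound
            shuffled.append((new_start, new_start + length))
        if count_overlapping(a, shuffled) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed data as one of the permutations.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

The smallest p-value this can report is 1 / (n_perm + 1), so the number of permutations limits how strong a result can be claimed.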

written 8.2 years ago by Lyco
See related post: http://biostar.stackexchange.com/questions/5501/how-do-you-calculate-if-two-sets-of-genomic-regions-overlap-significantly
