Let's say I'm looking at 5 independent microarrays, and some number of genes are upregulated on each microarray. If 200 of the same genes are upregulated on every microarray, what's the statistical test to prove that it's a significant enrichment? What if the genes are upregulated on 4 out of 5 microarrays?
Maybe I am getting this wrong, but I do not think the hypergeometric test is the way to go. Am I right that you are talking about 5 microarrays from the same "experiment", i.e. 5 biological replicates?
Install LIMMA from Bioconductor, load the microarray data, follow the documentation, and perform a "standard" analysis. It fits a linear model and does not use the hypergeometric test but a t-test (or rather a derivative of it, the moderated t-statistic). If your arrays are Affymetrix, use the affy package first and then LIMMA.
The hypergeometric test takes into account neither HOW MUCH the genes are upregulated nor how consistent the upregulation is. The t-test does.
Then, of course, correct for multiple testing.
I would use the hypergeometric test only when comparing results of different experiments (using different platforms or different conditions), but that does not sound like your case.
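To make the t-test plus multiple-testing idea concrete, here is a minimal base-R sketch. It is not the LIMMA workflow itself (LIMMA uses a moderated t-statistic), just a plain per-gene t-test. Everything in it is an assumption for illustration: the data are simulated log2 ratios for 5 replicate arrays, and the names `log_ratios`, `p_values` and `q_values` are made up.

```r
set.seed(1)
n_genes  <- 1000
n_arrays <- 5

# Simulated log2 ratios (rows = genes, columns = replicate arrays); made-up data.
log_ratios <- matrix(rnorm(n_genes * n_arrays, mean = 0, sd = 0.5),
                     nrow = n_genes, ncol = n_arrays)
# Pretend the first 50 genes are truly upregulated.
log_ratios[1:50, ] <- log_ratios[1:50, ] + 2

# One-sided, one-sample t-test per gene: is the mean log2 ratio greater than 0?
p_values <- apply(log_ratios, 1, function(x) {
  t.test(x, mu = 0, alternative = "greater")$p.value
})

# Benjamini-Hochberg correction for multiple testing.
q_values <- p.adjust(p_values, method = "BH")

# Genes called upregulated at a 5% false discovery rate.
upregulated <- which(q_values < 0.05)
length(upregulated)
```

Unlike an overlap count, the t-statistic gets larger both when the fold change is bigger and when it is more consistent across the 5 arrays.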
In general, try to learn about microarray analysis as much as you can before starting the analysis.
I think I know the answer, but let me say up front, I'm not a statistician. I think you use the hypergeometric distribution, and the first array forms the basis of a question that you then evaluate against the other arrays. Using the phyper function in R, you can calculate the probability of obtaining the same gene set between two array results, and I think you then simply repeat the process and multiply the resulting p-values (the same way you would multiply the probabilities of a given repeated dice roll).
The help for phyper uses the urn analogy, so that's what I'll use. Say that a given array has 10,000 spots, and you identify 300 top genes. Then you perform a second array, and you also select 300 top genes. When you examine the overlap, it is 200 genes. What is the likelihood of getting a 200 gene overlap by chance?
The first array sets up the urn as follows: there are 10,000 balls total, and 300 of them are white. Doing the second array asks the question: what is the likelihood of drawing 300 balls from such an urn and having 200 of them be white? (Or more generally, for a top gene set of a given size from the second array, what are the chances that 200 of them will be white?)
In R, the phyper function takes arguments of x = # of white balls drawn (number of genes from array 2 that were found in common with array 1), m = # of white balls total in the urn (size of the original top gene set from array 1), n = # of black balls total in the urn (# of array spots minus the top gene set from array 1), k = # of balls drawn (size of the top gene set from array 2). So the answer for the overlap between arrays 1 and 2 is:
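# phyper function in R for the hypergeometric distribution.
# 1 - phyper(x, m, n, k) would give P(overlap > x); using x - 1 gives P(overlap >= x).
phyper(x - 1, m, n, k, lower.tail = FALSE)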
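Plugging in the numbers from the urn example, a minimal sketch might look like this. The helper name `overlap_pvalue` and its argument names are made up for illustration; only `phyper` is a real R function. Note that `1 - phyper(x, m, n, k)` is the probability of an overlap strictly greater than x, so the sketch passes `overlap - 1` with `lower.tail = FALSE` to get the probability of an overlap of at least the observed size.

```r
# Hypothetical helper: P(overlap >= observed) under random drawing.
overlap_pvalue <- function(overlap, set1_size, set2_size, n_spots) {
  phyper(overlap - 1,
         m = set1_size,            # white balls: top genes from array 1
         n = n_spots - set1_size,  # black balls: all other spots
         k = set2_size,            # balls drawn: top genes from array 2
         lower.tail = FALSE)
}

# Numbers from the example: 10,000 spots, 300 top genes on each array, 200 shared.
p <- overlap_pvalue(overlap = 200, set1_size = 300, set2_size = 300, n_spots = 10000)
print(p)

# Overlap expected by chance alone, for comparison with the observed 200.
expected_overlap <- 300 * 300 / 10000  # 9 genes
print(expected_overlap)
```

With only about 9 shared genes expected by chance, an overlap of 200 gives an extremely small p-value.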
Then you calculate the same thing for arrays 1 and 3, 1 and 4, and 1 and 5, and then you multiply the resulting p-values. That's my guess.
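Here is a rough sketch of that repeat-and-multiply step, with made-up overlap counts for arrays 2 to 5 (the values in `overlaps` and `set_sizes` are hypothetical). Two caveats, as far as I can tell: the individual p-values are so small that their product underflows to 0 in floating point, so the sketch sums logs instead; and a raw product of p-values is not itself a p-value. The usual way to combine them is Fisher's method, shown at the end, which assumes independent tests. Here every comparison reuses array 1, so treat the combined value as a rough guide at best.

```r
n_spots   <- 10000
set1_size <- 300                     # top gene set from array 1
set_sizes <- c(300, 300, 300, 300)   # top gene set sizes for arrays 2..5 (assumed)
overlaps  <- c(200, 210, 195, 205)   # overlaps with array 1 (made-up numbers)

# log P(overlap >= observed) for each pairwise comparison against array 1.
log_p <- phyper(overlaps - 1,
                m = set1_size,
                n = n_spots - set1_size,
                k = set_sizes,
                lower.tail = FALSE, log.p = TRUE)

# Multiplying p-values is the same as adding their logs.
log_product <- sum(log_p)

# Fisher's method: -2 * sum(log p) follows a chi-square distribution with
# 2 * (number of tests) degrees of freedom if the tests are independent.
fisher_stat  <- -2 * log_product
fisher_log_p <- pchisq(fisher_stat, df = 2 * length(log_p),
                       lower.tail = FALSE, log.p = TRUE)

print(log_product)
print(fisher_log_p)
```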