Question: Hypergeometric Test of Microarray Gene Lists
Assa Yeroslaviz wrote:

Hi,

We have several microarray experiments, and for each one we have a list of differentially regulated genes. We analyzed the overlap for each pair of lists and would now like to know how significant these overlaps are.

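I did it with the `phyper` function, this way for set 1 vs. set 2:

```r
totalNumarrays = 21542  # total number of array probes (universe size N)
DEgene_set1 = 1453      # differentially regulated genes of set 1 (m)
DEgene_set2 = 4987      # differentially regulated genes of set 2 (number of draws k)
overlap = 481           # overlap between the two sets
# phyper(q, m, n, k): n is the number of probes NOT in set 1, i.e. N - m.
# lower.tail = FALSE gives P(X > q), so q = overlap - 1 yields P(X >= overlap).
Prob = phyper(overlap - 1, DEgene_set1, totalNumarrays - DEgene_set1, DEgene_set2,
              lower.tail = FALSE, log.p = FALSE)
```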
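A minimal sketch to check what the upper-tail call computes, using the set 1 vs. set 2 numbers above. In R's `phyper(q, m, n, k)`, `n` is the number of non-DE probes (`N - m`), and `lower.tail = FALSE` returns P(X > q); passing `overlap - 1` therefore gives P(X >= overlap), which should match an explicit sum of `dhyper` terms. The short names `N`, `m`, `k` and `x` are chosen here for illustration only.

```r
# Upper-tail hypergeometric probability P(X >= x), computed two ways.
# Arguments follow R's phyper(q, m, n, k):
#   m = DE genes in set 1 ("white balls"), n = all other probes ("black balls"),
#   k = DE genes in set 2 (number of draws).
N <- 21542  # total number of array probes (the universe)
m <- 1453   # DE genes of set 1
k <- 4987   # DE genes of set 2
x <- 481    # observed overlap

# lower.tail = FALSE returns P(X > q), so q = x - 1 gives P(X >= x).
p_phyper <- phyper(x - 1, m, N - m, k, lower.tail = FALSE)

# The same quantity as an explicit sum of point probabilities from x upward.
p_dhyper <- sum(dhyper(x:min(m, k), m, N - m, k))

# Using q = x instead drops the observed value itself: this is P(X >= x + 1).
p_strict <- phyper(x, m, N - m, k, lower.tail = FALSE)

print(c(p_phyper = p_phyper, p_dhyper = p_dhyper, p_strict = p_strict))
```

`p_phyper` and `p_dhyper` should coincide up to floating-point rounding, while `p_strict` is smaller because it excludes the observed overlap.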

I did the same for set 1 vs. set 3 and set 1 vs. set 4, using the same total number of probes.
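To repeat the test for several pairs without copying the call, a small wrapper can take one row per comparison. This is a hypothetical helper (`overlap_pvalue` is not a base R function). Only the set 1 vs. set 2 numbers come from the post, so rows for the other comparisons, each with its own universe size `N`, would have to be filled in. It also reports the expected overlap `m * k / N` under independence, for comparison with the observed value.

```r
# Hypothetical helper: one-sided hypergeometric p-value for the overlap of
# two DE lists drawn from a common universe of size N.
#   N: universe size, m: size of list A, k: size of list B, x: |A and B|
overlap_pvalue <- function(N, m, k, x) {
  stopifnot(all(x <= pmin(m, k)), all(pmax(m, k) <= N))
  # P(X >= x); the third argument is the number of probes outside list A
  phyper(x - 1, m, N - m, k, lower.tail = FALSE)
}

# One row per comparison; only the set 1 vs. set 2 values are from the post.
comparisons <- data.frame(
  comparison = "set1_vs_set2",
  N = 21542, m = 1453, k = 4987, x = 481
)
comparisons$p_value  <- with(comparisons, overlap_pvalue(N, m, k, x))
comparisons$expected <- with(comparisons, m * k / N)  # mean overlap under independence
print(comparisons)
```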

Now my problem is that sets 3 and 4 come from different technologies, so they have a different total number of array probes. My question is basically: does that matter?

Do I need to modify the formula to get the correct results?
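One way to see whether the total matters is to hold the list sizes and the overlap fixed and vary only the universe size `N`. The sketch below reuses the set 1 vs. set 2 numbers; the values 15000 and 30000 are made up purely for illustration. Because the expected overlap is `m * k / N`, a smaller universe makes the same overlap look less surprising, which suggests that each comparison needs a universe that both of its lists were actually drawn from.

```r
# Illustration only: same list sizes and overlap as set 1 vs. set 2,
# evaluated under different universe sizes N.
m <- 1453; k <- 4987; x <- 481
N_values <- c(15000, 21542, 30000)   # 15000 and 30000 are made-up values

res <- data.frame(
  N        = N_values,
  expected = m * k / N_values,                                   # mean overlap
  p_value  = phyper(x - 1, m, N_values - m, k, lower.tail = FALSE)  # P(X >= x)
)
print(res)
```

With these made-up values the p-value grows as `N` shrinks: at `N = 15000` the expected overlap (about 483) is already close to the observed 481.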

I would appreciate any help.

Assa

Tags: overlap, microarray, statistics
Written 7.3 years ago by Assa Yeroslaviz; modified 7.3 years ago by Sudeep.
Sudeep wrote:

I assume you are only interested in the significance of the overlap between the DEGs on the arrays you have, and that you have some kind of mapping from your array probe IDs to a database. In that case, shouldn't `totalNumarrays` be not the total number of array probes, but the union of all probes that could be mapped to genes on the arrays you are testing, i.e. your universal list? And IMHO it does not matter that the arrays are from different technologies, because here you are just calculating the significance of the overlap between two lists A and B that are subsets of a superset C, aren't you? One more thing: in the `phyper` function, why are you using `overlap - 1` instead of `overlap`?
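The suggestion of building the universe from mapped genes could be sketched as follows, working on gene IDs rather than raw probe counts. The gene lists are toy, made-up data and `overlap_test_ids` is a hypothetical helper. It combines the two platforms' mapped genes with `union` by default, as proposed above, and drops DE genes that fall outside the universe. Passing `combine = intersect` instead restricts the universe to genes measurable on both platforms, which changes `N`, the list sizes and the p-value.

```r
# Toy, made-up gene IDs standing in for the probe-to-gene mapping of each platform.
genes_platform_A <- c("TP53", "BRCA1", "EGFR", "MYC", "GAPDH", "ACTB")
genes_platform_B <- c("TP53", "EGFR", "MYC", "KRAS", "ACTB", "VEGFA", "PTEN")
de_A <- c("TP53", "EGFR", "MYC")   # DE genes reported on platform A
de_B <- c("EGFR", "MYC", "KRAS")   # DE genes reported on platform B

# Hypothetical helper: overlap test on gene IDs with an explicit universe.
overlap_test_ids <- function(universe_A, universe_B, de_A, de_B,
                             combine = union) {
  universe <- combine(universe_A, universe_B)  # union/intersect return unique IDs
  de_A <- intersect(de_A, universe)            # drop IDs outside the universe
  de_B <- intersect(de_B, universe)
  N <- length(universe)
  m <- length(de_A)
  k <- length(de_B)
  x <- length(intersect(de_A, de_B))
  p <- phyper(x - 1, m, N - m, k, lower.tail = FALSE)  # P(overlap >= x)
  c(N = N, m = m, k = k, overlap = x, p_value = p)
}

print(overlap_test_ids(genes_platform_A, genes_platform_B, de_A, de_B))
print(overlap_test_ids(genes_platform_A, genes_platform_B, de_A, de_B,
                       combine = intersect))
```

For the toy data, the union gives N = 9 with an overlap of 2 (p = 19/84, about 0.226), while the intersection gives N = 4 and drops KRAS from the second list (p = 0.5).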