I'm trying to do a Fisher's exact test. I'd like to find out whether the genes that are differentially expressed in a microarray meta-analysis are enriched for genes that are transcriptionally controlled by Gene X.
What I'm trying to understand is: for my Fisher's exact test, what should all of the cells in the contingency table sum to (the "Total Number")?
This could be all the genes in the genome, but then which count do I use for that figure? The number of HUGO symbols? The number of Entrez genes (I'm using Entrez gene identifiers for my analyses)? If so, where can I find those numbers?
It could also be the number of genes covered by the probes used to identify my differentially expressed genes, or by the ChIP-on-chip used to identify the targets of Gene X. The reasoning behind this is that a microarray probe set only picks up a portion of the genes in the genome. If I am using microarray probes to identify differentially expressed genes, or if the ChIP-on-chip for Gene X only studied a certain number of locations, I will necessarily miss any genes in the unstudied part of the genome (the part for which there are no probes) that might be regulated by Gene X or be differentially expressed. By this argument, the Total Number should represent the portion of the genome I'm actually studying, not the whole genome.
I know that the result of Fisher's exact test depends heavily on which Total Number I choose, so I'd like to have a good argument for whatever I pick.
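To make the dependence concrete, here is a minimal Python sketch of the one-sided (enrichment) form of Fisher's exact test, computed as a hypergeometric upper tail using only the standard library. All the counts are made-up placeholders, not my real data: 500 differentially expressed genes, 300 Gene X targets, 40 genes in both lists, and two candidate totals (12,000 genes covered by the array versus 20,000 genes in the genome). The function and variable names are my own.

```python
from math import comb

def contingency_table(overlap, n_de, n_targets, n_universe):
    """Build the 2x2 table; its four cells always sum to n_universe."""
    a = overlap                                      # DE and Gene X target
    b = n_de - overlap                               # DE, not a target
    c = n_targets - overlap                          # target, not DE
    d = n_universe - n_de - n_targets + overlap      # neither
    return ((a, b), (c, d))

def enrichment_p_value(overlap, n_de, n_targets, n_universe):
    """One-sided Fisher's exact test: P(overlap >= observed) under the
    hypergeometric null, with all margins fixed."""
    max_overlap = min(n_de, n_targets)
    total_ways = comb(n_universe, n_de)
    tail_ways = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_de - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail_ways / total_ways

# Hypothetical counts, for illustration only.
N_DE, N_TARGETS, OVERLAP = 500, 300, 40
ARRAY_UNIVERSE = 12_000    # assumed: genes represented on the array
GENOME_UNIVERSE = 20_000   # assumed: rough count of genes in the genome

p_array = enrichment_p_value(OVERLAP, N_DE, N_TARGETS, ARRAY_UNIVERSE)
p_genome = enrichment_p_value(OVERLAP, N_DE, N_TARGETS, GENOME_UNIVERSE)

print("table (array universe):", contingency_table(OVERLAP, N_DE, N_TARGETS, ARRAY_UNIVERSE))
print("p-value, array universe: ", p_array)
print("p-value, genome universe:", p_genome)
```

With the same three observed counts, only the "neither" cell changes between the two totals. The expected overlap under the null is 500 × 300 / 12,000 = 12.5 genes for the array universe and 500 × 300 / 20,000 = 7.5 for the genome universe, so the larger total makes the same overlap look more significant.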