Question: How many unique human Entrez and HUGO gene IDs are there? What number should I use for Fisher's exact test?
Kristin Muench (United States) wrote, 4.6 years ago:

I'm trying to run a Fisher's exact test. I'd like to know whether the genes that are differentially expressed in a microarray meta-analysis are enriched for genes that are transcriptionally controlled by Gene X.

What I'm trying to understand is: for my Fisher's exact test, what should all of the cells in the contingency table sum to (the "Total Number")?
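For reference, the table in question can be written out as below. The cell and margin labels (a, b, c, d, n, K, N) are notation chosen here for illustration, not taken from the post. N is the "Total Number", and the one-sided Fisher's exact p-value for enrichment is the upper tail of a hypergeometric distribution, so N enters every term.

```latex
\begin{array}{l|cc|c}
 & \text{target of Gene X} & \text{not a target} & \text{total} \\ \hline
\text{DE} & a & b & n = a + b \\
\text{not DE} & c & d & N - n \\ \hline
\text{total} & K = a + c & N - K & N = a + b + c + d
\end{array}

P(X \ge a) = \sum_{k=a}^{\min(n, K)} \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}
```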

This could be all the genes in the genome, but then what number do I use for that figure? The number of HUGO symbols? The number of Entrez genes (I'm using Entrez Gene identifiers in my analyses)? If so, how can I find those numbers?
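One way to get such counts is to tally unique IDs in NCBI's tab-separated `gene_info` table. The sketch below assumes that layout: a header line beginning with `#tax_id` and columns named `GeneID`, `type_of_gene` and `Symbol_from_nomenclature_authority` (the HGNC-approved symbol for human, `-` when absent). The function name and the `coding_only` switch are hypothetical, and the counts it returns depend on the file release and on which gene types you include.

```python
import csv


def count_unique_ids(handle, tax_id="9606", coding_only=False):
    """Count unique Entrez GeneIDs and nomenclature-authority symbols.

    Assumes NCBI gene_info-style text: tab-separated, with a header line
    starting with '#tax_id'. tax_id 9606 is human.
    """
    # Strip the leading '#' so the first column is named 'tax_id'.
    header = handle.readline().lstrip("#").rstrip("\n").split("\t")
    # QUOTE_NONE: free-text columns may contain stray quote characters.
    reader = csv.DictReader(handle, fieldnames=header, delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    gene_ids, authority_symbols = set(), set()
    for row in reader:
        if row["tax_id"] != tax_id:
            continue
        if coding_only and row["type_of_gene"] != "protein-coding":
            continue
        gene_ids.add(row["GeneID"])
        symbol = row["Symbol_from_nomenclature_authority"]
        if symbol != "-":  # '-' marks genes without an authority symbol
            authority_symbols.add(symbol)
    return len(gene_ids), len(authority_symbols)
```

Usage, assuming a local decompressed copy named `Homo_sapiens.gene_info`: `with open("Homo_sapiens.gene_info") as fh: print(count_unique_ids(fh, coding_only=True))`. The all-genes and protein-coding-only totals differ considerably, which is one reason a "whole genome" total is ambiguous.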

It could also be the number of genes covered by the probes used to determine my differentially expressed genes, or by the assay used for Gene X. The reasoning is that a microarray probe set picks up only a portion of all the genes in the genome. If I use microarray probes to identify differentially expressed genes, or if the ChIP-on-chip for Gene X only studied a certain number of locations, I will necessarily miss any genes in the unstudied part of the genome (the part without probes) that might be bound by Gene X or be differentially expressed. Therefore, the Total Number should represent the portion of the genome I'm studying, not the whole genome.
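A minimal sketch of restricting the test to interrogated genes: the DE genes and Gene X targets are intersected with a `universe` set (assumed here to be the genes covered by the probes) before the four cells are counted, so the cells sum to the size of that universe. The function and argument names are hypothetical. The p-value is computed directly as the upper hypergeometric tail, which equals the one-sided ("greater") Fisher's exact p-value, rather than through a statistics library.

```python
from math import comb


def enrichment_test(universe, de_genes, targets):
    """One-sided Fisher's exact test for over-representation of targets among DE genes.

    Only genes in `universe` are counted, so a + b + c + d == len(universe).
    """
    universe = set(universe)
    de = set(de_genes) & universe   # drop DE genes that were never probed
    tg = set(targets) & universe    # drop targets outside the studied background
    a = len(de & tg)                # DE and target
    b = len(de - tg)                # DE, not target
    c = len(tg - de)                # target, not DE
    d = len(universe) - a - b - c   # neither
    N, n, K = len(universe), a + b, a + c
    # Upper tail P(X >= a) of Hypergeometric(N, K, n).
    p = sum(comb(K, k) * comb(N - K, n - k)
            for k in range(a, min(n, K) + 1)) / comb(N, n)
    return (a, b, c, d), p
```

If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` should return the same p-value for the returned table.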

I know that Fisher's exact test results depend heavily on which number I choose, so I'd like to have a good argument for whatever I pick.
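To see how much the total matters, the sketch below holds the overlap, the number of DE genes and the number of targets fixed and varies only N. The counts are made up for illustration. A larger N lowers the expected overlap n·K/N, so the same observed overlap looks more significant; padding the background with genes that were never tested therefore makes the test anticonservative.

```python
from math import comb


def upper_tail(a, n, K, N):
    """P(X >= a) for X ~ Hypergeometric(N total genes, K targets, n DE genes)."""
    return sum(comb(K, k) * comb(N - K, n - k)
               for k in range(a, min(n, K) + 1)) / comb(N, n)


# Hypothetical counts: 200 DE genes, 500 Gene X targets, 20 genes in both.
for N in (10_000, 20_000, 40_000):
    print(N, upper_tail(20, 200, 500, N))
```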

Tags: entrez, hugo, R
Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 4.6 years ago:

You should only consider genes that you have interrogated in your experiment. If some genes are not represented in your probe set, how would you justify including them in the test?


I wasn't sure whether that was allowed!

One complication is that this is a meta-analysis of three datasets, each on a different platform with a different probe set. Only about 10,000 genes are common to all three datasets.

So the ideal strategy is to make all cells of the contingency table sum to 10,000, right?
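A minimal sketch of the "common genes only" background for several platforms, assuming each platform's probe set has already been mapped to a set of gene IDs. The function name is hypothetical. DE genes and targets outside the intersection are dropped, so the four cells sum to the number of common genes (about 10,000 in this case).

```python
def counts_on_common_genes(platform_genes, de_genes, targets):
    """2x2 counts (a, b, c, d) using only genes measured on every platform."""
    # Background: genes present on all platforms.
    universe = set.intersection(*(set(g) for g in platform_genes))
    de = set(de_genes) & universe
    tg = set(targets) & universe
    a = len(de & tg)               # DE and target
    b = len(de - tg)               # DE, not target
    c = len(tg - de)               # target, not DE
    d = len(universe - de - tg)    # neither
    return a, b, c, d              # a + b + c + d == len(universe)
```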

(Reply by Kristin Muench, 4.6 years ago)

Taking only the common genes is safe. Alternatively, you could treat genes not represented in one set as missing values. In an ideal world, detection of differentially expressed genes should not depend on the platform used, so you could consider any gene represented on any platform as having been tested. In any case, a small variation in the number of background genes is not going to make a dramatic difference to the result. Also, don't trust vendor-supplied mappings; they can be wrong or out of date. To be accurate, you should map all the probe sequences from the different platforms to the same reference genome.
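The two backgrounds described here can be compared with plain set operations. This is a sketch assuming each platform's probes are already mapped to gene IDs; the platform names and the function name are hypothetical. The intersection is the conservative choice; the union treats any gene on any platform as tested, and the per-platform differences are the genes that would be handled as missing values.

```python
def background_options(platforms):
    """Return (genes on all platforms, genes on any platform, per-platform missing genes)."""
    gene_sets = list(platforms.values())
    common = set.intersection(*gene_sets)
    union = set.union(*gene_sets)
    # Genes absent from a platform would be treated as missing values there.
    missing = {name: union - genes for name, genes in platforms.items()}
    return common, union, missing
```

Comparing `len(common)` with `len(union)` shows how far apart the two candidate totals are before choosing one.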

(Reply by Jean-Karim Heriche, 4.6 years ago)

