Question: Use Hypergeometric analysis to model data from RNAseq
3.6 years ago, nicolas.hipp (Rennes, France) wrote:

Hi everyone,

I have finished my RNA-seq analyses and am having trouble modelling the results. For example, I have two comparisons, A/Control and B/Control, each with fold changes in gene expression and p-values. Some genes are upregulated in both comparisons; I would like to show this with a Venn diagram and calculate a p-value for the overlap using the hypergeometric distribution.

I read this post on Biostars: Probability Of Gene List Overlap. Unlike the author of that post, I don't have the same number of genes in both comparisons (because of data filtering). Does anybody know whether I should use the smaller number of genes from the two comparisons, or whether I need to weight by the total number of genes?

Thanks a lot, and sorry for my English ;)

R overlap • 1.4k views
written 3.6 years ago by nicolas.hipp

What biological question are you hoping to answer with an overlap test like that? As an aside, it's unlikely that a set overlap test will answer that question, since it's equivalent to saying p-values of 0.04999999 and 0.05 are meaningfully different.

written 3.6 years ago by Devon Ryan


I work on cell differentiation. The "A" subset represents cells that are primed to differentiate in the classical way. The "B" subset represents cells that are on the way to differentiating but were not primed classically; they are able to differentiate, just not through the classical route. I would like to show that they share genes with cells primed by the classical signal, but also express genes that are specific to this condition. The shared genes (the overlap) would then be common to the differentiation process, and the other genes would be specific to the non-classical treatment.

The "C" subset represents cells that will fail to differentiate.

But maybe I'm wrong, and this is not the right way to do it?

Thanks again ;)

modified 3.6 years ago • written 3.6 years ago by nicolas.hipp

Rather than an overlap test, you'd be better off taking the set of genes that are DE in one comparison (use a loose adjusted p-value threshold) and running GSEA or similar on it in the other. This helps avoid applying p-value cutoffs multiple times.
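
A rough sketch of that idea in Python, assuming the genes from the second comparison have already been sorted by some differential-expression statistic (for example a signed log fold change). The function names, the unweighted running-sum score, and the gene-label permutation null are illustrative assumptions here; this is not the statistic computed by the GSEA software itself, which weights hits by their ranking metric.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum (Kolmogorov-Smirnov-like) enrichment score.

    ranked_genes: gene IDs from the second comparison, best-ranked first.
    gene_set: loosely thresholded DE genes from the first comparison.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down on any other gene.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Two-sided p-value from random gene sets of the same size (assumed null)."""
    rng = random.Random(seed)
    # Only genes present in the ranking can contribute to the score.
    gene_set = set(gene_set) & set(ranked_genes)
    observed = enrichment_score(ranked_genes, gene_set)
    exceed = 0
    for _ in range(n_perm):
        fake_set = set(rng.sample(ranked_genes, len(gene_set)))
        if abs(enrichment_score(ranked_genes, fake_set)) >= abs(observed):
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because every gene in the second comparison keeps its rank, only one threshold is applied (the loose one that defines the gene set). Note that permuting gene labels ignores gene-gene correlation, so the resulting p-values tend to be optimistic.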

written 3.6 years ago by Devon Ryan

I have a similar problem:

I have a list of genes differentially expressed in cell B in a "control vs knockout" experiment. I want to see whether there is an overlap (and, via a hypergeometric test, that it is not just by chance) between the genes downregulated in knockout cell B and the genes upregulated in cell B in a cell A vs cell B comparison. As you can see, cell B is different from cell A, and I am wondering whether there is a suitable strategy for performing the hypergeometric test in this situation.

Basically, the treatment in experiment B is the knockout of a protein of interest, and I would like to know whether genes that are downregulated in B (because of the removal of the protein) are also involved in the transition between state A and state B (I assume that genes upregulated in B in A vs B are involved in the differentiation from cell A to B). We are trying to understand whether this protein promotes the transition between the two states...

modified 17 months ago • written 17 months ago by fusion.slope210

I am not sure I get the problem here. The hypergeometric test is the correct way of testing for overlap between sets. Does this help?
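
For reference, a minimal sketch of that test in Python (3.8+ for `math.comb`). The function and parameter names are made up for illustration. It computes the upper tail, that is, the probability of seeing at least the observed overlap when two lists of the given sizes are drawn at random from one common background.

```python
from math import comb


def overlap_pvalue(background, list_a, list_b, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(background, list_a, list_b).

    background: number of genes that could appear in either list.
    list_a, list_b: sizes of the two gene lists (they may differ).
    overlap: number of genes found in both lists.
    """
    max_overlap = min(list_a, list_b)
    if not 0 <= overlap <= max_overlap <= background:
        raise ValueError("inconsistent counts")
    # Number of ways to choose list_b genes from the background.
    total = comb(background, list_b)
    # Ways to choose list_b genes so that exactly k of them fall in list_a,
    # summed over every k at least as large as the observed overlap.
    tail = sum(
        comb(list_a, k) * comb(background - list_a, list_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

The result is the same whichever list is treated as the "draw", so the two lists do not need to be the same size; what they must share is the background count.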

written 3.6 years ago by Jean-Karim Heriche

Yes, it highlights my problem. In this example, I don't know whether 14800 is a number shared by both experiments. I understand that it reflects the number of "black balls", but is the total number of white balls the same? Or not?

Thanks for the link, it helps a lot.

modified 3.6 years ago • written 3.6 years ago by nicolas.hipp

You need to define the size of your background population, that is the set of genes that can be compared. This post discusses this issue.
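
One way to make that concrete, sketched in Python. It assumes the background is taken to be the genes that survived filtering in both comparisons, which is one common choice and not necessarily the one the linked post recommends. The function and argument names are hypothetical.

```python
def overlap_counts(tested_a, tested_b, hits_a, hits_b):
    """Counts for an overlap test restricted to a shared background.

    tested_a, tested_b: gene IDs that passed filtering in each comparison.
    hits_a, hits_b: gene IDs called differentially expressed in each one.
    """
    # Assumption: only genes testable in both comparisons can overlap.
    background = set(tested_a) & set(tested_b)
    # Drop hits that were never testable in the other comparison.
    a = set(hits_a) & background
    b = set(hits_b) & background
    return {
        "background": len(background),
        "list_a": len(a),
        "list_b": len(b),
        "overlap": len(a & b),
    }
```

The four numbers returned are the inputs a hypergeometric overlap test needs. The two lists can still differ in size after this step; only the background is forced to be the same.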

written 3.6 years ago by Jean-Karim Heriche

Oh, this is a very good post; I hadn't seen it before. So I will try to follow the instructions and use the same background population size for both comparisons :)

Thanks a lot for the answer ;)

written 3.6 years ago by nicolas.hipp