Question

Use Hypergeometric analysis to model data from RNAseq

7.6 years ago

nicolas.hipp ▴ 10

Hi everyone,

I have finished my RNAseq analyses, and I am having trouble modelling the results. For example, I have two comparisons, A/Control and B/Control, and for each one I have the fold change in gene expression and the p-value. Some genes are upregulated in both comparisons; I would like to represent this with a Venn diagram and calculate a p-value for the overlap using the hypergeometric distribution.

I read this post on Biostars: Probability Of Gene List Overlap, but unlike in that post, I don't have the same number of genes in the two comparisons (due to data filtering). Does anybody know whether I should use the smaller number of genes of the two comparisons, or whether I need to weight the total number of genes?

Thanks a lot, and sorry for my English ;)

overlap R • 3.8k views

ADD COMMENT • link 7.6 years ago by nicolas.hipp ▴ 10

What biological question are you hoping to answer with an overlap test like that? As an aside, it's unlikely that a set overlap test will answer that question, since it's equivalent to saying p-values of 0.04999999 and 0.05 are meaningfully different.

ADD REPLY • link 7.6 years ago by Devon Ryan 104k

Hi,

I work on cell differentiation. The "A" subset represents cells that are primed to differentiate in the classical way. The "B" subset, on the other hand, represents cells that are on the way to differentiating but were not primed classically. So these cells are able to differentiate, but not classically. I would like to show that they share genes with cells primed by the classical signal, but that they also express genes specific to this condition. So the shared genes are common to the differentiation process (the gene overlap), and the other genes are specific to the non-classical treatment.

The "C" subset consists of cells that will fail in the differentiation process.

But maybe I'm wrong, and this is not the right way to do that?

Thanks again ;)

ADD REPLY • link 7.6 years ago by nicolas.hipp ▴ 10

Rather than an overlap test, you'd be better off taking the set of genes that are DE in one comparison (use a loose adjusted p-value threshold) and running GSEA or similar on it in the other. This helps avoid applying p-value cutoffs multiple times.
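To make that idea concrete, here is a minimal Python sketch of a GSEA-style test. It is a simplified, unweighted running-sum (Kolmogorov-Smirnov-like) enrichment score with a gene-label permutation p-value, not the GSEA software itself. The function names and the toy gene IDs are hypothetical; in practice `ranked_genes` would be all genes of the second comparison sorted by fold change (or a similar statistic), and `gene_set` the loosely thresholded DE genes from the first comparison.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene_set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


def permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Two-sided p-value from random gene sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data (made up): 100 genes ranked by fold change in comparison B.
ranked = [f"g{i}" for i in range(100)]
de_in_a = {f"g{i}" for i in range(10)}  # hypothetical DE genes from comparison A
es, pval = permutation_pvalue(ranked, de_in_a)
print(es, pval)
```

Only one threshold is applied here (to define the gene set); the second comparison contributes its full ranking, which is the point of the suggestion. Real GSEA implementations weight the steps by the ranking statistic and usually permute sample labels, so treat this only as an illustration.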

ADD REPLY • link 7.6 years ago by Devon Ryan 104k

I have a similar problem:

I have a list of genes differentially expressed in cell B from a "control vs knock-out" experiment. I want to see whether there is an overlap (and, via a hypergeometric test, that it is not just due to chance) between the genes down-regulated in the knock-out cell B and the genes that are up-regulated in cell B in the cell A vs cell B comparison. As you can see, cell B is different from cell A, and I am wondering if there is a suitable strategy for handling this when performing the hypergeometric test.

Basically, the treatment in experiment B is the knock-out of a protein of interest, and I would like to know if genes that are down-regulated in B (because of the removal of the protein) are also involved in the transition between state A and state B (I assume that genes up-regulated in B in A vs B are involved in the differentiation from cell A to B). We are trying to understand whether this protein promotes the transition between the two states...

ADD REPLY • link 5.5 years ago by fusion.slope ▴ 250

I am not sure I get the problem here. The hypergeometric test is the correct way of testing for overlap between sets. Does this help?

ADD REPLY • link 7.6 years ago by Jean-Karim Heriche 27k

Yes, it highlights my problem. In this example, I don't know whether 14800 is a number shared by both experiments. I understand that it reflects the number of "black balls", but is the total number of white balls the same? Or not?

Thanks for the link, it helps a lot.

ADD REPLY • link 7.6 years ago by nicolas.hipp ▴ 10

You need to define the size of your background population, that is, the set of genes that can be compared. This post discusses this issue.
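As a rough illustration of what defining the background means in practice, here is a minimal Python sketch of the overlap test. It assumes (this is my assumption, not something stated in the linked post) that the background is the set of genes that passed filtering in both comparisons, so both gene lists are restricted to that shared universe before counting. The function name and the toy gene IDs are hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(set_a, set_b, background):
    """Return (overlap size, P(overlap >= observed)) under the hypergeometric null."""
    background = set(background)
    # Restrict both lists to genes that could have been called in both comparisons.
    a = set(set_a) & background
    b = set(set_b) & background
    N = len(background)  # all balls in the urn
    K = len(a)           # balls of the "marked" colour (genes in list A)
    n = len(b)           # number of balls drawn (genes in list B)
    k = len(a & b)       # marked balls among those drawn (the overlap)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Toy data (made up): the two comparisons kept different genes after filtering.
tested_a = {f"g{i}" for i in range(0, 12)}  # genes tested in A/Control
tested_b = {f"g{i}" for i in range(2, 14)}  # genes tested in B/Control
background = tested_a & tested_b            # assumed shared background: g2..g11

up_a = {"g0", "g2", "g3", "g4", "g5"}       # g0 is dropped (not in background)
up_b = {"g4", "g5", "g6", "g13"}            # g13 is dropped (not in background)

overlap, pval = hypergeom_overlap_pvalue(up_a, up_b, background)
print(overlap, pval)
```

With a shared background, there is a single N for both lists, so differing numbers of tested genes per comparison no longer matter. In R, the same upper tail can be obtained with `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`.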

ADD REPLY • link 7.6 years ago by Jean-Karim Heriche 27k

Oh this is a very good post, I didn't see it before. So I will try to follow the instructions and have the same size for my background population :)

Thanks a lot for the answer ;)

ADD REPLY • link 7.6 years ago by nicolas.hipp ▴ 10