Question: Hypergeometric Test On Gene Set
Asked 5.8 years ago by ChIP (Netherlands):

Hi!

This is not the first time this question has been asked, but I am still confused after reading the previous post.

Say I have two lists. List1 has 598 genes, List2 has 5500 genes, and the pool from which both are drawn contains 23000 genes (say).

I want to compute whether the overlap between the two lists, which is 89 genes, is significant or not.

I have two formulas:

method 1

phyper(overlap-1, list1, PopSize-list1, list2, lower.tail = FALSE, log.p = FALSE)

phyper(88, 598, 23000-598, 5500, lower.tail = FALSE, log.p = FALSE)

method 2

phyper(overlap, list1, PopSize, list2, lower.tail = FALSE, log.p = FALSE)

phyper(89, 598, 23000, 5500, lower.tail = FALSE, log.p = FALSE)

Now which method shall I use and why?

I am really confused.

Thank you

bioinformatics statistics R • 16k views

This thread should help you: http://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c

And remember that the p-value is the probability, under the null hypothesis, of obtaining a result at least as extreme as the one actually observed.

Comment by arno.guille, written 5.8 years ago
Answer by Sudeep, written 5.8 years ago:

Your method 1 looks like the correct one. AFAIK, in phyper(q, m, n, k), n should be PopSize-list1, i.e. the number of genes in the pool that are not in list1.

You can check this Stack Overflow thread as well.
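
To make the argument order concrete, here is a minimal sketch in R using the numbers from the question. The variable names (pop_size, list1, list2, overlap, expected_overlap, p_enrich) are illustrative labels only, and the sketch assumes list1 plays the role of the "white balls" in the urn while list2 is the sample drawn.

```r
# Numbers from the question (variable names are illustrative only)
pop_size <- 23000  # genes in the pool
list1    <- 598    # "white balls": genes in list1
list2    <- 5500   # number of draws: genes in list2
overlap  <- 89     # observed overlap

# phyper(q, m, n, k): m = white balls, n = black balls, k = draws
m <- list1
n <- pop_size - list1   # genes NOT in list1
k <- list2

# Overlap expected by chance under the hypergeometric null
expected_overlap <- k * m / (m + n)   # 143

# P(X >= overlap): q is overlap - 1 because lower.tail = FALSE gives P(X > q)
p_enrich <- phyper(overlap - 1, m, n, k, lower.tail = FALSE)
```

With these numbers the expected overlap is 143, so an observed overlap of 89 is below expectation and the enrichment p-value comes out close to 1. If depletion is the question of interest, phyper(overlap, m, n, k) gives P(X <= overlap) instead.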


Can anyone explain the q-1? Why is it needed, or why not?

Comment by Madelaine Gogol, written 5.8 years ago

The answer is here: http://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c

phyper(x, m, n, k) gives the probability of getting x or fewer, so phyper(x, m, n, k) is the same as sum(dhyper(0:x, m, n, k)).

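The lower.tail=FALSE is a bit confusing. phyper(x, m, n, k, lower.tail=FALSE) is the same as 1-phyper(x, m, n, k), and so is the probability of x+1 or more. To get the probability of an overlap of 89 or more, you therefore pass x = 88, which is where the overlap-1 comes from.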

Comment by arno.guille, written 5.8 years ago
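
The tail identities described in the comment above can be checked directly in R. This is only a sketch using the question's numbers; the variable names (lower, lower_sum, upper, via_lower, via_sum) are made up for illustration.

```r
m <- 598            # genes in list1
n <- 23000 - 598    # genes in the pool that are not in list1
k <- 5500           # genes in list2
x <- 88             # overlap - 1

# Lower tail: P(X <= x) is the sum of the point probabilities 0..x
lower     <- phyper(x, m, n, k)
lower_sum <- sum(dhyper(0:x, m, n, k))

# Upper tail: P(X > x), i.e. P(X >= x + 1)
upper     <- phyper(x, m, n, k, lower.tail = FALSE)
via_lower <- 1 - phyper(x, m, n, k)
via_sum   <- sum(dhyper((x + 1):min(m, k), m, n, k))

all.equal(lower, lower_sum)   # agree up to floating-point tolerance
all.equal(upper, via_lower)
all.equal(upper, via_sum)
```

When the upper-tail p-value is very small, lower.tail = FALSE is generally preferable to 1 - phyper(...), because the subtraction loses precision.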
Answer by Alejandro Jimenez Sanchez (Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK), written 3.5 years ago:

I think method 1 is the correct one, for the following reason.

Method 1 gives the same result as this site: https://www.geneprof.org/GeneProf/tools/hypergeometric.jsp

Method 2 gives a different result.
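
A similar cross-check can be done inside R without an external site, since the one-sided Fisher's exact test on the corresponding 2x2 table is the same hypergeometric upper tail. The sketch below uses the question's numbers; the variable names and the table layout are illustrative choices, not something taken from the thread.

```r
pop_size <- 23000; list1 <- 598; list2 <- 5500; overlap <- 89

p_method1 <- phyper(overlap - 1, list1, pop_size - list1, list2, lower.tail = FALSE)
p_method2 <- phyper(overlap, list1, pop_size, list2, lower.tail = FALSE)

# 2x2 table, filled column by column:
#   rows    = in list2 / not in list2
#   columns = in list1 / not in list1
tab <- matrix(c(overlap,         list1 - overlap,
                list2 - overlap, pop_size - list1 - list2 + overlap),
              nrow = 2)

# One-sided test for over-representation: P(overlap >= 89)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(p_method1, p_fisher)   # method 1 matches Fisher's exact test
p_method1 == p_method2           # method 2 does not give the same value
```

Method 2 differs for two reasons: passing n = 23000 treats the pool as 598 + 23000 genes, and passing q = 89 with lower.tail = FALSE computes the probability of 90 or more rather than 89 or more.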
