Testing Gene Set Overlap with Binomial Distribution or Hypergeometric Distribution
8 months ago
Apex92 ▴ 280

Dear all,

I have two gene sets and I want to see if the amount of overlap between these two sets is significant using binomial statistics.

I came up with the approach below in R, but it does not give a significant p-value. However, with the hypergeometric test (assuming a background set of 10,000 genes in total) I do get a significant p-value.

# Parameters
n_A <- 90  # Number of genes in Set A
n_B <- 2588  # Number of genes in Set B
k <- 37  # Number of overlapping genes

#probability of overlap
p <- n_A / n_B

p_value <- 1 - pbinom(k - 1, n_B, p)

print(paste("Calculated p-value:", p_value))

How can I resolve this discrepancy?

Another question: does n_B always have to be the larger set in the binomial test?

Thank you in advance.

statistics Enrichment • 489 views
8 months ago
Michael 54k

Have a look at the hypergeometric distribution as discussed here: Probability of gene list overlap
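On the binomial attempt in the question: with p = n_A / n_B and n_B trials, the expected count is n_B * p = n_A = 90, so an observed overlap of 37 lies far below expectation and the upper-tail p-value comes out close to 1. A minimal sketch of one way to set up the comparison, assuming the 10,000-gene background mentioned in the question (N is that assumed background, not something the test can infer; p_B, p_binom and p_hyper are illustrative names), is:

```r
# Assumed background size, taken from the question; adjust to your gene universe
N   <- 10000
n_A <- 90     # genes in Set A
n_B <- 2588   # genes in Set B
k   <- 37     # overlapping genes

# Binomial view: each of the n_A genes in Set A is treated as an independent
# draw that lands in Set B with probability n_B / N
p_B <- n_B / N
p_binom <- pbinom(k - 1, size = n_A, prob = p_B, lower.tail = FALSE)  # P(X >= k)

# Hypergeometric view: n_A genes drawn without replacement from N genes,
# of which n_B count as "successes"
p_hyper <- phyper(k - 1, n_B, N - n_B, n_A, lower.tail = FALSE)       # P(X >= k)

print(c(binomial = p_binom, hypergeometric = p_hyper))
```

The binomial version treats the draws as independent (sampling with replacement), so it is only an approximation to the hypergeometric test; the two should be close when the sets are small relative to the background.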


Thank you for your comment. So based on the thread you shared, I assume I can calculate the p-value as:

n_A=90
n_B=2588
n_C=10000
n_A_B=37

p_val_1 <- 1 - phyper(n_A_B, n_B, n_C - n_B, n_A)                        # P(X > n_A_B)
p_val_2 <- phyper(n_A_B - 1, n_A, n_C - n_A, n_B, lower.tail = FALSE)    # P(X >= n_A_B)

Is that correct? And it does not matter whether n_A is bigger or smaller than n_B, right?
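One way to check both points numerically is sketched below, reusing the numbers from this thread and the assumed background of 10,000 (p_draw_A, p_draw_B and p_strict are illustrative names). The hypergeometric distribution is symmetric in the roles of the two sets, so swapping which set is "drawn" should give the same tail probability, while the `>` and `>=` forms differ by the probability of exactly the observed overlap:

```r
n_A <- 90; n_B <- 2588; n_C <- 10000; n_A_B <- 37

# Draw the n_A genes of Set A; the n_B genes of Set B are the "successes"
p_draw_A <- phyper(n_A_B - 1, n_B, n_C - n_B, n_A, lower.tail = FALSE)
# Draw the n_B genes of Set B; the n_A genes of Set A are the "successes"
p_draw_B <- phyper(n_A_B - 1, n_A, n_C - n_A, n_B, lower.tail = FALSE)

# Both are P(X >= n_A_B) and should agree up to floating-point error
print(all.equal(p_draw_A, p_draw_B))

# Passing n_A_B instead of n_A_B - 1 gives P(X > n_A_B),
# which leaves out the observed overlap itself
p_strict <- phyper(n_A_B, n_B, n_C - n_B, n_A, lower.tail = FALSE)
print(p_strict < p_draw_A)
```

For an enrichment test the usual choice is P(X >= observed overlap), i.e. the `n_A_B - 1` form with `lower.tail = FALSE`.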
