Question: Statistical test for overlap
0
pixie@bioinfo1.4k wrote:

Hello, I have a Venn diagram with lists of up- and down-regulated genes from experiment 1. I compared another gene list (from experiment 2) against it and got the overlap (108 genes are Up, 309 genes are Down, and 41 genes are not differentially expressed).

My null hypothesis is that the overlap is split 50-50 between up- and down-regulated genes (that is, random). I want to show that the excess of down-regulated genes (as shown in the pie chart) is significant. What kind of statistical test should I do here? Thanks.
modified 3 months ago by H.Hasani970 • written 3 months ago by pixie@bioinfo1.4k
1
e.rempel890 wrote:

Hi,

if I understood your question correctly, I would use the binomial test. Let me explain.

There are 417 genes in the overlap between Exp2 and Exp1 (Up and Down combined: 108 + 309). Your null hypothesis says that each of these genes falls into Up or Down with probability 0.5 (as you said, 50-50). In the lingo of the binomial test, that means you have 417 trials, 309 successes, and a probability of success of 0.5. Thus, the way to compute the binomial test in R would be

```
binom.test(x = 309, n = 417, p = 0.5)
```

I obtained a p-value below 2.2e-16.
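
To make it clearer what the test above is doing, here is a minimal sketch that rebuilds the same p-value from the binomial tail probability. The variable names are illustrative only. It relies on the fact that with `p = 0.5` the binomial distribution is symmetric, so the two-sided p-value is twice the upper tail.

```r
# Counts from the question: 309 Down and 108 Up genes in the overlap
n_down <- 309
n_up   <- 108
n      <- n_down + n_up   # 417 trials in total

# Upper tail P(X >= 309) under Binomial(417, 0.5)
p_upper <- pbinom(n_down - 1, size = n, prob = 0.5, lower.tail = FALSE)

# Symmetric null (p = 0.5), so the two-sided p-value is twice the tail
p_two_sided <- 2 * p_upper

# Compare with the built-in exact test
bt <- binom.test(x = n_down, n = n, p = 0.5)
print(all.equal(p_two_sided, bt$p.value))
```

If you only care about an excess of Down genes, `binom.test(..., alternative = "greater")` gives the one-sided version directly.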

HTH

It should be `p = 5224/(5646+5224)` instead of 0.5.

In this case, shouldn't it be `p = (5224 + 309)/(5224 + 309 + 5646 + 108)`? :)

Oh yeah, right. Another reason to not use Venn diagrams :)

You can get past R's printed `2.2e-16` limit with `binom.test(...)$p.value`. With your numbers, the p-value is `1.6e-23`.
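
Putting the comments above together, a rough sketch of the test with the adjusted null proportion could look like this. It assumes, as the comments imply, that 5224 and 5646 are the Down and Up genes of experiment 1 that fall outside the overlap; the variable names are made up for illustration.

```r
# Counts quoted in the comments (assumed: genes outside the overlap)
down_only <- 5224
up_only   <- 5646
# Counts from the question (genes inside the overlap)
down_overlap <- 309
up_overlap   <- 108

# Null proportion of Down genes among all Up + Down genes of experiment 1
p0 <- (down_only + down_overlap) /
  (down_only + down_overlap + up_only + up_overlap)

res <- binom.test(x = down_overlap, n = down_overlap + up_overlap, p = p0)

# The printed summary is capped at "< 2.2e-16"; the stored value is not
print(res$p.value)
```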

1
H.Hasani970 wrote:

I would use a proportion test. As the name says, the null hypothesis is that the proportions are the same. It helps you answer questions like: is the proportion of males in group A different from the proportion in group B (test for two proportions)? Or is the proportion of males in a group similar to, larger than, or smaller than the proportion in the entire population (test for one proportion)?
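
As a sketch of both variants in R, using `prop.test` with the counts from this thread: the one-proportion call uses the 50-50 null from the question, and the two-proportion call assumes that 5224 and 5646 (quoted in the comments on the other answer) are the Down and Up genes outside the overlap.

```r
# One proportion: is the share of Down genes in the overlap equal to 0.5?
one_prop <- prop.test(x = 309, n = 417, p = 0.5)
print(one_prop$estimate)   # observed proportion of Down genes
print(one_prop$p.value)

# Two proportions: share of Down genes inside vs. outside the overlap
# (assumed counts outside the overlap: 5224 Down, 5646 Up)
two_prop <- prop.test(x = c(309, 5224), n = c(309 + 108, 5224 + 5646))
print(two_prop$estimate)
print(two_prop$p.value)
```

Note that `prop.test` uses a chi-squared approximation (with continuity correction by default), so its p-values will differ somewhat from the exact binomial test.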

1
Asaf8.4k wrote:

I think you can ask better questions, such as: how does the distribution of log fold change (LFC) for genes in the group compare with that of genes outside the group? Try plotting the LFC distribution of the two groups (in Exp2 and not in Exp2) as a violin plot, for instance, or draw an MA plot colored by whether a gene is in Exp2 or not. Either will show much more than what you chose to present and test. (By the way, neither an Euler diagram nor a pie chart is a good choice for presenting this data; there are better alternatives.)
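
A minimal base-R sketch of that idea follows. The LFC values here are simulated purely for illustration (they are not the poster's data), and overlaid density curves are used as a dependency-free stand-in for a violin plot; with ggplot2 installed, `geom_violin()` would give the violin version.

```r
set.seed(1)

# Hypothetical data: simulated LFC values, NOT real results
lfc <- data.frame(
  lfc   = c(rnorm(400, mean = -0.5), rnorm(2000, mean = 0)),
  group = rep(c("in Exp2", "not in Exp2"), times = c(400, 2000))
)

# Kernel density estimate of LFC for each group
d_in  <- density(lfc$lfc[lfc$group == "in Exp2"])
d_out <- density(lfc$lfc[lfc$group == "not in Exp2"])

# Overlay the two distributions on one set of axes
plot(d_out, main = "LFC by group (simulated)", xlab = "log fold change",
     ylim = range(0, d_in$y, d_out$y))
lines(d_in, lty = 2)
legend("topright", legend = c("not in Exp2", "in Exp2"), lty = c(1, 2))
```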