Question: On the significance of the overlap between lists of genes
23 months ago by
elb170 wrote:

Hi all, I have a fairly conceptual question about the significance of the overlap between two lists of genes. I have list1 and list2, each containing 100 genes. The overlap is 13 genes, and the universe (the full set of genes from which both lists of 100 were derived) is 17,611 genes. When I use Fisher's exact test to calculate the significance of the overlap, I get p < 1.351e-14. That is strongly significant, and I understand why, given the size of the universe. But relative to the lists of 100 genes, the overlap looks quite low: 13%. In the end, should I consider the overlap significant or not?

Thank you in advance


You can also simply draw two random sets of 100 genes from the universe of 17,611 and check how often you observe an overlap of 13 or more. This gives you a sense of whether an overlap of 13 genes could arise by chance.

For example, repeat the draw 1000 times and count how often the overlap is 13 or more genes.

Fisher's test is doing something similar, but by drawing random sets yourself you get a better intuitive sense of it.
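The resampling idea can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `simulate_overlaps`, the fixed seed, and the use of integer IDs in place of real gene names are all made up for illustration.

```python
import random

def simulate_overlaps(universe_size, size1, size2, n_trials, seed=0):
    """Draw two random gene sets n_trials times and return the overlap sizes.

    Genes are represented by integer IDs 0..universe_size-1 (an assumption
    for illustration; real gene identifiers would work the same way).
    """
    rng = random.Random(seed)
    universe = range(universe_size)
    overlaps = []
    for _ in range(n_trials):
        set1 = set(rng.sample(universe, size1))
        set2 = set(rng.sample(universe, size2))
        overlaps.append(len(set1 & set2))
    return overlaps

# Numbers from the question: 17,611 genes, two lists of 100, observed overlap 13.
overlaps = simulate_overlaps(17611, 100, 100, n_trials=1000)
hits = sum(1 for o in overlaps if o >= 13)
mean_overlap = sum(overlaps) / len(overlaps)
# Add-one estimate so the empirical p-value is never reported as exactly zero.
empirical_p = (hits + 1) / (len(overlaps) + 1)

print(f"mean overlap by chance: {mean_overlap:.2f}")
print(f"draws with overlap >= 13: {hits} of {len(overlaps)}")
print(f"empirical p-value: {empirical_p:.4f}")
```

An overlap of 13 is very unlikely to show up even once in 1000 draws, so a simulation of this size can only say that p is below roughly 1/1000. Resolving much smaller p-values needs far more draws or the exact test.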

written 23 months ago by geek_y10k
23 months ago by
Germany, Heidelberg, COS
e.rempel790 wrote:


As you said, the p-value of Fisher's exact test is highly significant. The reason, as you also noted, is the size of the gene universe.

I have seen people question the choice of universe: can every one of these 17,611 genes really end up in either of your lists? Sometimes that is not the case, e.g. because some genes are silenced. But even if you restricted the gene universe to 2,000 genes (a very strong restriction, I admit), Fisher's test would still be significant.

Thus, I would say the overlap is significant and there is some connection between your lists.
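The effect of the universe size can be checked directly. The sketch below computes the one-sided Fisher's exact p-value as the upper tail of the hypergeometric distribution, for both the full universe and the hypothetical 2,000-gene universe. The function name `overlap_p_value` is an assumption for illustration, and the code assumes both lists are drawn from the same universe.

```python
from math import comb

def overlap_p_value(universe_size, size1, size2, overlap):
    """P(overlap >= observed) if both lists were random draws from the universe.

    This is the upper tail of the hypergeometric distribution, i.e. the
    one-sided Fisher's exact test on the corresponding 2x2 table.
    """
    total = comb(universe_size, size2)
    tail = sum(
        comb(size1, k) * comb(universe_size - size1, size2 - k)
        for k in range(overlap, min(size1, size2) + 1)
    )
    return tail / total  # exact integer arithmetic, converted to float at the end

results = {}
for universe_size in (17611, 2000):
    expected = 100 * 100 / universe_size  # mean overlap under random draws
    results[universe_size] = overlap_p_value(universe_size, 100, 100, 13)
    print(f"universe={universe_size}: expected overlap={expected:.2f}, "
          f"p={results[universe_size]:.3g}")
```

For the full universe this should give a value of the same order as the p-value reported in the question. The expected overlap by chance is about 0.57 genes with 17,611 genes and 5 genes with 2,000 genes, which is why 13 shared genes out of 100 is far more than chance in both cases even though 13% sounds small.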

23 months ago by
Seville, ES
Martombo2.6k wrote:

As already pointed out by e.rempel, you need to make sure that your comparison is not biased by confounding variables such as expression level. One way to achieve this is to apply a strong initial filter on expression, which removes genes from your universe, and to use a statistical test that doesn't favour highly expressed genes. A test of this kind only tells you that there is some overlap between the two lists; it doesn't quantify it. Where do these two lists come from? If your aim is to measure how similar two comparisons / states are, you could correlate the effect sizes you get (e.g. fold-changes if they are transcriptional profiles).
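Correlating effect sizes could look something like the sketch below. Everything in it is hypothetical: the gene names and log2 fold-change values are made up purely to show the mechanics, and Pearson correlation is just one possible choice (a rank-based correlation may suit real data better).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up log2 fold-changes for the same genes in two comparisons.
log2fc_a = {"gene1": 2.1, "gene2": -1.3, "gene3": 0.4, "gene4": 1.8, "gene5": -0.2}
log2fc_b = {"gene1": 1.7, "gene2": -0.9, "gene3": 0.1, "gene4": 2.2, "gene5": 0.3}

shared = sorted(set(log2fc_a) & set(log2fc_b))  # genes measured in both comparisons
r = pearson([log2fc_a[g] for g in shared], [log2fc_b[g] for g in shared])
print(f"Pearson r over {len(shared)} shared genes: {r:.2f}")
```

Unlike the overlap test, this uses all genes measured in both comparisons rather than only those that passed a cutoff, so it gives a graded measure of similarity.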
