Question: On the significance of the overlap between lists of genes
elb (Torino), 13 months ago, wrote:

Hi guys, I have a rather conceptual question about the significance of the overlap between two lists of genes. I have list1 and list2, each containing 100 genes. They share 13 genes, and the universe is 17,611 genes (the full set of genes from which the two 100-gene lists were drawn). If I use Fisher's exact test to calculate the p-value of the overlap, I get p < 1.351e-14. That is highly significant, and I understand why once I consider the size of the universe. But relative to the lists of 100 genes, the overlap is quite low: 13%. Should I consider the overlap significant or not?

Thank you in advance
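For reference, the one-sided Fisher's exact test on this 2x2 table is the upper tail of a hypergeometric distribution. Below is a minimal pure-Python sketch, assuming both lists are drawn without replacement from the same 17,611-gene universe; `overlap_pvalue` is a hypothetical helper, not part of any package. With SciPy installed, `scipy.stats.fisher_exact([[13, 87], [87, 17424]], alternative='greater')` should give the same p-value.

```python
from math import comb


def overlap_pvalue(universe: int, size1: int, size2: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(universe, size1, size2).

    This is the one-sided ("greater") Fisher's exact test p-value for the
    2x2 table in-list1/not-in-list1 versus in-list2/not-in-list2.
    """
    # Count the ways to pick list2 so that it shares exactly k genes with list1,
    # summed over every k >= overlap, using exact integer arithmetic.
    favourable = sum(
        comb(size1, k) * comb(universe - size1, size2 - k)
        for k in range(overlap, min(size1, size2) + 1)
    )
    # Python's int / int division is correctly rounded even for huge integers.
    return favourable / comb(universe, size2)


N, n1, n2, observed = 17611, 100, 100, 13
expected = n1 * n2 / N  # overlap expected by chance alone
p = overlap_pvalue(N, n1, n2, observed)
print(f"expected overlap: {expected:.2f} genes, observed: {observed}")
print(f"fold enrichment: {observed / expected:.1f}x, one-sided p = {p:.3g}")
```

The overlap expected by chance is only about 0.57 genes, so 13 shared genes is roughly a 23-fold enrichment. That is why the p-value is so small even though 13% sounds modest.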


You can also simply draw two random sets of 100 genes from the universe of 17,611 genes and check how often they share 13 or more genes. This tells you whether an overlap of 13 genes could plausibly arise by chance.

Repeat this 1000 times (1000 random pairs of 100-gene sets) and count how often the overlap is 13 genes or more.

Fisher's test computes this probability exactly, but generating random sets yourself gives you a better intuitive sense of it.

(reply written 13 months ago by geek_y)
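The simulation can be sketched in a few lines of Python, representing genes by integer indices and fixing the random seed so the run is reproducible; `simulate_overlaps` is a hypothetical name. One caveat: with 1000 random pairs, the smallest p-value you can resolve is about 1/1000, and since the exact probability here is on the order of 1e-14, you should expect no random pair to reach 13 shared genes. The simulation therefore shows p below roughly 0.001, while Fisher's test supplies the exact value. The `(hits + 1) / (draws + 1)` estimate is a common convention for Monte Carlo p-values that avoids reporting exactly zero.

```python
import random


def simulate_overlaps(universe: int, size: int, draws: int, seed: int = 1) -> list[int]:
    """Overlap sizes between two random gene sets, repeated `draws` times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    genes = range(universe)    # genes represented by their indices 0..universe-1
    overlaps = []
    for _ in range(draws):
        set1 = set(rng.sample(genes, size))
        set2 = rng.sample(genes, size)
        overlaps.append(len(set1.intersection(set2)))
    return overlaps


overlaps = simulate_overlaps(universe=17611, size=100, draws=1000)
hits = sum(o >= 13 for o in overlaps)
# Add-one estimate: a finite simulation never reports exactly zero.
p_empirical = (hits + 1) / (len(overlaps) + 1)
print(f"mean random overlap: {sum(overlaps) / len(overlaps):.2f}")
print(f"random pairs with >= 13 shared genes: {hits} of {len(overlaps)}")
print(f"empirical p <= {p_empirical:.4f}")
```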
e.rempel (Germany, Heidelberg, COS), 13 months ago, wrote:

Hi,

as you said, the p-value of Fisher's exact test is highly significant, and, as you also noted, the reason is the size of the gene universe.

Some people question the universe itself: could every one of these 17,611 genes actually end up in either of your lists? Sometimes that is not the case, e.g. because some genes are silenced. But even if you restricted the gene universe to 2,000 genes (a very strong restriction, I admit), Fisher's test would still be significant.

Thus, I would say the overlap is significant and there is some connection between your lists.
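The claim about a 2,000-gene universe can be checked directly by recomputing the one-sided Fisher's exact p-value (as a hypergeometric tail) for progressively smaller universes while keeping the lists fixed at 100 genes each with 13 shared. A minimal sketch; the helper name `overlap_pvalue` and the list of universe sizes are illustrative choices, not from the thread.

```python
from math import comb


def overlap_pvalue(universe: int, size1: int, size2: int, overlap: int) -> float:
    """One-sided hypergeometric tail P(X >= overlap), i.e. Fisher's 'greater' p-value."""
    favourable = sum(
        comb(size1, k) * comb(universe - size1, size2 - k)
        for k in range(overlap, min(size1, size2) + 1)
    )
    return favourable / comb(universe, size2)


# Same lists (100 genes each, 13 shared); only the assumed universe shrinks.
universe_sizes = [17611, 5000, 2000, 1000]
pvalues = {}
for n in universe_sizes:
    expected = 100 * 100 / n
    pvalues[n] = overlap_pvalue(n, 100, 100, 13)
    print(f"universe {n:>5}: expected overlap {expected:5.2f}, p = {pvalues[n]:.3g}")
```

Shrinking the universe raises the overlap expected by chance (from about 0.57 genes at 17,611 to 5 genes at 2,000), so the p-value grows; the loop shows how far the universe has to shrink before 13 shared genes stops looking unusual.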

Martombo (Seville, ES), 13 months ago, wrote:

As e.rempel already pointed out, you need to make sure your comparison is not biased by confounding variables such as expression level. One way to do this is to apply a strong initial expression filter, which removes many genes from your universe, and to use a statistical test that doesn't favour highly expressed genes. Even then, such a test only tells you that the two lists overlap more than expected by chance; it doesn't quantify how similar they are. Where do these two lists come from? If your aim is to measure how similar two comparisons or states are, you could correlate the effect sizes you get (e.g. fold changes, if they are transcriptional profiles).
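As one way to correlate effect sizes, here is a sketch of a Spearman rank correlation between two sets of fold changes, computed over the genes present in both comparisons. It assumes each comparison is available as a dictionary mapping gene IDs to log2 fold changes; the gene names and values below are made up for illustration. With SciPy, `scipy.stats.spearmanr(x, y)` computes the same coefficient.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_fold_changes(fc1: dict[str, float], fc2: dict[str, float]) -> tuple[float, int]:
    """Spearman correlation of fold changes over genes measured in both comparisons."""
    shared = sorted(fc1.keys() & fc2.keys())
    x = [fc1[g] for g in shared]
    y = [fc2[g] for g in shared]
    return pearson(average_ranks(x), average_ranks(y)), len(shared)


# Hypothetical log2 fold changes for two comparisons (illustrative values only).
fc_state_a = {"GENE1": 2.1, "GENE2": -0.5, "GENE3": 1.3, "GENE4": 0.2, "GENE5": -1.8}
fc_state_b = {"GENE1": 1.7, "GENE2": -0.9, "GENE3": 0.4, "GENE4": 0.6, "GENE6": 3.0}

rho, n_shared = spearman_fold_changes(fc_state_a, fc_state_b)
print(f"Spearman rho = {rho:.2f} over {n_shared} shared genes")
```

A rank correlation is less sensitive to a handful of genes with extreme fold changes than a Pearson correlation on the raw values, which is why it is used here; in practice you would run it over all genes that pass the expression filter in both comparisons.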
