Question: How To Identify If Gene Overlaps Between Studies Are Significant?
I have a theoretical and methodological question and would appreciate people's ideas. We have run two independent studies (in different species) to identify whether similar genes are involved in a phenotype. From each species we have a list of candidate genes, and some of these genes overlap between the species. How would you go about testing whether this overlap is significant? Say species 1 has 20,000 genes in its genome and species 2 has 22,000 genes. From a candidate list of 200 genes in species 1 and 120 genes in species 2, we find 15 overlapping genes. From a naive perspective, I would think a chi-square test or Fisher's exact test would be appropriate for a first look. Any ideas on the best test of significance here would be great.
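One way to frame the question is as a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2x2 overlap table. The sketch below assumes, hypothetically, a common universe of 20,000 genes that have one-to-one orthologs in both species. In practice the universe should be the set of genes with orthologs in both species, which is usually smaller than either genome, and a smaller universe generally gives a larger (less significant) p-value. The function name `overlap_pvalue` is illustrative only.

```python
from math import comb


def overlap_pvalue(n_universe: int, n_list1: int, n_list2: int, n_overlap: int) -> float:
    """One-sided hypergeometric tail P(X >= n_overlap).

    X is the overlap between a fixed list of n_list1 genes and a list of
    n_list2 genes drawn at random from n_universe genes. This is the same
    quantity as a one-sided (enrichment) Fisher's exact test.
    """
    total = comb(n_universe, n_list2)
    upper = min(n_list1, n_list2)
    # Exact integer arithmetic avoids overflow in the large binomial terms.
    tail = sum(
        comb(n_list1, k) * comb(n_universe - n_list1, n_list2 - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total  # int / int division is correctly rounded for big ints


# HYPOTHETICAL common universe: 20,000 genes with orthologs in both species.
N = 20_000
p = overlap_pvalue(N, 200, 120, 15)
expected = 200 * 120 / N  # overlap expected by chance alone
print(f"expected overlap = {expected:.1f}, observed = 15, p = {p:.2e}")
```

With these numbers only about 1.2 shared genes are expected by chance, so 15 is far in the tail. If SciPy is available, `scipy.stats.hypergeom.sf(14, N, 200, 120)` should give the same tail probability.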

Just some general thoughts: there are other issues that might make such an analysis invalid. Some genes may be more prone to mutation than others (we identify candidate genes by their having a high-frequency alternative allele), and longer genes are more likely to carry mutations. We could run a separate analysis to check that the candidate genes are not longer on average than a random sample of genes.
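The gene-length check described above could be done as a simple permutation test: draw many random gene sets of the same size as the candidate list and see how often their mean length is at least as large as the candidates' mean length. This is a minimal sketch; the function name is hypothetical, and the gene lengths in the example are synthetic placeholders, not real data.

```python
import random


def length_permutation_pvalue(all_lengths, candidate_lengths, n_perm=10_000, seed=0):
    """Empirical one-sided p-value: are candidate genes longer than random genes?

    Draws n_perm random gene sets (without replacement, from the whole genome)
    of the same size as the candidate list and counts how often their mean
    length is at least the candidates' mean length.
    """
    rng = random.Random(seed)
    k = len(candidate_lengths)
    observed = sum(candidate_lengths) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_lengths, k)
        if sum(sample) / k >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate above 0.
    return (hits + 1) / (n_perm + 1)


# SYNTHETIC illustration only: hypothetical gene lengths 1..20,000.
genome = list(range(1, 20_001))
longest = genome[-200:]   # candidates clearly longer than average
shortest = genome[:200]   # candidates clearly shorter than average
print(length_permutation_pvalue(genome, longest, n_perm=2_000))
print(length_permutation_pvalue(genome, shortest, n_perm=2_000))
```

A permutation test makes no assumption about the shape of the gene-length distribution, which is convenient because the background is simply the genome itself.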

I think, as you suggest, Fisher's exact test or a chi-square test is an appropriate test here.
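One practical point when choosing between the two: a common rule of thumb is that the chi-square approximation becomes unreliable when any expected cell count falls below about 5. The sketch below builds the 2x2 overlap table for the numbers in the question, assuming (hypothetically) a shared universe of 20,000 orthologous genes, and computes the expected counts under independence. The helper names are illustrative.

```python
def overlap_table(n_universe: int, n_list1: int, n_list2: int, n_overlap: int):
    """2x2 table: rows = in / not in list 1, columns = in / not in list 2."""
    a = n_overlap                # in both lists
    b = n_list1 - n_overlap      # in list 1 only
    c = n_list2 - n_overlap      # in list 2 only
    d = n_universe - a - b - c   # in neither list
    return [[a, b], [c, d]]


def expected_counts(table):
    """Cell counts expected under independence: row total * column total / N."""
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


# HYPOTHETICAL common universe of 20,000 orthologous genes.
table = overlap_table(20_000, 200, 120, 15)
exp = expected_counts(table)
print(table)                       # → [[15, 185], [105, 19695]]
print(min(min(r) for r in exp))    # → 1.2
```

Because the expected count for the "in both lists" cell is only about 1.2, Fisher's exact test is probably the safer of the two; with SciPy installed, `scipy.stats.fisher_exact(table, alternative="greater")` should compute the one-sided version.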

Personally, I also think that simple tests, when appropriate, are far better than complex ones, because the latter require a deeper understanding of their potentially tacit assumptions.

Second, the analysis may of course have flaws, but in the end the p-values are there to help you decide what to validate by other means.

