I have several lists of genes from a test I ran on the genomes of different organisms, and I have calculated the pairwise percentage of sharing between these lists. I want to test whether the pairwise sharing is greater than expected by chance. The hypergeometric test appears to be the standard approach for this. However, the versions of the test I have seen require the number of genes in the genome as the background. As this differs between the lists (each is from a different species), I do not think I can use the standard versions of the test available online.
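To make the issue concrete, here is a minimal sketch of the standard single-background test, written with only the Python standard library. The function name `hypergeom_upper_tail` and the numbers in the usage line are hypothetical and chosen only for illustration. Note that it takes one background size `N` for both lists, which is exactly what does not fit lists drawn from different genomes.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(overlap >= k) under the hypergeometric model.

    N: number of genes in the (single, shared) background
    K: size of the first gene list
    n: size of the second gene list
    k: observed number of shared genes
    """
    total = comb(N, n)
    upper = min(K, n)  # the overlap cannot exceed either list size
    # Sum the probabilities of every overlap from k up to the maximum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Illustrative numbers only (not from my data).
print(hypergeom_upper_tail(N=20000, K=300, n=250, k=15))
```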
I have been trying to generate distributions of expected sharing by resampling from my actual background gene lists, as I think this is the most appropriate solution (see http://stats.stackexchange.com/questions/232627/want-to-calculate-significance-of-pairwise-sharing-between-lists-standard-hyper ), but calculating so many pairwise comparisons is currently beyond my coding ability. My idea is to resample, from the appropriate genomic background for each species, lists of the same size as those I observe from my test, and then calculate the overlap between these resampled lists. To me this seems the best way to test whether the overlaps I observe in the real data are significant.
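The resampling idea described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not a tested pipeline: it assumes the gene identifiers are comparable across species (for example, ortholog group IDs), that each observed list is a subset of its own species' background, and that the data sit in two dictionaries keyed by species name. The names `resampling_p_value` and `all_pairs` are hypothetical. The statistic is the raw overlap count; because the list sizes are fixed during resampling, any percentage derived from that count ranks the resamples in the same order.

```python
import random
from itertools import combinations

def resampling_p_value(list_a, list_b, background_a, background_b,
                       n_iter=10000, seed=0):
    """Empirical P(overlap >= observed) for one pair of gene lists."""
    rng = random.Random(seed)
    # Sort the backgrounds so results are reproducible for a given seed.
    bg_a, bg_b = sorted(set(background_a)), sorted(set(background_b))
    obs_a, obs_b = set(list_a), set(list_b)
    observed = len(obs_a & obs_b)
    hits = 0
    for _ in range(n_iter):
        # Draw lists of the observed sizes, each from its own background.
        sample_a = set(rng.sample(bg_a, len(obs_a)))
        sample_b = set(rng.sample(bg_b, len(obs_b)))
        if len(sample_a & sample_b) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)

def all_pairs(lists, backgrounds, n_iter=10000, seed=0):
    """Run the test for every pair of species.

    lists / backgrounds: dicts mapping species name -> iterable of gene IDs
    (assumed layout, not something from my actual files).
    """
    results = {}
    for sp_a, sp_b in combinations(sorted(lists), 2):
        results[(sp_a, sp_b)] = resampling_p_value(
            lists[sp_a], lists[sp_b],
            backgrounds[sp_a], backgrounds[sp_b],
            n_iter=n_iter, seed=seed,
        )
    return results
```

With many species the number of pairs grows quickly, so the resulting p-values would presumably still need a multiple-testing correction.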
Any advice on tools I might be able to use, comments on the validity of my approach, or help with the code would be much appreciated.
Thanks for your assistance.