Hi all, I have done some genome scan analyses with 2 different methods to identify outlier SNPs. There is some overlap between the outliers found by these 2 methods. I want to know whether the observed overlap between the 2 methods is any greater than would be expected by chance alone. I have read different posts (https://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c or https://www.biostars.org/p/90662/), but I am getting a bit confused.
Total number of SNPs = 2,000,000
Total number of outlier SNPs discovered by method 1 = 7889
Total number of outlier SNPs discovered by method 2 = 46340
Overlap between method 1 and method 2 outliers = 4567
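Before running any test, a quick sanity check is the overlap you would expect if the two outlier lists were independent random subsets of all SNPs. That expectation is (method 1 outliers × method 2 outliers) / total SNPs, which is the mean of the hypergeometric distribution. A minimal R sketch, assuming every SNP behaves as an independent draw (linkage disequilibrium between nearby SNPs makes this only approximate); the variable names are illustrative:

```r
# Counts from the question (write numbers without thousands separators in R)
N_total <- 2000000   # all SNPs tested
n_m1    <- 7889      # outliers from method 1
n_m2    <- 46340     # outliers from method 2
overlap <- 4567      # SNPs flagged by both methods

# Under the null that the two lists are independent random subsets,
# the expected overlap is the hypergeometric mean n_m1 * n_m2 / N_total
expected_overlap <- n_m1 * n_m2 / N_total
fold_enrichment  <- overlap / expected_overlap

cat("Expected overlap:", round(expected_overlap, 1), "\n")
cat("Observed / expected:", round(fold_enrichment, 1), "\n")
```

With these counts the expected overlap is roughly 183 SNPs against 4567 observed, about 25 times more than chance would give.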
I am using the "phyper" function in R, but I do not understand how to specify its parameters:
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
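In R's parameterisation, phyper(q, m, n, k) describes drawing k balls without replacement from an urn holding m white and n black balls, with q the number of white balls drawn. One possible mapping onto this problem (names are illustrative) treats the method 2 outliers as white balls, every other SNP as black balls, and the method 1 outliers as the draws, so n would be the total number of SNPs minus m. The toy urn below checks the parameterisation against the closed-form probability and shows that swapping which list plays m and which plays k gives the same distribution:

```r
# phyper(q, m, n, k) / dhyper(x, m, n, k):
#   m = white balls in the urn, n = black balls in the urn,
#   k = balls drawn, q (or x) = white balls among the draws
N_total <- 2000000
m <- 46340           # white balls: SNPs flagged by method 2
n <- N_total - m     # black balls: every other SNP (not outliers minus m)
k <- 7889            # draws: SNPs flagged by method 1
q <- 4567            # white balls drawn: SNPs flagged by both methods
stopifnot(m + n == N_total)

# Toy urn to confirm the parameterisation against the closed form
# P(X = x) = choose(m, x) * choose(n, k - x) / choose(m + n, k)
toy_m <- 4; toy_n <- 6; toy_k <- 3
x <- 0:toy_k
by_hand <- choose(toy_m, x) * choose(toy_n, toy_k - x) / choose(toy_m + toy_n, toy_k)
stopifnot(isTRUE(all.equal(dhyper(x, toy_m, toy_n, toy_k), by_hand)))

# Swapping the roles (k white balls, m draws) gives the same distribution,
# so it does not matter which method is treated as "m"
swapped <- dhyper(x, toy_k, toy_m + toy_n - toy_k, toy_m)
stopifnot(isTRUE(all.equal(dhyper(x, toy_m, toy_n, toy_k), swapped)))
```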
First question: should n be the total number of SNPs minus m, or the total number of outlier SNPs minus m? How do I replace these parameters with the actual values? Should it be like this:
phyper(4567-1, 46340, 2000000-46340, 7889, lower.tail = TRUE, log.p = FALSE)
Then I get 1, which I read as meaning the observed overlap is entirely due to chance. I would appreciate it if anyone could help me resolve this problem.
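Note that lower.tail = TRUE returns P(X ≤ 4566), the probability of seeing fewer than 4567 shared SNPs. A value of 1 therefore says the observed overlap sits far in the upper tail, not at chance level. The enrichment p-value is the upper tail P(X ≥ 4567), i.e. the same call with lower.tail = FALSE. A minimal sketch, again assuming SNPs are independent draws; because this p-value is smaller than the smallest representable double it prints as 0, so log.p = TRUE is used to keep it on the log scale:

```r
N_total <- 2000000
m <- 46340; n <- N_total - m; k <- 7889; q <- 4567

# lower.tail = TRUE returns P(X <= q - 1): probability of FEWER shared SNPs
# than observed. A value of 1 means the overlap is far ABOVE chance.
p_lower <- phyper(q - 1, m, n, k, lower.tail = TRUE)

# Enrichment test: P(X >= q) = P(X > q - 1), i.e. the upper tail
p_upper <- phyper(q - 1, m, n, k, lower.tail = FALSE)

# p_upper underflows to 0 in double precision, so keep it on the log scale
log_p_upper   <- phyper(q - 1, m, n, k, lower.tail = FALSE, log.p = TRUE)
log10_p_upper <- log_p_upper / log(10)

cat("P(X <= q - 1):", p_lower, "\n")
cat("P(X >= q):", p_upper, "\n")
cat("log10 P(X >= q):", log10_p_upper, "\n")
```

The one-sided Fisher exact test, fisher.test(..., alternative = "greater") on the 2 × 2 table of method 1 outlier (yes/no) by method 2 outlier (yes/no), is based on the same hypergeometric distribution, so it is the same one-sided test expressed differently.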