Question

hypergeometric test on outlier SNPs

7.0 years ago

Ana

Hi all, I have run genome scan analyses with 2 different methods to identify outlier SNPs, and there is some overlap between the outliers found by the two methods. I want to know whether the observed overlap between these 2 methods is greater than would be expected by chance alone. I have read several posts (https://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c and https://www.biostars.org/p/90662/), but I am still a bit confused.

Total number of SNPs = 2,000,000
Total number of outlier SNPs discovered by method 1 = 7889
Total number of outlier SNPs discovered by method 2 = 46340
Overlap between method 1 and method 2 outliers = 4567
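As a quick sanity check on these counts: if the two methods flagged SNPs independently of each other, the expected overlap would be k * m / N (method-1 outliers times method-2 outliers divided by total SNPs). A minimal sketch in R using the numbers above; the variable names are illustrative only:

```r
# Counts taken from the question (variable names are illustrative)
N  <- 2000000  # total SNPs tested
k  <- 7889     # outliers from method 1
m  <- 46340    # outliers from method 2
ov <- 4567     # outliers shared by both methods

# Under independence, each method-1 outlier is also a method-2 outlier
# with probability m / N, so the expected overlap is k * m / N
expected <- k * m / N
fold     <- ov / expected

cat(sprintf("expected overlap: %.1f\n", expected))
cat(sprintf("fold enrichment:  %.1f\n", fold))
```

With these counts the expected overlap is about 183 SNPs, so the observed 4567 is roughly 25 times higher than chance alone would predict.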

I am using the phyper function in R, but I do not understand how to specify its parameters:

phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)

First question: should n be the total number of SNPs minus m, or the total number of outlier SNPs minus m? And how do I plug in the actual values? Should it be like this:

phyper(4567-1, 46340, 2000000-46340, 7889, lower.tail = TRUE, log.p = FALSE)

With this I get 1, which would mean the observed overlap is entirely due to chance! I would appreciate it if anyone could help me resolve this.
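A minimal sketch of the same calculation with every parameter named. It assumes the usual urn mapping for phyper(q, m, n, k): the method-2 outliers are the m "white balls", all remaining SNPs are the n = N - m "black balls" (so n is total SNPs minus m), and the k method-1 outliers are the draws. With the default lower.tail = TRUE, phyper returns P(X <= q), the probability of seeing at most q shared SNPs, whereas the enrichment p-value is the upper tail P(X >= 4567). Numbers in R cannot contain thousands separators. Inside a function call, 2,000,000 is read as the three separate arguments 2, 000 and 000, so write 2000000 or 2e6. The variable names below are illustrative.

```r
# Counts from the question. Inside a function call, 2,000,000 would be
# read as three separate arguments (2, 000, 000), so use 2e6 instead
N  <- 2e6      # total SNPs tested
m  <- 46340    # method-2 outliers: "white balls" in the urn
n  <- N - m    # all other SNPs: "black balls"
k  <- 7889     # method-1 outliers: number of draws
ov <- 4567     # SNPs flagged by both methods: white balls drawn

# Default lower.tail = TRUE gives P(X <= ov - 1), the chance of FEWER than
# ov shared SNPs; with ov far above the expected ~183 this is ~1
p_lower <- phyper(ov - 1, m, n, k, lower.tail = TRUE)

# The enrichment p-value is the upper tail, P(X >= ov)
p_upper <- phyper(ov - 1, m, n, k, lower.tail = FALSE)

# The upper tail is far below the smallest double, so also get it on the log scale
log_p_upper <- phyper(ov - 1, m, n, k, lower.tail = FALSE, log.p = TRUE)

# Swapping the roles of the two outlier lists gives the same probability
log_p_swapped <- phyper(ov - 1, k, N - k, m, lower.tail = FALSE, log.p = TRUE)

# Equivalent one-sided Fisher's exact test on the 2 x 2 table
# rows: method-1 outlier yes/no; columns: method-2 outlier yes/no
tab <- matrix(c(ov, m - ov, k - ov, N - m - k + ov), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(p_lower = p_lower, p_upper = p_upper, p_fisher = p_fisher))
print(c(log_p_upper = log_p_upper, log10_p_upper = log_p_upper / log(10)))
```

Here p_lower reproduces the value of 1 reported above, while the upper-tail p-value is so small that it underflows to 0 in double precision; the log-scale value (a very large negative number) is the more informative way to report it.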

outliers hypergeometric test R