Question: hypergeometric test on outlier SNPs
Asked 4 months ago by Ana:

Hi all, I have run genome scan analyses with 2 different methods to identify outlier SNPs, and the two sets of outliers partly overlap. I want to know whether the observed overlap between the 2 methods is greater than would be expected by chance alone. I have read several posts (https://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c and https://www.biostars.org/p/90662/), but I am still a bit confused.

Total number of SNPs = 2,000,000
Outlier SNPs discovered by method 1 = 7,889
Outlier SNPs discovered by method 2 = 46,340
Outlier SNPs shared by methods 1 and 2 = 4,567

I am using the "phyper" function in R, but I do not understand how to specify its parameters:

phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)

First question: should n be the total number of SNPs minus m, or the total number of outlier SNPs minus m? And how do I replace these parameters with my actual values? Should it be like this:

phyper(4567 - 1, 46340, 2000000 - 46340, 7889, lower.tail = TRUE, log.p = FALSE)

With this call I get 1, which seems to mean that the observed overlap is entirely due to chance! I would appreciate it if anyone could help me resolve this.
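For reference, here is a minimal sketch of one way the call might be parameterised. It assumes that the 2,000,000 SNPs are the common background for both methods and that the question of interest is enrichment, i.e. P(overlap >= 4567). The variable names (N, m1, m2, x) are made up for illustration. Note that with lower.tail = TRUE, phyper returns P(X <= q), which is very close to 1 whenever the observed overlap is far above the expected one, so a result of 1 from that call does not indicate a chance-level overlap.

```r
# Counts taken from the post; the variable names are illustrative only
N  <- 2000000   # total number of SNPs (assumed common background)
m1 <- 7889      # outlier SNPs from method 1
m2 <- 46340     # outlier SNPs from method 2
x  <- 4567      # outlier SNPs shared by both methods

# Overlap expected if the two outlier lists were unrelated
expected <- m1 * m2 / N

# Upper tail P(X >= x): pass x - 1 and set lower.tail = FALSE.
# m = "white balls" (method 2 outliers), n = everything else (N - m2),
# k = number drawn (method 1 outliers).
p_enrich <- phyper(x - 1, m2, N - m2, m1, lower.tail = FALSE)

# The p-value may underflow to 0, so the log scale is more informative
log_p <- phyper(x - 1, m2, N - m2, m1, lower.tail = FALSE, log.p = TRUE)

print(expected)
print(p_enrich)
print(log_p)
```

Under these assumptions n is the total number of SNPs minus m, not the number of outliers minus m. The roles of the two methods can be swapped (m = m1, k = m2) without changing the probability, since the hypergeometric distribution is symmetric in m and k.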
