Is the Bioconductor RRHO R package p-value computation for two.sided completely wrong?
3.1 years ago
Anthony • 0

Hi everybody,

I am trying to compute a rank-rank hypergeometric overlap with the RRHO R package, using alternative="two.sided".

The log p-values look very strange, for several reasons (see the function numericListOverlap in R/ExpressionAnalysis.R):

• A p-value of 0 is not handled when the log is taken. Would -log(pval + eps), with eps a small number, not be a better choice?
• Some p-values are above one. Both 2*the.mean - count (see EDIT below) and log.pval<- -log( phyper(q=lower+tol, m=a, n=n-a+1, k=b, lower.tail=TRUE) + phyper(q= upper-tol, m=a, n=n-a+1, k=b, lower.tail=FALSE)) make no sense to me. I think that the former should be replaced by mean - count (but see EDIT below) and that the latter should be divided by two. Am I right?
• Additionally, I have no idea what the tol parameter means.
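
To make the first point concrete, here is a minimal sketch with made-up numbers (n, a, b, count and eps are all assumptions for illustration, not values from the package or from my data). It uses the standard phyper() parametrisation with n - a non-selected genes rather than the n - a + 1 quoted above. It compares the plain -log(p), the -log(p + eps) patch, and the log.p = TRUE argument of base R's phyper(), which works on the log scale directly.

```r
# Hypothetical numbers for illustration only.
n     <- 20000   # total number of genes
a     <- 5000    # genes selected from list 1
b     <- 5000    # genes selected from list 2
count <- 4000    # observed overlap, far above the expected a*b/n = 1250
eps   <- 1e-300  # hypothetical small constant

# Upper tail P(X >= count) on the probability scale.
p <- phyper(q = count - 1, m = a, n = n - a, k = b, lower.tail = FALSE)

naive   <- -log(p)        # Inf once p underflows to 0
patched <- -log(p + eps)  # finite, but stuck at -log(eps) whatever the true p is

# Same tail, computed on the log scale by phyper() itself.
direct <- -phyper(q = count - 1, m = a, n = n - a, k = b,
                  lower.tail = FALSE, log.p = TRUE)

print(c(naive = naive, patched = patched, direct = direct))
```

With eps, every underflowed p-value collapses to the same ceiling, whereas log.p = TRUE keeps them distinguishable. For the two-sided case the two log tails would still have to be combined, for example with a log-sum-exp step.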

The package is downloaded more than 100 times per month and is the basis for the RRHO2 publication, so I am puzzled not to find anything about these issues on the web.

Anthony.

EDIT: The if-else construct with 2*the.mean - count is equivalent to

    absval <- abs(count - the.mean)
    upper <- the.mean + absval  ## same as 2*the.mean - count when count - the.mean < 0


But I still do not understand the p-values above one.
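
A minimal sketch with made-up numbers shows at least one situation where the sum of the two tails exceeds one: when count equals the expected overlap, lower and upper coincide and the observed value is counted in both tails. The values of n, a, b, count and tol below are assumptions (I do not know the package default for tol), and I use the standard parametrisation n - a instead of n - a + 1.

```r
# Hypothetical numbers for illustration only.
n     <- 100  # total number of genes
a     <- 20   # genes selected from list 1
b     <- 50   # genes selected from list 2
count <- 10   # observed overlap, equal to the expected overlap a*b/n
tol   <- 0.5  # assumed tolerance; the package default may differ

the.mean <- a * b / n
absval   <- abs(count - the.mean)
lower    <- the.mean - absval
upper    <- the.mean + absval

# P(X <= lower + tol) and P(X > upper - tol), as in the quoted expression.
p.lower <- phyper(q = lower + tol, m = a, n = n - a, k = b, lower.tail = TRUE)
p.upper <- phyper(q = upper - tol, m = a, n = n - a, k = b, lower.tail = FALSE)
p.two.sided <- p.lower + p.upper

# Both tails contain X = count, so the sum is 1 + P(X = count).
p.point <- dhyper(count, m = a, n = n - a, k = b)
print(all.equal(p.two.sided, 1 + p.point))

# One common convention for exact two-sided tests is to cap the sum at 1.
p.capped <- min(1, p.two.sided)
print(c(sum = p.two.sided, capped = p.capped, neg.log = -log(p.two.sided)))
```

If this is what happens in the package, dividing the sum by two would not be the natural fix, since it would also halve every legitimate two-sided p-value; capping at one only touches the overlapping case.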

