Why is the p-value 0, and how can I keep precision for small values?
2 · 3.4 years ago · Star ▴ 60

Hi all,

I am trying to calculate a p-value from a z-score, but the p-value comes out as 0.

2*pnorm(-abs(37.9353259515))
[1] 0

However, with a smaller z-score I get an accurate (non-zero) p-value:

2*pnorm(-abs(31.4730702149))
[1] 2.029847e-217

Can anyone suggest a workaround to get an accurate p-value in the first case instead of 0?

Cheers!

Tags: R
6 · 3.4 years ago

Back in the days when computers didn't have the oomph they have today, you'd see a lot of -log10(value) calculations used to report very small numbers. Calculating with, or presenting, log-transformed values may be all you need, until you reach a point where you have to report high-precision results with Rmpfr or similar, depending on your programming environment.

library(Rmpfr)
# wrap inputs as 200-bit arbitrary-precision numbers; Rmpfr provides an
# mpfr-aware pnorm(), so the result keeps this precision
.N <- function(.) mpfr(., precBits = 200)

2 * pnorm(-abs(.N(c(31.4730702149, 37.9353259515))))
# 2 'mpfr' numbers of precision  200   bits 
# [1] 2.0298465195466694380461698263104467695985404891415345411446557e-217
# [2] 6.7359508146380783828098958784721518979443306021620032358869696e-315
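
Alternatively, the log-scale route needs no extra packages: pnorm() has a log.p argument that returns log(p) directly, so the p-value itself never has to fit into a double. A minimal sketch using the z-score from the question (the printed value is derived from the mpfr result above):

# two-sided p-value, kept on the log scale throughout
log_p <- log(2) + pnorm(-abs(37.9353259515), log.p = TRUE)  # natural log of p
log_p / log(10)                                             # convert to log10
# [1] -314.1716   (i.e. p ~ 6.7e-315, in agreement with the mpfr result)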

As to the "why" behind R's native behavior: R's numeric type is a double-precision (64-bit IEEE 754) floating-point value. That caps the positive magnitudes you can represent at roughly 5e-324 (smallest subnormal) or 2.23e-308 (smallest normal) at the low end, and about 1.80e308 at the high end.

Given your example, I'd say R's pnorm() gives up at the edge of the "normal" range rather than pushing into subnormals: below the normal minimum, results lose precision one bit at a time, so stopping there keeps the arithmetic well-behaved.
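
You can check that guess against the numbers in this thread (a small sketch; 2^-1074 is the smallest positive subnormal double in IEEE 754):

.Machine$double.xmin        # [1] 2.225074e-308, smallest positive *normal* double
2^-1074                     # [1] 4.940656e-324, smallest positive subnormal
6.7359508e-315 > 2^-1074    # [1] TRUE: the true p-value is representable as a
                            # subnormal, which supports the guess that pnorm()
                            # underflows in its intermediate arithmetic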

Long answer, short: if you need to work with numbers outside this range (and there can be cases where data need to be ranked by scores of this type), either log-transform your numbers or use a third-party library that offers 128-bit or greater precision. IEEE 754 binary128 supports positive magnitudes from roughly 6.48e-4966 (smallest subnormal) or 3.36e-4932 (smallest normal) up to about 1.19e4932.
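
If you specifically want binary128-like behavior in R, Rmpfr can mimic its 113-bit significand (a sketch; note MPFR's exponent range is much wider than IEEE's, so only the significand width matches):

library(Rmpfr)
2 * pnorm(mpfr(-37.9353259515, precBits = 113))   # ~6.736e-315 at quad-like precision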

One caveat: the more precision you ask a third-party library for, the longer calculations will take, since the arithmetic is done in software rather than hardware. Performance is already a concern in R, so that is something to keep in mind. A log transform is very probably cheap and easy by comparison.
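
A rough way to see the cost difference yourself (a sketch; timings vary by machine, but the log-scale call runs in hardware double arithmetic while the mpfr call is software arbitrary precision):

z <- rep(-37.9353259515, 1e4)
system.time(pnorm(z, log.p = TRUE))               # plain doubles, on the log scale
system.time(2 * pnorm(mpfr(z, precBits = 200)))   # software arbitrary precision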

3 · 3.4 years ago

You've reached the limits of machine precision. 0 and 1e-217 (1 preceded by 216 zeros) are not meaningfully different. I don't see how adding more zeros will help you. This looks to me like a case of the XY problem.


(This is probably good enough to be an answer rather than a comment.)


Is it architecture-dependent? I get the following on:

  • OS: Scientific Linux release 7.4 (Nitrogen)
  • Kernel: Linux 3.10.0-514.6.1.el7.x86_64 x86_64
  • CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz


.Machine$double.xmin
[1] 2.225074e-308


options(scipen = 999)  # penalize scientific notation so the value prints in fixed form
.Machine$double.xmin
[1] 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002225074
ADD REPLY
0
Entering edit mode

This is a complicated thing and I am not sure I understand all of it. It comes down to floating-point arithmetic. As far as I know, machine precision depends on the numeric type (float, double, or long double), on the programming language (or the compiler), and also on the OS. Most of the values in use are defined by the IEEE 754 standard, but not every application has to follow it. To complicate matters, there are different "kinds" of machine precision. Strictly speaking, machine precision is the smallest number eps such that 1 + eps != 1 (i.e. .Machine$double.eps); separately, there is the smallest representable non-zero number (.Machine$double.xmin). In R, .Machine tells you what the machine is capable of, but that doesn't mean it is what R, or every package, actually uses.
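
To make the two kinds of "precision" concrete (a small sketch; these are the standard IEEE 754 double values, so they should print the same on any common platform):

.Machine$double.eps               # [1] 2.220446e-16, smallest eps with 1 + eps != 1
1 + .Machine$double.eps/2 == 1    # [1] TRUE: below eps, an addition to 1 vanishes
.Machine$double.xmin              # [1] 2.225074e-308, smallest positive normal double
2^-1075 == 0                      # [1] TRUE: smaller than any subnormal, underflows to 0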

