Question

survfit(Surv()) P-value interpretation for 3 survival curves?

7

Entering edit mode

4.6 years ago

JJDollar ▴ 130

Using the survfit(Surv()) functions in R, I am generating survival curves for a gene of interest, using z.scores to label patient data as upregulated, downregulated, or "regulated". However, using defaults to generate p-value only returns a single p-value. How should I interpret this p-value in reference to this data? Is there a way to generate a p-value for each dysregulated survival curve compared to the "regulated" survival curve? Code used for reference below. Thanks for your advice!

fit <- survfit(Surv(tcga_data$os_months, tcga_data$os_status) ~ gene_status, data = tcga_data)
ggsurvplot(fit, conf.int = F, surv.median.line = c("hv"), 
                   data = tcga_data, 
                   pval = TRUE, pval.coord = c(300, 1), 
                   risk.table = TRUE)

survfit Surv survival curves R • 16k views

ADD COMMENT • link updated 4 weeks ago by marshelma • 0 • written 4.6 years ago by JJDollar ▴ 130

score 12 · Accepted Answer · 2019-10-07

The default p-value that is calculated by survfit() is the log rank p-value from the score test, which is one of the most oft-quoted p-values for survival data.

If you want to obtain a p-value for each individual stratum compared to the base / reference stratum, then you can use the Cox proportional hazards model, which will produce the same log rank p-value as Survfit() when ties are 'exact':

load AML data

require(survival)
require(survminer)

aml
   time status             x
1     9      1    Maintained
2    13      1    Maintained
3    13      0    Maintained
4    18      1    Maintained
5    23      1    Maintained
6    28      0    Maintained
7    31      1    Maintained
8    34      1    Maintained
9    45      0    Maintained
10   48      1    Maintained
11  161      0    Maintained
12    5      1 Nonmaintained
13    5      1 Nonmaintained
14    8      1 Nonmaintained
15    8      1 Nonmaintained
16   12      1 Nonmaintained
17   16      0 Nonmaintained
18   23      1 Nonmaintained
19   27      1 Nonmaintained
20   30      1 Nonmaintained
21   33      1 Nonmaintained
22   43      1 Nonmaintained
23   45      1 Nonmaintained

let's change the data to add a third stratum, 'SuperMaintained'

aml$x <- as.character(aml$x)
aml[10,3] <- 'SuperMaintained'
aml[11,3] <- 'SuperMaintained'
aml[22,3] <- 'SuperMaintained'
aml[23,3] <- 'SuperMaintained'
aml$x <- factor(aml$x, levels = c('Nonmaintained','Maintained','SuperMaintained'))
aml
   time status               x
1     9      1      Maintained
2    13      1      Maintained
3    13      0      Maintained
4    18      1      Maintained
5    23      1      Maintained
6    28      0      Maintained
7    31      1      Maintained
8    34      1      Maintained
9    45      0      Maintained
10   48      1 SuperMaintained
11  161      0 SuperMaintained
12    5      1   Nonmaintained
13    5      1   Nonmaintained
14    8      1   Nonmaintained
15    8      1   Nonmaintained
16   12      1   Nonmaintained
17   16      0   Nonmaintained
18   23      1   Nonmaintained
19   27      1   Nonmaintained
20   30      1   Nonmaintained
21   33      1   Nonmaintained
22   43      1 SuperMaintained
23   45      1 SuperMaintained

With the factor() command, I also set the reference level to 'Nonmaintained'.

use `survfit()`

fit <- survfit(
  Surv(time, status) ~ x,
  data = aml)

surv_pvalue(fit)
    variable        pval   method   pval.txt
           x 0.005309417 Log-rank p = 0.0053

ggsurvplot(
  fit,
  conf.int = FALSE,
  surv.median.line = c('hv'), 
  data = aml, 
  pval = TRUE,
  pval.method = TRUE,
  risk.table = FALSE)

So, the log rank p-value is 0.005309417; Survival is worse for the 'Nonmaintained' stratum.

use `coxph()`

coxfit <- coxph(
  Surv(time, status) ~ x,
  data = aml,
  ties = 'exact')

summary(coxfit)
Call:
coxph(formula = Surv(time, status) ~ x, data = aml, ties = "exact")

  n= 23, number of events= 18 

                     coef exp(coef) se(coef)      z Pr(>|z|)   
xMaintained      -1.12456   0.32479  0.58806 -1.912  0.05584 . 
xSuperMaintained -2.60439   0.07395  0.92232 -2.824  0.00475 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                 exp(coef) exp(-coef) lower .95 upper .95
xMaintained        0.32479      3.079   0.10258    1.0284
xSuperMaintained   0.07395     13.523   0.01213    0.4508

Concordance= 0.729  (se = 0.054 )
Likelihood ratio test= 10.87  on 2 df,   p=0.004
Wald test            = 8.58  on 2 df,   p=0.01
Score (logrank) test = 10.48  on 2 df,   p=0.005

So, the p-values for each stratum are:

Maintained vs Nonmaintained, 0.05584
SuperMaintained vs Nonmaintained, 0.00475

The hazard ratios (HRs) are given by:

exp(coef), HR
exp(-coef), HR flipped the other way
lower .95, lower 95% CI of the HR
upper .95, upper 95% CI of the HR

check that the log rank p-values are the same in both models

The log rank p-value from the Cox model can be accessed by: summary(coxfit)$sctest[3]

round(summary(coxfit)$sctest[3], digits = 9) == round(surv_pvalue(fit)[,2], digits = 9)
pvalue 
  TRUE

Kevin

load AML data

let's change the data to add a third stratum, 'SuperMaintained'

use survfit()

use coxph()

check that the log rank p-values are the same in both models

use `survfit()`

use `coxph()`