Question: survfit(Surv()) P-value interpretation for 3 survival curves?
2
gravatar for olive1212
9 months ago by
olive121260
olive121260 wrote:

Using the survfit(Surv()) functions in R, I am generating survival curves for a gene of interest, using z.scores to label patient data as upregulated, downregulated, or "regulated". However, using defaults to generate p-value only returns a single p-value. How should I interpret this p-value in reference to this data? Is there a way to generate a p-value for each dysregulated survival curve compared to the "regulated" survival curve? Code used for reference below. Thanks for your advice!

fit <- survfit(Surv(tcga_data$os_months, tcga_data$os_status) ~ gene_status, data = tcga_data)
ggsurvplot(fit, conf.int = F, surv.median.line = c("hv"), 
                   data = tcga_data, 
                   pval = TRUE, pval.coord = c(300, 1), 
                   risk.table = TRUE)
survival curves surv survfit R • 1.6k views
ADD COMMENTlink modified 9 months ago by Kevin Blighe61k • written 9 months ago by olive121260
3
gravatar for Kevin Blighe
9 months ago by
Kevin Blighe61k
University College London
Kevin Blighe61k wrote:

The default p-value that is calculated by survfit() is the log rank p-value from the score test, which is one of the most oft-quoted p-values for survival data.

If you want to obtain a p-value for each individual stratum compared to the base / reference stratum, then you can use the Cox proportional hazards model, which will produce the same log rank p-value as Survfit() when ties are 'exact':

load AML data

require(survival)
require(survminer)

aml
   time status             x
1     9      1    Maintained
2    13      1    Maintained
3    13      0    Maintained
4    18      1    Maintained
5    23      1    Maintained
6    28      0    Maintained
7    31      1    Maintained
8    34      1    Maintained
9    45      0    Maintained
10   48      1    Maintained
11  161      0    Maintained
12    5      1 Nonmaintained
13    5      1 Nonmaintained
14    8      1 Nonmaintained
15    8      1 Nonmaintained
16   12      1 Nonmaintained
17   16      0 Nonmaintained
18   23      1 Nonmaintained
19   27      1 Nonmaintained
20   30      1 Nonmaintained
21   33      1 Nonmaintained
22   43      1 Nonmaintained
23   45      1 Nonmaintained

let's change the data to add a third stratum, 'SuperMaintained'

aml$x <- as.character(aml$x)
aml[10,3] <- 'SuperMaintained'
aml[11,3] <- 'SuperMaintained'
aml[22,3] <- 'SuperMaintained'
aml[23,3] <- 'SuperMaintained'
aml$x <- factor(aml$x, levels = c('Nonmaintained','Maintained','SuperMaintained'))
aml
   time status               x
1     9      1      Maintained
2    13      1      Maintained
3    13      0      Maintained
4    18      1      Maintained
5    23      1      Maintained
6    28      0      Maintained
7    31      1      Maintained
8    34      1      Maintained
9    45      0      Maintained
10   48      1 SuperMaintained
11  161      0 SuperMaintained
12    5      1   Nonmaintained
13    5      1   Nonmaintained
14    8      1   Nonmaintained
15    8      1   Nonmaintained
16   12      1   Nonmaintained
17   16      0   Nonmaintained
18   23      1   Nonmaintained
19   27      1   Nonmaintained
20   30      1   Nonmaintained
21   33      1   Nonmaintained
22   43      1 SuperMaintained
23   45      1 SuperMaintained

With the factor() command, I also set the reference level to 'Nonmaintained'.

use survfit()

fit <- survfit(
  Surv(time, status) ~ x,
  data = aml)

surv_pvalue(fit)
    variable        pval   method   pval.txt
           x 0.005309417 Log-rank p = 0.0053

ggsurvplot(
  fit,
  conf.int = FALSE,
  surv.median.line = c('hv'), 
  data = aml, 
  pval = TRUE,
  pval.method = TRUE,
  risk.table = FALSE)

So, the log rank p-value is 0.005309417; Survival is worse for the 'Nonmaintained' stratum.

Untitled

use coxph()

coxfit <- coxph(
  Surv(time, status) ~ x,
  data = aml,
  ties = 'exact')

summary(coxfit)
Call:
coxph(formula = Surv(time, status) ~ x, data = aml, ties = "exact")

  n= 23, number of events= 18 

                     coef exp(coef) se(coef)      z Pr(>|z|)   
xMaintained      -1.12456   0.32479  0.58806 -1.912  0.05584 . 
xSuperMaintained -2.60439   0.07395  0.92232 -2.824  0.00475 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                 exp(coef) exp(-coef) lower .95 upper .95
xMaintained        0.32479      3.079   0.10258    1.0284
xSuperMaintained   0.07395     13.523   0.01213    0.4508

Concordance= 0.729  (se = 0.054 )
Likelihood ratio test= 10.87  on 2 df,   p=0.004
Wald test            = 8.58  on 2 df,   p=0.01
Score (logrank) test = 10.48  on 2 df,   p=0.005

So, the p-values for each stratum are:

  • Maintained vs Nonmaintained, 0.05584
  • SuperMaintained vs Nonmaintained, 0.00475

The hazard ratios (HRs) are given by:

  • exp(coef), HR
  • exp(-coef), HR flipped the other way
  • lower .95, lower 95% CI of the HR
  • upper .95, upper 95% CI of the HR

check that the log rank p-values are the same in both models

The log rank p-value from the Cox model can be accessed by: summary(coxfit)$sctest[3]

round(summary(coxfit)$sctest[3], digits = 9) == round(surv_pvalue(fit)[,2], digits = 9)
pvalue 
  TRUE

Kevin

ADD COMMENTlink modified 9 months ago • written 9 months ago by Kevin Blighe61k

Amazing, thank you so much!!

ADD REPLYlink written 9 months ago by olive121260

You are welcome, Olive.

ADD REPLYlink written 9 months ago by Kevin Blighe61k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1692 users visited in the last hour