Question: survfit(Surv()) P-value interpretation for 3 survival curves?
0
gravatar for olive1212
15 days ago by
olive121240
olive121240 wrote:

Using the survfit(Surv()) functions in R, I am generating survival curves for a gene of interest, using z.scores to label patient data as upregulated, downregulated, or "regulated". However, using defaults to generate p-value only returns a single p-value. How should I interpret this p-value in reference to this data? Is there a way to generate a p-value for each dysregulated survival curve compared to the "regulated" survival curve? Code used for reference below. Thanks for your advice!

fit <- survfit(Surv(tcga_data$os_months, tcga_data$os_status) ~ gene_status, data = tcga_data)
ggsurvplot(fit, conf.int = F, surv.median.line = c("hv"), 
                   data = tcga_data, 
                   pval = TRUE, pval.coord = c(300, 1), 
                   risk.table = TRUE)
survival curves surv survfit R • 159 views
ADD COMMENTlink modified 14 days ago by Kevin Blighe50k • written 15 days ago by olive121240
3
gravatar for Kevin Blighe
14 days ago by
Kevin Blighe50k
Kevin Blighe50k wrote:

The default p-value that is calculated by survfit() is the log rank p-value from the score test, which is one of the most oft-quoted p-values for survival data.

If you want to obtain a p-value for each individual stratum compared to the base / reference stratum, then you can use the Cox proportional hazards model, which will produce the same log rank p-value as Survfit() when ties are 'exact':

load AML data

require(survival)
require(survminer)

aml
   time status             x
1     9      1    Maintained
2    13      1    Maintained
3    13      0    Maintained
4    18      1    Maintained
5    23      1    Maintained
6    28      0    Maintained
7    31      1    Maintained
8    34      1    Maintained
9    45      0    Maintained
10   48      1    Maintained
11  161      0    Maintained
12    5      1 Nonmaintained
13    5      1 Nonmaintained
14    8      1 Nonmaintained
15    8      1 Nonmaintained
16   12      1 Nonmaintained
17   16      0 Nonmaintained
18   23      1 Nonmaintained
19   27      1 Nonmaintained
20   30      1 Nonmaintained
21   33      1 Nonmaintained
22   43      1 Nonmaintained
23   45      1 Nonmaintained

let's change the data to add a third stratum, 'SuperMaintained'

aml$x <- as.character(aml$x)
aml[10,3] <- 'SuperMaintained'
aml[11,3] <- 'SuperMaintained'
aml[22,3] <- 'SuperMaintained'
aml[23,3] <- 'SuperMaintained'
aml$x <- factor(aml$x, levels = c('Nonmaintained','Maintained','SuperMaintained'))
aml
   time status               x
1     9      1      Maintained
2    13      1      Maintained
3    13      0      Maintained
4    18      1      Maintained
5    23      1      Maintained
6    28      0      Maintained
7    31      1      Maintained
8    34      1      Maintained
9    45      0      Maintained
10   48      1 SuperMaintained
11  161      0 SuperMaintained
12    5      1   Nonmaintained
13    5      1   Nonmaintained
14    8      1   Nonmaintained
15    8      1   Nonmaintained
16   12      1   Nonmaintained
17   16      0   Nonmaintained
18   23      1   Nonmaintained
19   27      1   Nonmaintained
20   30      1   Nonmaintained
21   33      1   Nonmaintained
22   43      1 SuperMaintained
23   45      1 SuperMaintained

With the factor() command, I also set the reference level to 'Nonmaintained'.

use survfit()

fit <- survfit(
  Surv(time, status) ~ x,
  data = aml)

surv_pvalue(fit)
    variable        pval   method   pval.txt
           x 0.005309417 Log-rank p = 0.0053

ggsurvplot(
  fit,
  conf.int = FALSE,
  surv.median.line = c('hv'), 
  data = aml, 
  pval = TRUE,
  pval.method = TRUE,
  risk.table = FALSE)

So, the log rank p-value is 0.005309417; Survival is worse for the 'Nonmaintained' stratum.

Untitled

use coxph()

coxfit <- coxph(
  Surv(time, status) ~ x,
  data = aml,
  ties = 'exact')

summary(coxfit)
Call:
coxph(formula = Surv(time, status) ~ x, data = aml, ties = "exact")

  n= 23, number of events= 18 

                     coef exp(coef) se(coef)      z Pr(>|z|)   
xMaintained      -1.12456   0.32479  0.58806 -1.912  0.05584 . 
xSuperMaintained -2.60439   0.07395  0.92232 -2.824  0.00475 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                 exp(coef) exp(-coef) lower .95 upper .95
xMaintained        0.32479      3.079   0.10258    1.0284
xSuperMaintained   0.07395     13.523   0.01213    0.4508

Concordance= 0.729  (se = 0.054 )
Likelihood ratio test= 10.87  on 2 df,   p=0.004
Wald test            = 8.58  on 2 df,   p=0.01
Score (logrank) test = 10.48  on 2 df,   p=0.005

So, the p-values for each stratum are:

  • Maintained vs Nonmaintained, 0.05584
  • SuperMaintained vs Nonmaintained, 0.00475

The hazard ratios (HRs) are given by:

  • exp(coef), HR
  • exp(-coef), HR flipped the other way
  • lower .95, lower 95% CI of the HR
  • upper .95, upper 95% CI of the HR

check that the log rank p-values are the same in both models

The log rank p-value from the Cox model can be accessed by: summary(coxfit)$sctest[3]

round(summary(coxfit)$sctest[3], digits = 9) == round(surv_pvalue(fit)[,2], digits = 9)
pvalue 
  TRUE

Kevin

ADD COMMENTlink modified 14 days ago • written 14 days ago by Kevin Blighe50k

Amazing, thank you so much!!

ADD REPLYlink written 12 days ago by olive121240

You are welcome, Olive.

ADD REPLYlink written 12 days ago by Kevin Blighe50k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1441 users visited in the last hour