survfit(Surv()) P-value interpretation for 3 survival curves?
1
7
Entering edit mode
4.6 years ago
JJDollar ▴ 130

Using the survfit(Surv()) functions in R, I am generating survival curves for a gene of interest, using z.scores to label patient data as upregulated, downregulated, or "regulated". However, using defaults to generate p-value only returns a single p-value. How should I interpret this p-value in reference to this data? Is there a way to generate a p-value for each dysregulated survival curve compared to the "regulated" survival curve? Code used for reference below. Thanks for your advice!

fit <- survfit(Surv(tcga_data$os_months, tcga_data$os_status) ~ gene_status, data = tcga_data)
ggsurvplot(fit, conf.int = F, surv.median.line = c("hv"), 
                   data = tcga_data, 
                   pval = TRUE, pval.coord = c(300, 1), 
                   risk.table = TRUE)
survfit Surv survival curves R • 16k views
ADD COMMENT
12
Entering edit mode
4.6 years ago

The default p-value that is calculated by survfit() is the log rank p-value from the score test, which is one of the most oft-quoted p-values for survival data.

If you want to obtain a p-value for each individual stratum compared to the base / reference stratum, then you can use the Cox proportional hazards model, which will produce the same log rank p-value as Survfit() when ties are 'exact':

load AML data

require(survival)
require(survminer)

aml
   time status             x
1     9      1    Maintained
2    13      1    Maintained
3    13      0    Maintained
4    18      1    Maintained
5    23      1    Maintained
6    28      0    Maintained
7    31      1    Maintained
8    34      1    Maintained
9    45      0    Maintained
10   48      1    Maintained
11  161      0    Maintained
12    5      1 Nonmaintained
13    5      1 Nonmaintained
14    8      1 Nonmaintained
15    8      1 Nonmaintained
16   12      1 Nonmaintained
17   16      0 Nonmaintained
18   23      1 Nonmaintained
19   27      1 Nonmaintained
20   30      1 Nonmaintained
21   33      1 Nonmaintained
22   43      1 Nonmaintained
23   45      1 Nonmaintained

let's change the data to add a third stratum, 'SuperMaintained'

aml$x <- as.character(aml$x)
aml[10,3] <- 'SuperMaintained'
aml[11,3] <- 'SuperMaintained'
aml[22,3] <- 'SuperMaintained'
aml[23,3] <- 'SuperMaintained'
aml$x <- factor(aml$x, levels = c('Nonmaintained','Maintained','SuperMaintained'))
aml
   time status               x
1     9      1      Maintained
2    13      1      Maintained
3    13      0      Maintained
4    18      1      Maintained
5    23      1      Maintained
6    28      0      Maintained
7    31      1      Maintained
8    34      1      Maintained
9    45      0      Maintained
10   48      1 SuperMaintained
11  161      0 SuperMaintained
12    5      1   Nonmaintained
13    5      1   Nonmaintained
14    8      1   Nonmaintained
15    8      1   Nonmaintained
16   12      1   Nonmaintained
17   16      0   Nonmaintained
18   23      1   Nonmaintained
19   27      1   Nonmaintained
20   30      1   Nonmaintained
21   33      1   Nonmaintained
22   43      1 SuperMaintained
23   45      1 SuperMaintained

With the factor() command, I also set the reference level to 'Nonmaintained'.

use survfit()

fit <- survfit(
  Surv(time, status) ~ x,
  data = aml)

surv_pvalue(fit)
    variable        pval   method   pval.txt
           x 0.005309417 Log-rank p = 0.0053

ggsurvplot(
  fit,
  conf.int = FALSE,
  surv.median.line = c('hv'), 
  data = aml, 
  pval = TRUE,
  pval.method = TRUE,
  risk.table = FALSE)

So, the log rank p-value is 0.005309417; Survival is worse for the 'Nonmaintained' stratum.

Untitled

use coxph()

coxfit <- coxph(
  Surv(time, status) ~ x,
  data = aml,
  ties = 'exact')

summary(coxfit)
Call:
coxph(formula = Surv(time, status) ~ x, data = aml, ties = "exact")

  n= 23, number of events= 18 

                     coef exp(coef) se(coef)      z Pr(>|z|)   
xMaintained      -1.12456   0.32479  0.58806 -1.912  0.05584 . 
xSuperMaintained -2.60439   0.07395  0.92232 -2.824  0.00475 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                 exp(coef) exp(-coef) lower .95 upper .95
xMaintained        0.32479      3.079   0.10258    1.0284
xSuperMaintained   0.07395     13.523   0.01213    0.4508

Concordance= 0.729  (se = 0.054 )
Likelihood ratio test= 10.87  on 2 df,   p=0.004
Wald test            = 8.58  on 2 df,   p=0.01
Score (logrank) test = 10.48  on 2 df,   p=0.005

So, the p-values for each stratum are:

  • Maintained vs Nonmaintained, 0.05584
  • SuperMaintained vs Nonmaintained, 0.00475

The hazard ratios (HRs) are given by:

  • exp(coef), HR
  • exp(-coef), HR flipped the other way
  • lower .95, lower 95% CI of the HR
  • upper .95, upper 95% CI of the HR

check that the log rank p-values are the same in both models

The log rank p-value from the Cox model can be accessed by: summary(coxfit)$sctest[3]

round(summary(coxfit)$sctest[3], digits = 9) == round(surv_pvalue(fit)[,2], digits = 9)
pvalue 
  TRUE

Kevin

ADD COMMENT
1
Entering edit mode

Amazing, thank you so much!!

ADD REPLY
0
Entering edit mode

You are welcome, Olive.

ADD REPLY
1
Entering edit mode

This just helped me too - thank you

ADD REPLY
0
Entering edit mode

The log rank p value of 0.0053 in the graph above take into consideration all the 3 groups(Maintained, Nonmaintained, SuperMaintained)? What does one p value for 3 groups tell? I know later on you showed pvalues between 2 groups with reference as Nonmaintained.

ADD REPLY
0
Entering edit mode

Hi Kevin,

This is a very nice illustration. I'm working on a dataset comparing the survival of two groups of samples, the logrank pval generated by the survfit is 0.10 while the logrank pval genearated by the Cox model is 0.15. Is it something can make sense or it is something that I did wrong?

ADD REPLY

Login before adding your answer.

Traffic: 2865 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6