run survival analysis

Question

survival analysis error

0

Entering edit mode

6.2 years ago

Learner ▴ 280

I have a question regarding this C: Survival analysis of TCGA patients integrating gene expression (RNASeq) data

run survival analysis

s <- survfit(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum])

This can be calculated without any problem. However, I am wondering why one needs to create the event_rna

s1 <- tryCatch(survdiff(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum]), error = function(e) return(NA))

This gives me an empty list. What does it mean? why should it be empty?

# extraect the p.value
pv <- ifelse ( is.na(s1),next,(round(1 - pchisq(s1$chisq, length(s1$n) - 1),3)))[[1]]

I get this error for this command which is because I don't have the s1. So would it not be OK ?

Error in ifelseis.na(s1), next, (round(1 - pchisq(s1$chisq, length(s1$n) -  : 
  no loop for break/next, jumping to top level

.

# plot the data
plot(survfit(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum]),
     col=c(1:3), frame=F, lwd=2,main=paste('KIRK',rownames(z_rna)[ind_gene],sep='\n'))

This gives me three graph. Red, Green and Black. I guess normally it should give black and red, right?

# add lines for the median survival
x1 <- ifelse ( is.na(as.numeric(summary(s)$table[,'median'][1])),'NA',as.numeric(summary(s)$table[,'median'][1]))
x2 <- as.numeric(summary(s)$table[,'median'][2])
if(x1 != 'NA' &amp; x2 != 'NA'){
&nbsp; lines(c(0,x1),c(0.5,0.5),col='blue')
&nbsp; lines(c(x1,x1),c(0,0.5),col='black')
&nbsp; lines(c(x2,x2),c(0,0.5),col='red')
}

I get this error

Error in summary(s)$table[, "median"] : incorrect number of dimensions

Please use below for an example

how i do survival analysis I am trying to give the data in R so that one can explain how to use it

df<- structure(list(sample = structure(c(3L, 3L, 1L, 2L, 3L, 3L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Down", 
"Up", "UP"), class = "factor"), new_time = c(42L, 52L, 99L, 41L, 
20L, 6L, 4L, 11L, 47L, 29L, 4L, 8L, 10L, 28L, 16L, 18L, 32L, 
26L), new_death = c(71L, 2L, 35L, 58L, 5L, 6L, 44L, 11L, 18L, 
33L, 42L, 40L, 53L, 28L, 24L, 18L, 45L, 26L), death_event = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 
0L)), .Names = c("sample", "new_time", "new_death", "death_event"
), class = "data.frame", row.names = c(NA, -18L))

RNA-Seq r • 4.1k views

ADD COMMENT • link updated 6.2 years ago by Kevin Blighe 87k • written 6.2 years ago by Learner ▴ 280

Kevin Blighe · Accepted Answer · 2018-02-07

2

Entering edit mode

6.2 years ago

Kevin Blighe 87k

You have a mixture of 'UP' and 'Up' in your first column. These, together with 'Down', means that 3 categorical variables will be interpreted.

When I correct for that using gsub(), I can generate survival curves and various P values (output to terminal) using km.coxph.plot:

require(survcomp)
require(survival)

df$sample <- factor(gsub("UP", "Up", df$sample), levels=c("Down", "Up"))

km.coxph.plot(formula.s=Surv(new_death, death_event) ~ sample, data.s=df, mark.time=TRUE,
        x.label="Time (days)", y.label="Overall survival", main.title="",
        leg.text=c("Down", "Up"), leg.pos="topright", leg.bty="n", leg.inset=0,
        .col=c("forestgreen","red3"),
        xlim=c(0,100),
        o.text="",
        .lty=c(1,1), .lwd=c(1.75,1.75),
        show.n.risk=TRUE, n.risk.step=10, n.risk.cex=0.8, verbose=FALSE)

ADD COMMENT • link 6.2 years ago by Kevin Blighe 87k

0

Entering edit mode

@Kevin Blighe thank you for your amazing help. do you think that I need more information than the example data I provided? sometimes I see people make it so fancy in their paper. generally we should make a curve between the time and event. am I right? is the above plot statistically valid? can you please tell me your opinion the graph? it means that down regulation survive better :-D as an example if you look above post , you can see he involved a dummy variable into the model ~event_rna[ind_gene,ind_tum] , why such thing should be incorporated ? do you have any idea?

ADD REPLY • link 6.2 years ago by Learner ▴ 280

0

Entering edit mode

Hi, well, if you generate it as a curve, then a reviewer may just tell you to put it back as a step-wise / straight line plot, as I have it above.

I do not believe that you have to make it too fancy. The km.coxph.plot generates a curve based on the Cox Proportional Hazards and therefore generates a p-value and hazard ratio for your categories being plotted (however, you have to choose a reference level via the command that I used above, i.e., df$sample <- factor(gsub("UP", "Up", df$sample), levels=c("Down", "Up")) ). There are other p-values that can be quoted, such as that from the Wald test (not seen these too much in survival analyses) and the log rank.

What I did in a recent manuscript was just added the hazard ratios, their confidence intervals, and the p-values as text to the plot using mtext()

There are many tutorials online about survival analysis. I notice now that there's even an implementation for ggplot2.

Finally, I am not sure that the km.coxph.plot function calculates the no. at risk correctly. I recall using it in the past and re-writing the function to make it work.

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

0

Entering edit mode

@Kevin Blighe is there a way to incorporate the new-time too ? or I must use only the death time? can you share your paper with me? I can at least refer to that

ADD REPLY • link 6.2 years ago by Learner ▴ 280

0

Entering edit mode

Let me check when I get home later today!

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

0

Entering edit mode

@Kevin Blighe ok thanks. There is another way to also do that dft<- Surv(time, event) then fit <- survfit(dft ~ 1) and plot(fit) . It always makes me wonder which way is best or how to interpret it based on different functions :-D

ADD REPLY • link 6.2 years ago by Learner ▴ 280

0

Entering edit mode

Yes, there is 1 billion different ways to do survival in R. You could try this:

require(survcomp)
require(survival)

df<- structure(list(sample = structure(c(3L, 3L, 1L, 2L, 3L, 3L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Down", 
"Up", "UP"), class = "factor"), new_time = c(42L, 52L, 99L, 41L, 
20L, 6L, 4L, 11L, 47L, 29L, 4L, 8L, 10L, 28L, 16L, 18L, 32L, 
26L), new_death = c(71L, 2L, 35L, 58L, 5L, 6L, 44L, 11L, 18L, 
33L, 42L, 40L, 53L, 28L, 24L, 18L, 45L, 26L), death_event = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 
0L)), .Names = c("sample", "new_time", "new_death", "death_event"
), class = "data.frame", row.names = c(NA, -18L))

df$sample <- factor(gsub("UP", "Up", df$sample), levels=c("Down", "Up"))

df.time <- df[,c(1,2,4)]
df.death <- df[,c(1,3,4)]

df.time$sample <- paste(df.time$sample, "Time", sep=", ")
df.death$sample <- paste(df.death$sample, "Death", sep=", ")

colnames(df.time) <- c("sample", "days", "death_event")
colnames(df.death) <- c("sample", "days", "death_event")
df <- rbind(df.death, df.time)

df$sample <- factor(df$sample, levels=c("Down, Death", "Up, Death", "Down, Time", "Up, Time"))

To see the stats, just call the function like this. To do this properly, though, I would generate the p-values from the original and not the merged data object, i.e., separately for new time and new death. The first set of P values listed are from a Cox proportional hazards regression model.

km.coxph.plot(formula.s=Surv(days, death_event) ~ sample, data.s=df, x.label="", y.label="", main.title="")
Call:
survival::coxph(formula = formula.s, data = data.s, weights = weight.s)

  n= 36, number of events= 26 

                    coef exp(coef) se(coef)      z Pr(>|z|)
sampleUp, Death  -0.1354    0.8734   0.6149 -0.220    0.826
sampleDown, Time -0.4590    0.6319   0.6950 -0.660    0.509
sampleUp, Time    0.5497    1.7327   0.5758  0.955    0.340

                 exp(coef) exp(-coef) lower .95 upper .95
sampleUp, Death     0.8734     1.1450    0.2617     2.915
sampleDown, Time    0.6319     1.5824    0.1618     2.468
sampleUp, Time      1.7327     0.5771    0.5605     5.356

Concordance= 0.579  (se = 0.071 )
Rsquare= 0.078   (max possible= 0.975 )
Likelihood ratio test= 2.92  on 3 df,   p=0.4037
Wald test            = 3  on 3 df,   p=0.3922
Score (logrank) test = 3.15  on 3 df,   p=0.3696

To generate the plot:

km.coxph.plot(formula.s=Surv(days, death_event) ~ sample, data.s=df, mark.time=TRUE,
    x.label="Time (days)", y.label="Overall survival", main.title="",
    leg.text=c("Down, Death", "Up, Death", "Down, Time", "Up, Time"), leg.pos="topright", leg.bty="n", leg.inset=0,
    .col=c("forestgreen", "red3", "forestgreen", "red3"),
    xlim=c(0,100),
    o.text="",
    .lty=c(1,1,2,2), .lwd=c(1.75,1.75,1.75,1.75),
    show.n.risk=TRUE, n.risk.step=25, n.risk.cex=0.8, verbose=FALSE)

    #High
    mtext(side=3, line=-10, adj=0.9, "HR=1.7 (1.06, 2.8), p=0.029", cex=0.9, col="black")
    mtext(side=3, line=-12, adj=0.9, "HR=1.5 (0.97, 2.4), p=0.07", cex=0.9, col="black")

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

0

Entering edit mode

@Kevin Blighe Thank you very much. What are these numbers in the bottom (No. At Risk? ) and what are these HR on the figures ? can you please explian ?

ADD REPLY • link 6.2 years ago by Learner ▴ 280

0

Entering edit mode

I would encourage you to read more about the number at risk part (via online tutorials). You can easily remove this by using show.n.risk=TRUE

The 'HR' relates to 'Hazard Ratio', which is derived from the Cox Proportional Hazards model coefficients (see exp(coef), lower .95, and upper .95 in the stats output). In my example, I have not matched the numbers in the plot to the numbers in the output - this is just an example.

There is lots to survival analysis, as you can see! If you are struggling to stay on top of it all, then best to keep it very simple for now and use the more simple functions that you found.

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

0

Entering edit mode

@Kevin Blighe do you know why I am getting this error ?

Call: + survival::coxph(formula = formula.s, data = data.s, weights = weight.s) Error: object 'Call' not found

ADD REPLY • link 6.2 years ago by Learner ▴ 280

0

Entering edit mode

Looks strange. What is the exact command that you are running?

At the end of the day, most of these R survival functions are just wrappers for the basic functions, i.e., Surv and coxph.

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

0

Entering edit mode

@Kevin Blighe This is what I run exactly

km.coxph.plot(formula.s=Surv(days, death_event) ~ sample, data.s=df, x.label="", y.label="", main.title="")
Call:
survival::coxph(formula = formula.s, data = data.s, weights = weight.s)

ADD REPLY • link updated 6.2 years ago by Kevin Blighe 87k • written 6.2 years ago by Learner ▴ 280

0

Entering edit mode

I'm not sure that's an error - a bit strange, in fact. Nothing was output t terminal? If you are running on a cluster, have you got X11 set-up so that plot widows can be displayed?

Load these functions and then re-try:

no.at.risk.kb <- function (formula.s, data.s, sub.s = "all", t.step, t.end) 
{
    require(survival)
    if (length(sub.s) == 1 && sub.s == "all") 
        sub.s <- rep(TRUE, nrow(data.s))
    assign("sub.s", sub.s, envir = .GlobalEnv)
    sf <- survfit(formula.s, data = data.s, subset = sub.s)
    if (is.null(sf$strata)) 
        sf$strata <- c(All = length(sf$time))
    n.strata <- length(sf$strata)
    t.pts <- seq(0, t.end, t.step)
    sumsf <- summary(sf, times = t.pts, extend = TRUE)
    tms <- with(sumsf, split(time, strata))
    rsk <- with(sumsf, split(n.risk, strata))
    nar <- do.call("rbind", rsk)
    nar <- data.frame(names(sf$strata), nar)

    #New line
        nar[,2] <- table(data.s[,3])
    ################################

    colnames(nar) <- c("risk.factor", as.character(tms[[1]]))
    remove("sub.s", envir = .GlobalEnv)
    nar
}
km.coxph.plot <- function (formula.s, data.s, weight.s, x.label, y.label, main.title, 
    sub.title, leg.text, leg.pos = "bottomright", leg.bty = "o", 
    leg.inset = 0.05, o.text, v.line, h.line, .col = 1:4, .lty = 1, 
    .lwd = 1, show.n.risk = FALSE, n.risk.step, n.risk.cex = 0.85, 
    verbose = TRUE, ...) 
{
    if (missing(sub.title)) {
        sub.title <- NULL
    }
    if (missing(leg.text)) {
        leg.text <- NULL
    }
    if (missing(weight.s)) {
        weight.s <- array(1, dim = nrow(data.s), dimnames = list(rownames(data.s)))
    }
    data.s <- data.s[!is.na(weight.s) & weight.s > 0, , drop = FALSE]
    weight.s <- weight.s[!is.na(weight.s) & weight.s > 0]
    assign("weight.s", weight.s, envir = .GlobalEnv)
    weighted <- length(sort(unique(weight.s))) > 1
    ng <- length(leg.text)
    old.mar <- par("mar")
    on.exit(par(mar = old.mar))
    .xaxt = "s"
    .xlab = x.label
    if (show.n.risk) {
        par(mar = old.mar + c(ng, 8, 3, 0))
        .xaxt = "n"
        .xlab = ""
    }
    plot(survfit(formula.s, data = data.s, weights = weight.s), 
        xaxt = .xaxt, col = .col, lty = .lty, lwd = .lwd, xlab = .xlab, 
        ylab = y.label, ...)
    title(main.title)
    if (!missing(v.line) && !is.null(v.line)) {
        abline(v = v.line, lty = 3, col = "purple")
    }
    if (!missing(h.line) && !is.null(h.line)) {
        abline(h = h.line, lty = 3, col = "purple")
    }
    if (!is.null(leg.text)) {
        legend(x = leg.pos, xjust = 0, yjust = 1, legend = leg.text, 
            col = .col, lty = .lty, lwd = .lwd, cex = 0.9, bg = "white", 
            inset = leg.inset, bty = leg.bty)
    }
    if (!is.null(sub.title)) {
        mtext(sub.title, line = -4, outer = TRUE)
    }
    if (missing(o.text)) {
        sdf <- summary(survival::coxph(formula.s, data = data.s, 
            weights = weight.s))
        if (verbose) {
            print(sdf)
        }
        p.val <- sdf$sctest["pvalue"]
        o.text <- sprintf("Logrank P = %.1E", p.val)
    }
    if (is.null(o.text)) {
        o.text <- FALSE
    }
    text(0, 0, o.text, cex = 0.85, pos = 4)
    if (show.n.risk) {
        usr.xy <- par("usr")
        nrisk <- no.at.risk.kb(formula.s = formula.s, data.s = data.s, 
            sub.s = "all", t.step = n.risk.step, t.end = floor(usr.xy[2]))
        at.loc <- seq(0, usr.xy[2], n.risk.step)
        axis(1, at = at.loc)
        mtext(x.label, side = 1, line = 2)
        mtext("No. At Risk", side = 1, line = 3, at = -0.5 * 
            n.risk.step, adj = 1, cex = n.risk.cex, font = 2)
        for (i in 1:nrow(nrisk)) {
            mtext(leg.text[i], side = 1, line = 3 + i, at = -0.5 * 
                n.risk.step, adj = 1, cex = n.risk.cex)
            mtext(nrisk[i, -1], side = 1, at = at.loc, line = 3 + 
                i, adj = 1, cex = n.risk.cex)
        }
    }
    if (exists("weight.s", envir = .GlobalEnv)) 
        remove("weight.s", envir = .GlobalEnv)
}

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k