survival analysis error
1
0
Entering edit mode
4.6 years ago
Learner ▴ 250

I have a question regarding this C: Survival analysis of TCGA patients integrating gene expression (RNASeq) data

# run survival analysis

s <- survfit(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum])


This can be calculated without any problem. However, I am wondering why one needs to create the event_rna

s1 <- tryCatch(survdiff(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum]), error = function(e) return(NA))


This gives me an empty list. What does it mean? why should it be empty?

# extraect the p.value
pv <- ifelse ( is.na(s1),next,(round(1 - pchisq(s1$chisq, length(s1$n) - 1),3)))[]


I get this error for this command which is because I don't have the s1. So would it not be OK ?

Error in ifelseis.na(s1), next, (round(1 - pchisq(s1$chisq, length(s1$n) -  :
no loop for break/next, jumping to top level


.

# plot the data
plot(survfit(Surv(as.numeric(as.character(all_clin$new_death))[ind_clin],all_clin$death_event[ind_clin])~event_rna[ind_gene,ind_tum]),
col=c(1:3), frame=F, lwd=2,main=paste('KIRK',rownames(z_rna)[ind_gene],sep='\n'))


This gives me three graph. Red, Green and Black. I guess normally it should give black and red, right?

# add lines for the median survival
x1 <- ifelse ( is.na(as.numeric(summary(s)$table[,'median'])),'NA',as.numeric(summary(s)$table[,'median']))
x2 <- as.numeric(summary(s)$table[,'median']) if(x1 != 'NA' &amp; x2 != 'NA'){ &nbsp; lines(c(0,x1),c(0.5,0.5),col='blue') &nbsp; lines(c(x1,x1),c(0,0.5),col='black') &nbsp; lines(c(x2,x2),c(0,0.5),col='red') }  I get this error Error in summary(s)$table[, "median"] : incorrect number of dimensions


Please use below for an example

how i do survival analysis I am trying to give the data in R so that one can explain how to use it

df<- structure(list(sample = structure(c(3L, 3L, 1L, 2L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Down",
"Up", "UP"), class = "factor"), new_time = c(42L, 52L, 99L, 41L,
20L, 6L, 4L, 11L, 47L, 29L, 4L, 8L, 10L, 28L, 16L, 18L, 32L,
26L), new_death = c(71L, 2L, 35L, 58L, 5L, 6L, 44L, 11L, 18L,
33L, 42L, 40L, 53L, 28L, 24L, 18L, 45L, 26L), death_event = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L,
0L)), .Names = c("sample", "new_time", "new_death", "death_event"
), class = "data.frame", row.names = c(NA, -18L))

RNA-Seq r • 3.2k views
2
Entering edit mode
4.6 years ago

You have a mixture of 'UP' and 'Up' in your first column. These, together with 'Down', means that 3 categorical variables will be interpreted.

When I correct for that using gsub(), I can generate survival curves and various P values (output to terminal) using km.coxph.plot:

require(survcomp)
require(survival)

df$sample <- factor(gsub("UP", "Up", df$sample), levels=c("Down", "Up"))

km.coxph.plot(formula.s=Surv(new_death, death_event) ~ sample, data.s=df, mark.time=TRUE,
x.label="Time (days)", y.label="Overall survival", main.title="",
leg.text=c("Down", "Up"), leg.pos="topright", leg.bty="n", leg.inset=0,
.col=c("forestgreen","red3"),
xlim=c(0,100),
o.text="",
.lty=c(1,1), .lwd=c(1.75,1.75),
show.n.risk=TRUE, n.risk.step=10, n.risk.cex=0.8, verbose=FALSE) 0
Entering edit mode

@Kevin Blighe thank you for your amazing help. do you think that I need more information than the example data I provided? sometimes I see people make it so fancy in their paper. generally we should make a curve between the time and event. am I right? is the above plot statistically valid? can you please tell me your opinion the graph? it means that down regulation survive better :-D as an example if you look above post , you can see he involved a dummy variable into the model ~event_rna[ind_gene,ind_tum] , why such thing should be incorporated ? do you have any idea?

0
Entering edit mode

Hi, well, if you generate it as a curve, then a reviewer may just tell you to put it back as a step-wise / straight line plot, as I have it above.

I do not believe that you have to make it too fancy. The km.coxph.plot generates a curve based on the Cox Proportional Hazards and therefore generates a p-value and hazard ratio for your categories being plotted (however, you have to choose a reference level via the command that I used above, i.e., df$sample <- factor(gsub("UP", "Up", df$sample), levels=c("Down", "Up")) ). There are other p-values that can be quoted, such as that from the Wald test (not seen these too much in survival analyses) and the log rank.

What I did in a recent manuscript was just added the hazard ratios, their confidence intervals, and the p-values as text to the plot using mtext()

There are many tutorials online about survival analysis. I notice now that there's even an implementation for ggplot2.

Finally, I am not sure that the km.coxph.plot function calculates the no. at risk correctly. I recall using it in the past and re-writing the function to make it work.

0
Entering edit mode

@Kevin Blighe is there a way to incorporate the new-time too ? or I must use only the death time? can you share your paper with me? I can at least refer to that

0
Entering edit mode

Let me check when I get home later today!

0
Entering edit mode

@Kevin Blighe ok thanks. There is another way to also do that dft<- Surv(time, event) then fit <- survfit(dft ~ 1) and plot(fit) . It always makes me wonder which way is best or how to interpret it based on different functions :-D

0
Entering edit mode

Yes, there is 1 billion different ways to do survival in R. You could try this:

require(survcomp)
require(survival)

df<- structure(list(sample = structure(c(3L, 3L, 1L, 2L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Down",
"Up", "UP"), class = "factor"), new_time = c(42L, 52L, 99L, 41L,
20L, 6L, 4L, 11L, 47L, 29L, 4L, 8L, 10L, 28L, 16L, 18L, 32L,
26L), new_death = c(71L, 2L, 35L, 58L, 5L, 6L, 44L, 11L, 18L,
33L, 42L, 40L, 53L, 28L, 24L, 18L, 45L, 26L), death_event = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L,
0L)), .Names = c("sample", "new_time", "new_death", "death_event"
), class = "data.frame", row.names = c(NA, -18L))

df$sample <- factor(gsub("UP", "Up", df$sample), levels=c("Down", "Up"))

df.time <- df[,c(1,2,4)]
df.death <- df[,c(1,3,4)]

df.time$sample <- paste(df.time$sample, "Time", sep=", ")
df.death$sample <- paste(df.death$sample, "Death", sep=", ")

colnames(df.time) <- c("sample", "days", "death_event")
colnames(df.death) <- c("sample", "days", "death_event")
df <- rbind(df.death, df.time)

df$sample <- factor(df$sample, levels=c("Down, Death", "Up, Death", "Down, Time", "Up, Time"))


To see the stats, just call the function like this. To do this properly, though, I would generate the p-values from the original and not the merged data object, i.e., separately for new time and new death. The first set of P values listed are from a Cox proportional hazards regression model.

km.coxph.plot(formula.s=Surv(days, death_event) ~ sample, data.s=df, x.label="", y.label="", main.title="")
Call:
survival::coxph(formula = formula.s, data = data.s, weights = weight.s)

n= 36, number of events= 26

coef exp(coef) se(coef)      z Pr(>|z|)
sampleUp, Death  -0.1354    0.8734   0.6149 -0.220    0.826
sampleDown, Time -0.4590    0.6319   0.6950 -0.660    0.509
sampleUp, Time    0.5497    1.7327   0.5758  0.955    0.340

exp(coef) exp(-coef) lower .95 upper .95
sampleUp, Death     0.8734     1.1450    0.2617     2.915
sampleDown, Time    0.6319     1.5824    0.1618     2.468
sampleUp, Time      1.7327     0.5771    0.5605     5.356

Concordance= 0.579  (se = 0.071 )
Rsquare= 0.078   (max possible= 0.975 )
Likelihood ratio test= 2.92  on 3 df,   p=0.4037
Wald test            = 3  on 3 df,   p=0.3922
Score (logrank) test = 3.15  on 3 df,   p=0.3696


To generate the plot:

km.coxph.plot(formula.s=Surv(days, death_event) ~ sample, data.s=df, mark.time=TRUE,
x.label="Time (days)", y.label="Overall survival", main.title="",
leg.text=c("Down, Death", "Up, Death", "Down, Time", "Up, Time"), leg.pos="topright", leg.bty="n", leg.inset=0,
.col=c("forestgreen", "red3", "forestgreen", "red3"),
xlim=c(0,100),
o.text="",
.lty=c(1,1,2,2), .lwd=c(1.75,1.75,1.75,1.75),
show.n.risk=TRUE, n.risk.step=25, n.risk.cex=0.8, verbose=FALSE)

#High
mtext(side=3, line=-10, adj=0.9, "HR=1.7 (1.06, 2.8), p=0.029", cex=0.9, col="black")
mtext(side=3, line=-12, adj=0.9, "HR=1.5 (0.97, 2.4), p=0.07", cex=0.9, col="black") 0
Entering edit mode

@Kevin Blighe Thank you very much. What are these numbers in the bottom (No. At Risk? ) and what are these HR on the figures ? can you please explian ?

0
Entering edit mode

I would encourage you to read more about the number at risk part (via online tutorials). You can easily remove this by using show.n.risk=TRUE

The 'HR' relates to 'Hazard Ratio', which is derived from the Cox Proportional Hazards model coefficients (see exp(coef), lower .95, and upper .95 in the stats output). In my example, I have not matched the numbers in the plot to the numbers in the output - this is just an example.

There is lots to survival analysis, as you can see! If you are struggling to stay on top of it all, then best to keep it very simple for now and use the more simple functions that you found.

0
Entering edit mode

@Kevin Blighe do you know why I am getting this error ?

Call: + survival::coxph(formula = formula.s, data = data.s, weights = weight.s) Error: object 'Call' not found

0
Entering edit mode

Looks strange. What is the exact command that you are running?

At the end of the day, most of these R survival functions are just wrappers for the basic functions, i.e., Surv and coxph.

0
Entering edit mode

@Kevin Blighe This is what I run exactly

km.coxph.plot(formula.s=Surv(days, death_event) ~ sample, data.s=df, x.label="", y.label="", main.title="")
Call:
survival::coxph(formula = formula.s, data = data.s, weights = weight.s)

0
Entering edit mode

I'm not sure that's an error - a bit strange, in fact. Nothing was output t terminal? If you are running on a cluster, have you got X11 set-up so that plot widows can be displayed?

Load these functions and then re-try:

no.at.risk.kb <- function (formula.s, data.s, sub.s = "all", t.step, t.end)
{
require(survival)
if (length(sub.s) == 1 && sub.s == "all")
sub.s <- rep(TRUE, nrow(data.s))
assign("sub.s", sub.s, envir = .GlobalEnv)
sf <- survfit(formula.s, data = data.s, subset = sub.s)
if (is.null(sf$strata)) sf$strata <- c(All = length(sf$time)) n.strata <- length(sf$strata)
t.pts <- seq(0, t.end, t.step)
sumsf <- summary(sf, times = t.pts, extend = TRUE)
tms <- with(sumsf, split(time, strata))
rsk <- with(sumsf, split(n.risk, strata))
nar <- do.call("rbind", rsk)
nar <- data.frame(names(sf$strata), nar) #New line nar[,2] <- table(data.s[,3]) ################################ colnames(nar) <- c("risk.factor", as.character(tms[])) remove("sub.s", envir = .GlobalEnv) nar } km.coxph.plot <- function (formula.s, data.s, weight.s, x.label, y.label, main.title, sub.title, leg.text, leg.pos = "bottomright", leg.bty = "o", leg.inset = 0.05, o.text, v.line, h.line, .col = 1:4, .lty = 1, .lwd = 1, show.n.risk = FALSE, n.risk.step, n.risk.cex = 0.85, verbose = TRUE, ...) { if (missing(sub.title)) { sub.title <- NULL } if (missing(leg.text)) { leg.text <- NULL } if (missing(weight.s)) { weight.s <- array(1, dim = nrow(data.s), dimnames = list(rownames(data.s))) } data.s <- data.s[!is.na(weight.s) & weight.s > 0, , drop = FALSE] weight.s <- weight.s[!is.na(weight.s) & weight.s > 0] assign("weight.s", weight.s, envir = .GlobalEnv) weighted <- length(sort(unique(weight.s))) > 1 ng <- length(leg.text) old.mar <- par("mar") on.exit(par(mar = old.mar)) .xaxt = "s" .xlab = x.label if (show.n.risk) { par(mar = old.mar + c(ng, 8, 3, 0)) .xaxt = "n" .xlab = "" } plot(survfit(formula.s, data = data.s, weights = weight.s), xaxt = .xaxt, col = .col, lty = .lty, lwd = .lwd, xlab = .xlab, ylab = y.label, ...) title(main.title) if (!missing(v.line) && !is.null(v.line)) { abline(v = v.line, lty = 3, col = "purple") } if (!missing(h.line) && !is.null(h.line)) { abline(h = h.line, lty = 3, col = "purple") } if (!is.null(leg.text)) { legend(x = leg.pos, xjust = 0, yjust = 1, legend = leg.text, col = .col, lty = .lty, lwd = .lwd, cex = 0.9, bg = "white", inset = leg.inset, bty = leg.bty) } if (!is.null(sub.title)) { mtext(sub.title, line = -4, outer = TRUE) } if (missing(o.text)) { sdf <- summary(survival::coxph(formula.s, data = data.s, weights = weight.s)) if (verbose) { print(sdf) } p.val <- sdf$sctest["pvalue"]
o.text <- sprintf("Logrank P = %.1E", p.val)
}
if (is.null(o.text)) {
o.text <- FALSE
}
text(0, 0, o.text, cex = 0.85, pos = 4)
if (show.n.risk) {
usr.xy <- par("usr")
nrisk <- no.at.risk.kb(formula.s = formula.s, data.s = data.s,
sub.s = "all", t.step = n.risk.step, t.end = floor(usr.xy))
at.loc <- seq(0, usr.xy, n.risk.step)
axis(1, at = at.loc)
mtext(x.label, side = 1, line = 2)
mtext("No. At Risk", side = 1, line = 3, at = -0.5 *
n.risk.step, adj = 1, cex = n.risk.cex, font = 2)
for (i in 1:nrow(nrisk)) {
mtext(leg.text[i], side = 1, line = 3 + i, at = -0.5 *
n.risk.step, adj = 1, cex = n.risk.cex)
mtext(nrisk[i, -1], side = 1, at = at.loc, line = 3 +
i, adj = 1, cex = n.risk.cex)
}
}
if (exists("weight.s", envir = .GlobalEnv))
remove("weight.s", envir = .GlobalEnv)
}