Question: Cox proportional hazards in R: should I pass treatments to the model as a factor?
19 months ago, Baylie_321 wrote:


I am fitting a Cox proportional hazards model in R using the survival package. My variables are: event status (mo; 1 = died, 0 = censored), time in hours (hour), temperature treatment (temp_c; 3 groups), and condition factor (CF; a continuous variable).

I am confused about handling the temperature group variable and have trialled 2 methods:

library(survival)
data <- read.delim("coxnew.txt", header=TRUE)
data$SurvObj <- with(data, Surv(hour, mo == 1))

(i) I have seen that in example models online, sex: m=0, f=1 or similar, so I used L=1,M=2,H=3 as the groups.

mod <- coxph(SurvObj ~ temp_c + CF , data = data)

and this gave me a summary with two outputs, one for temp_c and one for CF.

(ii) I have also made temperature treatment a factor

data$temp_c <- factor(data$temp_c)
data$temp_c<- relevel(data$temp_c, ref="H")
mod <- coxph(SurvObj ~ temp_c +CF , data = data)

and this gave me a summary with three outputs, one for temp_cL, one for temp_cM and one for CF

I am not sure which is correct to use, as (ii) requires that you then set one group as the reference. The confusion arises when I try to see what the temperature plots look like with CF held at its mean value, as the output graphs look different for the two methods. For (i):


temp_new <- with(data, data.frame(temp_c= c(1,2,3), CF = rep(mean(CF, na.rm = TRUE), 3)))

or (ii)

temp_new <- with(data, data.frame(temp_c= c("L","M","H"), CF = rep(mean(CF, na.rm = TRUE), 3)))

Does it matter which I use? Is it personal preference, or does one version make more statistical sense? I was inclined to go with (i), as it resembles the m=0, f=1 style, and I assumed it compares the three groups among themselves rather than just comparing two groups with the assigned reference level.

Many thanks, Bekah


Cheers! Okay, hopefully I am correct in interpreting your answer as follows. If it's set as

data$SurvObj <- with(data, Surv(hour, mo == 1))

A hazard ratio of 0.7 for temperature coded simply as 1, 2, 3 would mean that, with increasing group number, the chance of death (1) over censoring (0) decreases from temperature treatment 1 --> 2 --> 3.

Hazard ratios of 5 for L and 3 for M, with temperature as a factor and H as the reference level, would mean an increased chance of death (1) over censoring (0) for both L and M when compared to H, with the chance being higher for L?

Best wishes, Bekah

written 19 months ago by Baylie_321

Yes, an HR of 0.7 indicates that each one-unit increase in temp_c (1 to 2, or 2 to 3) multiplies the hazard by 0.7, when adjusted for your condition factor (CF). I am not sure of the exact interpretation of having temperature encoded as a continuous variable of 1, 2, 3; it would make more sense for it to be categorical. A continuous temperature variable makes more sense as Kelvin values, or, granted, Celsius.

The other values for L and M are readily interpreted. Considering that you set H as the reference level, it says that the low temperature group has the highest hazard of death, when adjusted for CF. The medium temperature group also has a higher hazard of death (i.e., higher hazard when compared to high temperature group).
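On the worry that a factor "only" compares two groups to the reference: the choice of reference level changes how the coefficients are expressed, not the fitted model. Here is a minimal sketch on simulated data (the data frame `dat`, the columns `temp_H` and `temp_L`, and the hazard rates are made up for illustration and are not the poster's data):

```r
library(survival)

# Simulated, hypothetical data: group L is given the highest hazard
set.seed(1)
n <- 150
grp <- sample(c("L", "M", "H"), n, replace = TRUE)
t_event <- rexp(n, rate = c(L = 0.10, M = 0.06, H = 0.03)[grp])
t_cens <- runif(n, 0, 40)
dat <- data.frame(hour = pmin(t_event, t_cens),
                  mo = as.integer(t_event <= t_cens),
                  CF = rnorm(n, mean = 1, sd = 0.2))
dat$temp_H <- relevel(factor(grp), ref = "H")  # H as reference
dat$temp_L <- relevel(factor(grp), ref = "L")  # L as reference

fit_H <- coxph(Surv(hour, mo == 1) ~ temp_H + CF, data = dat)
fit_L <- coxph(Surv(hour, mo == 1) ~ temp_L + CF, data = dat)

# Same fit, different parameterisation: the log-likelihoods agree
c(fit_H$loglik[2], fit_L$loglik[2])

# The log HR for L vs H is minus the log HR for H vs L
coef(fit_H)["temp_HL"]
coef(fit_L)["temp_LH"]

# The M vs L contrast can be recovered from the H-referenced fit
coef(fit_H)["temp_HM"] - coef(fit_H)["temp_HL"]
coef(fit_L)["temp_LM"]
```

So any pairwise comparison among the three groups is available whichever level is set as the reference.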

You should also be looking at the lower and upper confidence intervals (CIs) and the log-rank p-value. For example, as a general rule of thumb: if we have an HR of 0.7 but its upper CI passes 1.0, then the estimate is less reliable, and this will be reflected in the p-value.
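As a rough sketch of where these numbers live in a `coxph` fit, the following uses simulated data (`dat` and its hazard rates are hypothetical, not the poster's data) and pulls the hazard ratios, their 95% CIs, the per-coefficient Wald p-values, and the overall score (logrank) test out of `summary()`:

```r
library(survival)

# Simulated, hypothetical data with H as the reference level
set.seed(2)
n <- 150
grp <- sample(c("L", "M", "H"), n, replace = TRUE)
t_event <- rexp(n, rate = c(L = 0.10, M = 0.06, H = 0.03)[grp])
t_cens <- runif(n, 0, 40)
dat <- data.frame(hour = pmin(t_event, t_cens),
                  mo = as.integer(t_event <= t_cens),
                  temp_c = relevel(factor(grp), ref = "H"),
                  CF = rnorm(n, mean = 1, sd = 0.2))

mod <- coxph(Surv(hour, mo == 1) ~ temp_c + CF, data = dat)
s <- summary(mod)

# Hazard ratios with lower and upper 95% confidence limits
ci <- s$conf.int[, c("exp(coef)", "lower .95", "upper .95")]
ci

# TRUE where the interval does not contain HR = 1
excludes_one <- ci[, "lower .95"] > 1 | ci[, "upper .95"] < 1
excludes_one

# Wald p-value per coefficient, and the score (logrank) test for the whole model
s$coefficients[, "Pr(>|z|)"]
s$sctest
```

A coefficient whose interval contains 1 will also have a Wald p-value above 0.05, since both are computed from the same estimate and standard error.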

modified 19 months ago • written 19 months ago by Kevin Blighe

Thank you so much for all your help! :) this is much clearer now!

written 19 months ago by Baylie_321
19 months ago, Kevin Blighe (University College London) wrote:

The difference comes down to whether temp_c, in the model and in your temp_new object, is encoded as a categorical or a continuous variable.

For example:

temp_c <- factor(c("L","M","H"), levels=c("L","M","H"))

is the same as:

temp_c <- factor(c(1,2,3), levels=c(1,2,3))

When you run a model with either of these, a coefficient will be calculated for each level compared to the reference level. All statistical values that are returned will be the same for L M H as per 1 2 3. Thus, in the context of the broader model terms, M and H are separately compared to L, the reference level, while, in the other case, 2 and 3 are compared to 1.

However, these are both different from just encoding the variable as temp_c <- c(1,2,3), a continuous variable, in which case only a single coefficient will be calculated for temp_c.
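To make the three encodings concrete, here is a small sketch on simulated data (everything named `dat`, `temp_num`, `temp_chr`, `temp_dig` is hypothetical). It also adds one thing not discussed above: because the numeric model is a special case of the factor model (a straight-line trend in the log hazard across 1, 2, 3), a likelihood ratio test on 1 degree of freedom is one possible way to check whether the simpler coding loses anything.

```r
library(survival)

# Simulated, hypothetical data
set.seed(3)
n <- 150
grp <- sample(c("L", "M", "H"), n, replace = TRUE)
t_event <- rexp(n, rate = c(L = 0.10, M = 0.06, H = 0.03)[grp])
t_cens <- runif(n, 0, 40)
dat <- data.frame(hour = pmin(t_event, t_cens),
                  mo = as.integer(t_event <= t_cens),
                  CF = rnorm(n, mean = 1, sd = 0.2))
dat$temp_num <- unname(c(L = 1, M = 2, H = 3)[grp])         # continuous 1, 2, 3
dat$temp_chr <- factor(grp, levels = c("L", "M", "H"))      # factor with letters
dat$temp_dig <- factor(dat$temp_num, levels = c(1, 2, 3))   # factor with digits

fit_num <- coxph(Surv(hour, mo == 1) ~ temp_num + CF, data = dat)
fit_chr <- coxph(Surv(hour, mo == 1) ~ temp_chr + CF, data = dat)
fit_dig <- coxph(Surv(hour, mo == 1) ~ temp_dig + CF, data = dat)

names(coef(fit_num))  # one temperature coefficient plus CF
names(coef(fit_chr))  # one coefficient each for M and H (vs L) plus CF

# The two factor codings give the same estimates; only the labels differ
all.equal(unname(coef(fit_chr)), unname(coef(fit_dig)))

# Likelihood ratio test: factor model vs linear-trend model (1 extra parameter)
lrt <- 2 * (fit_chr$loglik[2] - fit_num$loglik[2])
p_lrt <- pchisq(lrt, df = 1, lower.tail = FALSE)
c(statistic = lrt, p.value = p_lrt)
```

A small p-value here would suggest the group effects are not evenly spaced on the log-hazard scale, in which case the factor coding is the safer choice.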

So, both are statistically valid, but the interpretation is different.
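This is also why the plots at mean CF look different between the two methods. A sketch on simulated data (again, `dat`, `temp_num`, `temp_fac`, `new_num` and `new_fac` are hypothetical names, and the data are not the poster's): with the numeric coding, the linear predictor must move by the same amount from 1 to 2 as from 2 to 3, whereas the factor coding lets each group sit wherever the data put it.

```r
library(survival)

# Simulated, hypothetical data
set.seed(4)
n <- 150
grp <- sample(c("L", "M", "H"), n, replace = TRUE)
t_event <- rexp(n, rate = c(L = 0.10, M = 0.06, H = 0.03)[grp])
t_cens <- runif(n, 0, 40)
dat <- data.frame(hour = pmin(t_event, t_cens),
                  mo = as.integer(t_event <= t_cens),
                  CF = rnorm(n, mean = 1, sd = 0.2))
dat$temp_num <- unname(c(L = 1, M = 2, H = 3)[grp])
dat$temp_fac <- factor(grp, levels = c("L", "M", "H"))

fit_num <- coxph(Surv(hour, mo == 1) ~ temp_num + CF, data = dat)
fit_fac <- coxph(Surv(hour, mo == 1) ~ temp_fac + CF, data = dat)

# One row per temperature group, CF held at its mean
new_num <- data.frame(temp_num = c(1, 2, 3), CF = mean(dat$CF))
new_fac <- data.frame(temp_fac = factor(c("L", "M", "H"),
                                        levels = levels(dat$temp_fac)),
                      CF = mean(dat$CF))

# Steps in the linear predictor between adjacent groups
lp_num <- predict(fit_num, newdata = new_num, type = "lp")
lp_fac <- predict(fit_fac, newdata = new_fac, type = "lp")
diff(lp_num)  # both steps equal coef(fit_num)["temp_num"]
diff(lp_fac)  # the two steps are free to differ

# Predicted survival curves for the three groups under each coding
sf_num <- survfit(fit_num, newdata = new_num)
sf_fac <- survfit(fit_fac, newdata = new_fac)
op <- par(mfrow = c(1, 2))
plot(sf_num, col = 1:3, xlab = "Hours", ylab = "Survival", main = "(i) numeric 1, 2, 3")
plot(sf_fac, col = 1:3, xlab = "Hours", ylab = "Survival", main = "(ii) factor")
legend("topright", legend = c("L", "M", "H"), col = 1:3, lty = 1)
par(op)
```

Note that the values in the new data frame have to match the coding used when the model was fitted: numbers for the numeric model, and the same factor levels for the factor model.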

