Question: Cox proportional hazards in R: should treatments be passed to the model as a factor?
Baylie_32130 wrote:

Hiya,

I am fitting a Cox proportional hazards model in R using the survival package. My variables are: status (mo; 1 = died, 0 = censored), time in hours (hour), the temperature treatment (temp_c, 3 groups) and condition factor (CF, a continuous variable).

I am confused about handling the temperature group variable and have trialled 2 methods:

```r
library(survival)

data <- read.delim("coxnew.txt", header = TRUE)
data$SurvObj <- with(data, Surv(hour, mo == 1))
```

(i) Example models online code sex as m=0, f=1 or similar, so I coded the groups as L=1, M=2, H=3.

```r
mod <- coxph(SurvObj ~ temp_c + CF, data = data)
```

and this gave me a summary with two outputs, one for temp_c and one for CF.

(ii) I have also made temperature treatment a factor

```r
data$temp_c <- factor(data$temp_c)
data$temp_c <- relevel(data$temp_c, ref = "H")
mod <- coxph(SurvObj ~ temp_c + CF, data = data)
```

and this gave me a summary with three outputs: one for temp_cL, one for temp_cM and one for CF.
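To see the two codings side by side, here is a minimal sketch on simulated data. The data frame `dat`, the column `temp_num`, the hazard rates and the 72 h cut-off are all made up for illustration; they are not the contents of `coxnew.txt`.

```r
library(survival)

# Simulated stand-in data (hypothetical values, not the original file)
set.seed(1)
n    <- 150
grp  <- sample(c("L", "M", "H"), n, replace = TRUE)
rate <- unname(c(L = 0.05, M = 0.03, H = 0.01)[grp])   # assumed hazard per hour
t_ev <- rexp(n, rate = rate)                           # latent time of death

dat <- data.frame(
  hour     = pmin(t_ev, 72),                  # follow-up assumed to stop at 72 h
  mo       = as.integer(t_ev <= 72),          # 1 = died, 0 = censored
  CF       = rnorm(n, mean = 1, sd = 0.1),
  temp_c   = relevel(factor(grp), ref = "H"), # method (ii): factor, H as reference
  temp_num = unname(c(L = 1, M = 2, H = 3)[grp])  # method (i): numeric 1, 2, 3
)

fit_num <- coxph(Surv(hour, mo) ~ temp_num + CF, data = dat)
fit_fac <- coxph(Surv(hour, mo) ~ temp_c + CF, data = dat)

names(coef(fit_num))   # "temp_num" "CF"
names(coef(fit_fac))   # "temp_cL" "temp_cM" "CF"
```

The numeric coding estimates a single slope across the three groups, while the factor coding estimates one coefficient per non-reference group.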

I am not sure which is the correct one to use, as (ii) requires that you set one group as a reference. The confusion arises when I try to plot predicted survival for each temperature with CF held at its mean value, because the output graphs look different for the two methods.

(i)

```r
temp_new <- with(data, data.frame(temp_c = c(1, 2, 3), CF = rep(mean(CF, na.rm = TRUE), 3)))
```

or (ii)

```r
temp_new <- with(data, data.frame(temp_c = c("L", "M", "H"), CF = rep(mean(CF, na.rm = TRUE), 3)))
```
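As a rough illustration of why the two sets of curves can differ, the sketch below passes each kind of `temp_new` to `survfit()` on simulated data. Everything about the data (`dat`, `temp_num`, the rates, the 72 h cut-off) is an assumption for the example. Under the numeric coding the three curves are forced to be equally spaced on the log-hazard scale; under the factor coding each group gets its own estimate.

```r
library(survival)

# Simulated stand-in data (hypothetical values, not the original file)
set.seed(1)
n    <- 150
grp  <- sample(c("L", "M", "H"), n, replace = TRUE)
t_ev <- rexp(n, rate = unname(c(L = 0.05, M = 0.03, H = 0.01)[grp]))
dat  <- data.frame(
  hour     = pmin(t_ev, 72),
  mo       = as.integer(t_ev <= 72),
  CF       = rnorm(n, mean = 1, sd = 0.1),
  temp_c   = relevel(factor(grp), ref = "H"),
  temp_num = unname(c(L = 1, M = 2, H = 3)[grp])
)

fit_num <- coxph(Surv(hour, mo) ~ temp_num + CF, data = dat)
fit_fac <- coxph(Surv(hour, mo) ~ temp_c + CF, data = dat)

# One row per temperature group, CF held at its mean
cf_mean <- mean(dat$CF, na.rm = TRUE)
new_num <- data.frame(temp_num = c(1, 2, 3), CF = cf_mean)
new_fac <- data.frame(
  temp_c = factor(c("L", "M", "H"), levels = levels(dat$temp_c)),
  CF     = cf_mean
)

sf_num <- survfit(fit_num, newdata = new_num)
sf_fac <- survfit(fit_fac, newdata = new_fac)

# Side-by-side predicted survival curves
op <- par(mfrow = c(1, 2))
plot(sf_num, col = 1:3, xlab = "Hours", ylab = "Predicted survival",
     main = "temp as 1, 2, 3")
plot(sf_fac, col = 1:3, xlab = "Hours", ylab = "Predicted survival",
     main = "temp as factor")
legend("bottomleft", legend = c("L", "M", "H"), col = 1:3, lty = 1)
par(op)
```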

Does it matter which I use? Is it personal preference, or does one version make more statistical sense? I was inclined to go with (i), as it resembles the m=0, f=1 style, and I assume it compares the three groups among themselves rather than comparing just two groups to the assigned reference level.

Many thanks, Bekah


Cheers! Okay, hopefully I am correct in interpreting your answer as follows. If it's set as

data$SurvObj <- with(data, Surv(hour, mo == 1))

A hazard ratio of 0.7 for temperature coded simply as 1, 2, 3 would mean that, with each increase in group number, the hazard of death (mo = 1) decreases, going from temperature treatment 1 to 2 to 3.

Hazard ratios of 5 for L and 3 for M, with temperature as a factor and H as the reference level, would mean a higher hazard of death for both L and M compared to H, with the hazard highest for L?

Best wishes, Bekah

Yes, 0.7 indicates that the hazard is multiplied by 0.7 for each one-unit increase in `temp_c`, adjusted for your condition factor (CF). I am not sure how best to interpret temperature encoded as a continuous variable of 1, 2, 3; it would make more sense as a categorical variable. A continuous temperature variable makes more sense as actual temperature values in Kelvin or, granted, Celsius.
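As a small worked example of what the 1, 2, 3 coding implies, the arithmetic below takes the HR of 0.7 quoted above and applies it once per step. The vector `steps` is just a label for how many units each group sits above group 1.

```r
hr_per_step <- 0.7                   # hazard ratio per one-unit step, as quoted above
steps <- c(L = 0, M = 1, H = 2)      # units above group 1 under the 1, 2, 3 coding

hr_vs_group1 <- hr_per_step^steps    # the model multiplies the hazard once per step
hr_vs_group1
# L = 1.00, M = 0.70, H = 0.49
```

So the numeric coding assumes the change from group 1 to 2 is the same multiplicative change as from group 2 to 3, which the factor coding does not assume.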

The other values for `L` and `M` are readily interpreted. Considering that you set `H` as the reference level, it says that the low temperature group has the highest hazard of death, when adjusted for `CF`. The medium temperature group also has a higher hazard of death (i.e., higher hazard when compared to high temperature group).
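Note that the choice of reference level does not limit which groups can be compared. A sketch on simulated data (`dat`, `temp_L` and the rates are assumptions for illustration): the L versus M hazard ratio can be obtained from the difference of the two coefficients, or equivalently by refitting with a different reference level.

```r
library(survival)

# Simulated stand-in data (hypothetical values, not the original file)
set.seed(1)
n    <- 150
grp  <- sample(c("L", "M", "H"), n, replace = TRUE)
t_ev <- rexp(n, rate = unname(c(L = 0.05, M = 0.03, H = 0.01)[grp]))
dat  <- data.frame(
  hour   = pmin(t_ev, 72),
  mo     = as.integer(t_ev <= 72),
  CF     = rnorm(n, mean = 1, sd = 0.1),
  temp_c = relevel(factor(grp), ref = "H")
)

# H as reference: hazard ratios for L vs H and M vs H
fit_H <- coxph(Surv(hour, mo) ~ temp_c + CF, data = dat)
exp(coef(fit_H))

# L vs M from the same fit: difference of log hazard ratios
hr_L_vs_M <- exp(coef(fit_H)["temp_cL"] - coef(fit_H)["temp_cM"])

# Same model with L as reference: coefficients are now H vs L and M vs L
dat$temp_L <- relevel(dat$temp_c, ref = "L")
fit_L <- coxph(Surv(hour, mo) ~ temp_L + CF, data = dat)
hr_M_vs_L <- exp(coef(fit_L)["temp_LM"])

# The two routes agree: HR(L vs M) is the reciprocal of HR(M vs L)
all.equal(unname(hr_L_vs_M), unname(1 / hr_M_vs_L), tolerance = 1e-4)
```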

You should also be looking at the upper and lower confidence limits of the confidence interval (CI), and the log-rank p-value. For example, a general rule of thumb: if we have HR = 0.7 but its upper confidence limit exceeds 1.0, then the interval includes 1 (no effect), so the estimate is less reliable, and this will be reflected in the p-value.
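A sketch of where these values can be read from a fitted `coxph` model, again on simulated data (`dat` and the rates are assumptions for illustration). `summary()` of the fit holds the hazard ratios with their 95% limits in `conf.int`, and the overall score (log-rank) test in `sctest`.

```r
library(survival)

# Simulated stand-in data (hypothetical values, not the original file)
set.seed(1)
n    <- 150
grp  <- sample(c("L", "M", "H"), n, replace = TRUE)
t_ev <- rexp(n, rate = unname(c(L = 0.05, M = 0.03, H = 0.01)[grp]))
dat  <- data.frame(
  hour   = pmin(t_ev, 72),
  mo     = as.integer(t_ev <= 72),
  CF     = rnorm(n, mean = 1, sd = 0.1),
  temp_c = relevel(factor(grp), ref = "H")
)

fit <- coxph(Surv(hour, mo) ~ temp_c + CF, data = dat)
s   <- summary(fit)

# Hazard ratio with lower and upper 95% confidence limits, one row per term
ci <- s$conf.int[, c("exp(coef)", "lower .95", "upper .95")]
ci

# TRUE where the interval includes 1, i.e. the data are compatible with no effect
crosses_one <- ci[, "lower .95"] < 1 & ci[, "upper .95"] > 1
crosses_one

# Overall score (log-rank) test for the whole model
s$sctest
```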


Thank you so much for all your help! :) this is much clearer now!

Kevin Blighe wrote:

The difference comes down to whether `temp_c`, both in the model and in your `temp_new` object, is encoded as a categorical or a continuous variable.

For example:

```r
temp_c <- factor(c("L", "M", "H"), levels = c("L", "M", "H"))
```

...is the same as:

```r
temp_c <- factor(c(1, 2, 3), levels = c(1, 2, 3))
```

When you run a model with either of these, a coefficient will be calculated for each non-reference level compared to the reference level. All statistical values that are returned will be the same for `L M H` as for `1 2 3`. Thus, in the context of the broader model terms, `M` and `H` are separately compared to `L`, the reference level, while, in the other case, `2` and `3` are compared to `1`.
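A quick check of this equivalence on simulated data (the data frame `dat` and its columns `lab` and `numf` are made up for illustration):

```r
library(survival)

# Simulated stand-in data (hypothetical values, not the original file)
set.seed(1)
n    <- 150
grp  <- sample(c("L", "M", "H"), n, replace = TRUE)
t_ev <- rexp(n, rate = unname(c(L = 0.05, M = 0.03, H = 0.01)[grp]))
dat  <- data.frame(
  hour = pmin(t_ev, 72),
  mo   = as.integer(t_ev <= 72),
  # Same grouping, two sets of labels, same level order
  lab  = factor(grp, levels = c("L", "M", "H")),
  numf = factor(unname(c(L = 1, M = 2, H = 3)[grp]), levels = c(1, 2, 3))
)

fit_lab  <- coxph(Surv(hour, mo) ~ lab,  data = dat)
fit_numf <- coxph(Surv(hour, mo) ~ numf, data = dat)

coef(fit_lab)    # labM, labH: each compared to L
coef(fit_numf)   # numf2, numf3: each compared to 1

# Only the coefficient names differ; the estimates are the same
all.equal(unname(coef(fit_lab)), unname(coef(fit_numf)))
```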

However, these are both different from just encoding the variable as `temp_c <- c(1,2,3)`, a continuous variable, in which case only a single coefficient will be calculated for `temp_c`.
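One possible way to check whether the single-coefficient coding is adequate is a likelihood ratio test against the factor coding, since the single-slope model is a special case of the three-level factor model. This is only a sketch on simulated data (`dat`, `temp_num` and the rates are assumptions), not something taken from the original analysis.

```r
library(survival)

# Simulated stand-in data (hypothetical values, not the original file)
set.seed(1)
n    <- 150
grp  <- sample(c("L", "M", "H"), n, replace = TRUE)
t_ev <- rexp(n, rate = unname(c(L = 0.05, M = 0.03, H = 0.01)[grp]))
dat  <- data.frame(
  hour     = pmin(t_ev, 72),
  mo       = as.integer(t_ev <= 72),
  CF       = rnorm(n, mean = 1, sd = 0.1),
  temp_num = unname(c(L = 1, M = 2, H = 3)[grp])
)

# Continuous coding: one slope for temperature
fit_lin <- coxph(Surv(hour, mo) ~ temp_num + CF, data = dat)
# Categorical coding: one coefficient per non-reference group
fit_fac <- coxph(Surv(hour, mo) ~ factor(temp_num) + CF, data = dat)

# Likelihood ratio test; loglik[2] is the log partial likelihood at the fitted values
lrt <- 2 * (fit_fac$loglik[2] - fit_lin$loglik[2])
p   <- pchisq(lrt, df = 1, lower.tail = FALSE)   # factor model has 1 extra parameter
c(LRT = lrt, p = p)
```

A small p-value would suggest the equal-step assumption of the continuous coding does not fit well, in which case the factor coding is the safer choice.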

So, both are statistically valid, but the interpretation is different.

Kevin