Question

Cox proportional hazards in R: should treatments be passed to the model as a factor?

6.6 years ago

Baylie_321 ▴ 30

Hiya,

I am fitting a Cox proportional hazards model in R using the survival package. My variables are: status (mo: 1 = died, 0 = censored), time in hours (hour), temperature treatment (temp_c, 3 groups), and condition factor (CF, a continuous variable).

I am confused about handling the temperature group variable and have trialled 2 methods:

library(survival)  # provides Surv() and coxph()
data <- read.delim("coxnew.txt", header=TRUE)
data$SurvObj <- with(data, Surv(hour, mo == 1))

(i) I have seen that in example models online, sex: m=0, f=1 or similar, so I used L=1,M=2,H=3 as the groups.

mod <- coxph(SurvObj ~ temp_c + CF , data = data)

and this gave me a summary with two outputs, one for temp_c and one for CF.

(ii) I have also made temperature treatment a factor

data$temp_c <- factor(data$temp_c)
data$temp_c <- relevel(data$temp_c, ref="H")
mod <- coxph(SurvObj ~ temp_c + CF, data = data)

and this gave me a summary with three outputs, one for temp_cL, one for temp_cM and one for CF

I am not sure which is correct to use, as (ii) requires that you then choose one group as a reference? The confusion arises when I try to plot survival for each temperature with CF held at its mean value, as the output graphs look different for the two methods.

(i)

temp_new <- with(data, data.frame(temp_c= c(1,2,3), CF = rep(mean(CF, na.rm = TRUE), 3)))

or (ii)

temp_new <- with(data, data.frame(temp_c= c("L","M","H"), CF = rep(mean(CF, na.rm = TRUE), 3)))

Does it matter which I use (is it personal preference), or does one version make more statistical sense? I was inclined to go with (i), as this is comparable to the m=0, f=1 style, and I assume it compares the three groups among themselves rather than just comparing two groups to the assigned reference level?

Many thanks, Bekah

Survivor Cox hazard R • 1.6k views

ADD COMMENT • link updated 6.6 years ago by zx8754 12k • written 6.6 years ago by Baylie_321 ▴ 30

Cheers! Okay, hopefully I am correct in interpreting your answer as follows. If it's set as

data$SurvObj <- with(data, Surv(hour, mo == 1))

A hazard ratio of 0.7 for temperature coded simply as 1, 2, 3 would mean that, with increasing group number, the chance of death (1) over censoring (0) decreases from temperature treatment 1 --> 2 --> 3.

Hazard ratios of 5 for L and 3 for M, with temperature as a factor and H as the reference level, would mean: an increased chance of death (1) over censoring (0) for both L and M when compared to H, with the chance being higher for L?

Best wishes, Bekah

ADD REPLY • link 6.6 years ago by Baylie_321 ▴ 30

Yes, a HR of 0.7 indicates that the hazard is reduced with increasing temp_c value (multiplied by 0.7 for each one-unit step), when adjusted for your condition factor (CF). I am not sure of the exact interpretation of temperature encoded as a continuous variable of 1, 2, 3; it would make more sense as categorical. A continuous temperature variable makes more sense as Kelvin values or, granted, Celsius.

The other values for L and M are readily interpreted. Considering that you set H as the reference level, it says that the low temperature group has the highest hazard of death, when adjusted for CF. The medium temperature group also has a higher hazard of death (i.e., higher hazard when compared to high temperature group).
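As a rough illustration of what the reference level does, the sketch below fits the same factor model twice, once with L and once with H as the reference. The data are simulated (the data frame d, the rates, and the seed are assumptions made up for this example, not the poster's data). The fit itself should not change; only the comparisons reported do.

```r
library(survival)

# Hypothetical simulated data: three temperature groups, exponential event times
set.seed(1)
n   <- 150
grp <- rep(c("L", "M", "H"), each = n / 3)
d <- data.frame(
  hour   = rexp(n, rate = unname(c(L = 0.10, M = 0.06, H = 0.03)[grp])),
  mo     = rbinom(n, 1, 0.8),              # 1 = died, 0 = censored (random, for illustration)
  CF     = rnorm(n, mean = 1, sd = 0.1),
  temp_c = factor(grp, levels = c("L", "M", "H"))
)

# Same data, but with H as the reference level
dH <- d
dH$temp_c <- relevel(dH$temp_c, ref = "H")

fit_L <- coxph(Surv(hour, mo == 1) ~ temp_c + CF, data = d)   # M and H vs L
fit_H <- coxph(Surv(hour, mo == 1) ~ temp_c + CF, data = dH)  # L and M vs H

exp(coef(fit_L))   # hazard ratios relative to L
exp(coef(fit_H))   # hazard ratios relative to H

# The L-vs-H log hazard ratio is the negative of the H-vs-L one,
# and the log-likelihood of the two fits is the same
c(L_vs_H = unname(coef(fit_H)["temp_cL"]), H_vs_L = unname(coef(fit_L)["temp_cH"]))
c(fit_L$loglik[2], fit_H$loglik[2])
```

So choosing a reference is a matter of which comparisons you want printed, not a different statistical model.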

You should also look at the lower and upper confidence intervals (CIs) and the log-rank p-value. For example, as a general rule of thumb: if HR=0.7 but its upper CI exceeds 1.0, the estimate is less reliable, and this will be reflected in the p-value.
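A minimal sketch of where these numbers live in the summary of a coxph fit, again on simulated data (d, the rates, and the seed are hypothetical). The spans_one helper is just an illustrative name for flagging terms whose 95% CI includes 1.

```r
library(survival)

# Hypothetical simulated data, not the poster's
set.seed(2)
n   <- 150
grp <- rep(c("L", "M", "H"), each = n / 3)
d <- data.frame(
  hour   = rexp(n, rate = unname(c(L = 0.10, M = 0.06, H = 0.03)[grp])),
  mo     = rbinom(n, 1, 0.8),              # 1 = died, 0 = censored
  CF     = rnorm(n, mean = 1, sd = 0.1),
  temp_c = relevel(factor(grp), ref = "H")
)

fit <- coxph(Surv(hour, mo == 1) ~ temp_c + CF, data = d)
s   <- summary(fit)

# Hazard ratios with 95% confidence limits, one row per term
s$conf.int

# TRUE where the interval contains 1, i.e. no clear evidence of an effect
spans_one <- s$conf.int[, "lower .95"] < 1 & s$conf.int[, "upper .95"] > 1
spans_one

# Overall tests for the model: likelihood ratio, Wald, and score (log-rank)
s$logtest
s$waldtest
s$sctest
```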

ADD REPLY • link 6.6 years ago by Kevin Blighe 89k

Thank you so much for all your help! :) this is much clearer now!

ADD REPLY • link 6.6 years ago by Baylie_321 ▴ 30

score 1 · Answer 1 · 2018-12-02

The difference arises from whether temp_c (and therefore your temp_new object) is encoded as a categorical or a continuous variable.

For example:

temp_c <- factor(c("L","M","H"), levels=c("L","M","H"))

..is the same as:

temp_c <- factor(c(1,2,3), levels=c(1,2,3))

When you run a model with either of these, a coefficient will be calculated for each level compared to the reference level. All statistical values that are returned will be the same for L M H as per 1 2 3. Thus, in the context of the broader model terms, M and H are separately compared to L, the reference level, while, in the other case, 2 and 3 are compared to 1.
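This equivalence can be checked with a small sketch on simulated data (the data frame d, the column names temp_lmh and temp_123, the rates, and the seed are all made up for illustration):

```r
library(survival)

# Hypothetical simulated data, not the poster's
set.seed(1)
n   <- 150
grp <- rep(c("L", "M", "H"), each = n / 3)
d <- data.frame(
  hour = rexp(n, rate = unname(c(L = 0.10, M = 0.06, H = 0.03)[grp])),
  mo   = rbinom(n, 1, 0.8),                # 1 = died, 0 = censored
  CF   = rnorm(n, mean = 1, sd = 0.1)
)

# The same grouping, labelled two ways; both are factors
d$temp_lmh <- factor(grp, levels = c("L", "M", "H"))
d$temp_123 <- factor(match(grp, c("L", "M", "H")), levels = c(1, 2, 3))

fit_lmh <- coxph(Surv(hour, mo == 1) ~ temp_lmh + CF, data = d)
fit_123 <- coxph(Surv(hour, mo == 1) ~ temp_123 + CF, data = d)

# Side by side: the estimates match, only the coefficient names differ
cbind(LMH = coef(fit_lmh), num_labels = coef(fit_123))
```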

However, these are both different from just encoding the variable as temp_c <- c(1,2,3), a continuous variable, in which case only a single coefficient will be calculated for temp_c.
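To make the contrast concrete, here is a sketch on simulated data (d, temp_num, the rates, and the seed are assumptions for this example). With the numeric coding, the model estimates one hazard ratio per one-step increase, which forces the L to M change and the M to H change to be the same on the log-hazard scale. The factor coding estimates each group's hazard ratio separately.

```r
library(survival)

# Hypothetical simulated data, not the poster's
set.seed(1)
n   <- 150
grp <- rep(c("L", "M", "H"), each = n / 3)
d <- data.frame(
  hour   = rexp(n, rate = unname(c(L = 0.10, M = 0.06, H = 0.03)[grp])),
  mo     = rbinom(n, 1, 0.8),              # 1 = died, 0 = censored
  CF     = rnorm(n, mean = 1, sd = 0.1),
  temp_c = factor(grp, levels = c("L", "M", "H"))
)
d$temp_num <- as.integer(d$temp_c)         # L = 1, M = 2, H = 3, treated as a number

fit_num <- coxph(Surv(hour, mo == 1) ~ temp_num + CF, data = d)
fit_fac <- coxph(Surv(hour, mo == 1) ~ temp_c + CF, data = d)

length(coef(fit_num))   # 2 coefficients: temp_num and CF
length(coef(fit_fac))   # 3 coefficients: temp_cM, temp_cH and CF

exp(coef(fit_num)["temp_num"])       # HR per one-step increase (1 -> 2 or 2 -> 3)
exp(2 * coef(fit_num)["temp_num"])   # HR implied for group 3 vs group 1
exp(coef(fit_fac))                   # separately estimated HRs for M vs L and H vs L
```

If the per-step hazard ratio implied by the numeric model is far from what the factor model estimates, that suggests the equal-spacing assumption does not suit the data.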

So, both are statistically valid, but the interpretation is different.
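For the plots with CF held at its mean, one possible approach is survfit() with a newdata frame whose temp_c column is coded the same way as in the fitted model. The sketch below uses the factor model on simulated data (d, new_fac, the rates, and the seed are hypothetical):

```r
library(survival)

# Hypothetical simulated data, not the poster's
set.seed(1)
n   <- 150
grp <- rep(c("L", "M", "H"), each = n / 3)
d <- data.frame(
  hour   = rexp(n, rate = unname(c(L = 0.10, M = 0.06, H = 0.03)[grp])),
  mo     = rbinom(n, 1, 0.8),              # 1 = died, 0 = censored
  CF     = rnorm(n, mean = 1, sd = 0.1),
  temp_c = factor(grp, levels = c("L", "M", "H"))
)

fit_fac <- coxph(Surv(hour, mo == 1) ~ temp_c + CF, data = d)

# One row per temperature group, CF fixed at its mean;
# the factor uses the same levels as in the fitted data
new_fac <- data.frame(
  temp_c = factor(c("L", "M", "H"), levels = levels(d$temp_c)),
  CF     = mean(d$CF, na.rm = TRUE)
)

sf <- survfit(fit_fac, newdata = new_fac)
dim(sf$surv)   # one column of predicted survival per row of new_fac
```

Calling plot(sf, col = 1:3) would then draw one predicted curve per group. For a model fitted with the numeric coding, newdata would instead need temp_c = c(1, 2, 3), and the curves will generally differ because that model constrains the groups to equal steps.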

Kevin