Cox proportional hazards in R: should treatments be passed to the model as a factor?
5.4 years ago
Baylie_321 ▴ 30

Hiya,

I am fitting a Cox proportional hazards model in R using the survival package. My variables are: status (mo; 1 = died, 0 = censored), time in hours (hour), temperature treatment (temp_c; 3 groups) and condition factor (CF; a continuous variable).

I am confused about handling the temperature group variable and have trialled 2 methods:

library(survival)
data <- read.delim("coxnew.txt", header=TRUE)
data$SurvObj <- with(data, Surv(hour, mo == 1))

(i) Example models online code sex as m=0, f=1 or similar, so I coded the groups as L=1, M=2, H=3.

mod <- coxph(SurvObj ~ temp_c + CF , data = data)

and this gave me a summary with two outputs, one for temp_c and one for CF.

(ii) I have also made temperature treatment a factor

data$temp_c <- factor(data$temp_c)
data$temp_c <- relevel(data$temp_c, ref="H")
mod <- coxph(SurvObj ~ temp_c + CF, data = data)

and this gave me a summary with three outputs, one for temp_cL, one for temp_cM and one for CF

I am not sure which is correct to use, as (ii) requires you to set one group as a reference? The confusion arises when I try to see what the temperature curves look like with CF held at its mean value, as the output graphs look different for the two methods.

(i)

temp_new <- with(data, data.frame(temp_c= c(1,2,3), CF = rep(mean(CF, na.rm = TRUE), 3)))

or (ii)

temp_new <- with(data, data.frame(temp_c= c("L","M","H"), CF = rep(mean(CF, na.rm = TRUE), 3)))

Does it matter which I use? Is it personal preference, or does one version make more statistical sense? I was inclined to go with (i), as it resembles the m=0, f=1 style, and I assume it compares the three groups among themselves rather than just comparing two groups to the assigned reference level?

Many thanks, Bekah

Survivor Cox hazard R • 1.2k views

Cheers! Okay, hopefully I am interpreting your answer correctly. If the survival object is set up as

data$SurvObj <- with(data, Surv(hour, mo == 1))

With temperature coded simply as 1, 2, 3, a hazard ratio of 0.7 would mean that the hazard of death (1, as opposed to censoring, 0) decreases with increasing group number, from temperature treatment 1 --> 2 --> 3.

With temperature coded as a factor and H as the reference level, hazard ratios of 5 for L and 3 for M would mean an increased hazard of death for both L and M compared with H, with the hazard for L being the higher of the two?

Best wishes, Bekah


Yes, an HR of 0.7 indicates that the hazard of death is reduced as the temp_c value increases, when adjusted for your condition factor (CF). I am not sure of the exact interpretation of having temperature encoded as a continuous variable of 1, 2, 3; it would make more sense as a categorical variable. A continuous temperature variable makes more sense as Kelvin values or, granted, Celsius.
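To make the numeric 1, 2, 3 coding concrete: a single coefficient means the model assumes the same multiplicative change in hazard for every one-unit step. Here is a minimal sketch of that arithmetic, taking the HR of 0.7 quoted above and the L=1, M=2, H=3 coding from the question (the variable names are only for illustration):

```r
# Hazard ratio per one-unit increase in the numeric temp_c (value quoted above)
hr_per_step <- 0.7

# Number of one-unit steps away from the lowest code (L = 1, M = 2, H = 3)
steps_from_L <- c(L = 0, M = 1, H = 2)

# Implied hazard ratio of each group relative to L under the numeric coding
hr_vs_L <- hr_per_step^steps_from_L
print(hr_vs_L)
```

So under this coding, M vs L is 0.7 and H vs L is forced to be 0.7^2 = 0.49. Whether that equal-step assumption is reasonable depends on the actual treatments.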

The other values for L and M are readily interpreted. Considering that you set H as the reference level, it says that the low temperature group has the highest hazard of death, when adjusted for CF. The medium temperature group also has a higher hazard of death (i.e., higher hazard when compared to high temperature group).
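On the worry that a reference level only gives "two groups compared to the reference": the choice of reference changes how the comparisons are expressed, not the fitted model. Here is a rough sketch on simulated data (the data, rates and object names are all made up for illustration, not taken from coxnew.txt):

```r
library(survival)

# Simulated data (hypothetical): hazard is highest in L and lowest in H
set.seed(42)
n <- 600
grp <- sample(c("L", "M", "H"), n, replace = TRUE)
CF <- rnorm(n, mean = 1, sd = 0.2)
t_event <- rexp(n, rate = c(L = 0.05, M = 0.03, H = 0.01)[grp])
t_cens <- runif(n, min = 0, max = 100)
sim <- data.frame(
  hour = pmin(t_event, t_cens),          # observed time
  mo = as.integer(t_event <= t_cens),    # 1 = died, 0 = censored
  CF = CF,
  temp_c = factor(grp, levels = c("L", "M", "H"))
)

# Same model, two different reference levels
mod_refL <- coxph(Surv(hour, mo) ~ temp_c + CF, data = sim)
sim_refH <- transform(sim, temp_c = relevel(temp_c, ref = "H"))
mod_refH <- coxph(Surv(hour, mo) ~ temp_c + CF, data = sim_refH)

exp(coef(mod_refH))  # HRs for L vs H and M vs H (plus CF)
exp(coef(mod_refL))  # HRs for M vs L and H vs L (plus CF)

# The L vs M comparison can be derived from the H-referenced fit
hr_L_vs_M <- exp(coef(mod_refH)["temp_cL"] - coef(mod_refH)["temp_cM"])
print(hr_L_vs_M)
```

Both fits have the same log-likelihood, and the L vs H coefficient in one is the negative of the H vs L coefficient in the other, so all three pairwise comparisons are available whichever level is the reference.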

You should also be looking at the upper and lower confidence intervals (CIs), and the log-rank p-value. For example, a general rule of thumb: if we have an HR of 0.7 but its CI includes 1.0, then the estimate is less reliable and this will be reflected in the p-value.
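As a rough sketch of where to find these values, here is one way to pull the HRs, their 95% CIs and the p-values out of a fitted coxph object. The data are simulated and the object names are hypothetical; only the summary() components are from the survival package:

```r
library(survival)

# Simulated data (hypothetical), with H as the reference level as in the thread
set.seed(7)
n <- 600
grp <- sample(c("L", "M", "H"), n, replace = TRUE)
CF <- rnorm(n, mean = 1, sd = 0.2)
t_event <- rexp(n, rate = c(L = 0.05, M = 0.03, H = 0.01)[grp])
t_cens <- runif(n, min = 0, max = 100)
sim <- data.frame(
  hour = pmin(t_event, t_cens),
  mo = as.integer(t_event <= t_cens),
  CF = CF,
  temp_c = factor(grp, levels = c("H", "L", "M"))
)

mod <- coxph(Surv(hour, mo) ~ temp_c + CF, data = sim)
s <- summary(mod)

# Hazard ratios with 95% CIs, plus the per-coefficient Wald p-values
hr_table <- cbind(
  s$conf.int[, c("exp(coef)", "lower .95", "upper .95")],
  p_wald = s$coefficients[, "Pr(>|z|)"]
)
print(hr_table)

# A 95% CI that excludes 1 corresponds to a Wald p-value below 0.05
ci_excludes_one <- hr_table[, "lower .95"] > 1 | hr_table[, "upper .95"] < 1
print(ci_excludes_one)

# Overall score (log-rank) test for the whole model
print(s$sctest)
```

Note that the per-coefficient p-values in the summary are Wald tests; the score (log-rank) test reported at the bottom is for the model as a whole.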


Thank you so much for all your help! :) this is much clearer now!

5.4 years ago

The difference arises from whether temp_c (and therefore your temp_new object) is encoded as a categorical or a continuous variable.

For example:

temp_c <- factor(c("L","M","H"), levels=c("L","M","H"))

...is the same as:

temp_c <- factor(c(1,2,3), levels=c(1,2,3))

When you run a model with either of these, a coefficient will be calculated for each level compared to the reference level. All statistical values that are returned will be the same for L, M, H as for 1, 2, 3. Thus, in the context of the broader model terms, M and H are separately compared to L, the reference level, while, in the other case, 2 and 3 are compared to 1.

However, these are both different from just encoding the variable as temp_c <- c(1,2,3), a continuous variable, in which case only a single coefficient will be calculated for temp_c.
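A quick way to check this is to fit all three encodings on the same data and count the coefficients. This is only a sketch on simulated data; the column names temp_num, temp_fac_chr and temp_fac_num are made up for the example:

```r
library(survival)

# Simulated data (hypothetical): hazard is highest in L and lowest in H
set.seed(1)
n <- 600
grp <- sample(c("L", "M", "H"), n, replace = TRUE)
CF <- rnorm(n, mean = 1, sd = 0.2)
t_event <- rexp(n, rate = c(L = 0.05, M = 0.03, H = 0.01)[grp])
t_cens <- runif(n, min = 0, max = 100)
temp_num <- match(grp, c("L", "M", "H"))   # L = 1, M = 2, H = 3

sim <- data.frame(
  hour = pmin(t_event, t_cens),
  mo = as.integer(t_event <= t_cens),
  CF = CF,
  temp_num = temp_num,                                       # continuous
  temp_fac_chr = factor(grp, levels = c("L", "M", "H")),     # factor L/M/H
  temp_fac_num = factor(temp_num, levels = c(1, 2, 3))       # factor 1/2/3
)

mod_num <- coxph(Surv(hour, mo) ~ temp_num + CF, data = sim)
mod_fac_chr <- coxph(Surv(hour, mo) ~ temp_fac_chr + CF, data = sim)
mod_fac_num <- coxph(Surv(hour, mo) ~ temp_fac_num + CF, data = sim)

names(coef(mod_num))      # one temperature coefficient, plus CF
names(coef(mod_fac_chr))  # two temperature coefficients (M and H vs L), plus CF
names(coef(mod_fac_num))  # two temperature coefficients (2 and 3 vs 1), plus CF

# The two factor encodings give the same estimates; only the labels differ
all.equal(unname(coef(mod_fac_chr)), unname(coef(mod_fac_num)))
```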

So, both are statistically valid, but the interpretation is different.
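This is also why the predicted curves at mean CF look different between the two approaches. Under the numeric coding, the M curve is tied to L and H by the single per-step coefficient, whereas the factor coding estimates each group's curve separately. Here is a minimal sketch with simulated data (the data and object names are hypothetical; the newdata frames mirror the temp_new objects in the question):

```r
library(survival)

# Simulated data (hypothetical): hazard is highest in L and lowest in H
set.seed(3)
n <- 600
grp <- sample(c("L", "M", "H"), n, replace = TRUE)
CF <- rnorm(n, mean = 1, sd = 0.2)
t_event <- rexp(n, rate = c(L = 0.05, M = 0.03, H = 0.01)[grp])
t_cens <- runif(n, min = 0, max = 100)
sim <- data.frame(
  hour = pmin(t_event, t_cens),
  mo = as.integer(t_event <= t_cens),
  CF = CF,
  temp_num = match(grp, c("L", "M", "H")),            # L = 1, M = 2, H = 3
  temp_fac = factor(grp, levels = c("H", "L", "M"))   # H as reference
)

mod_num <- coxph(Surv(hour, mo) ~ temp_num + CF, data = sim)
mod_fac <- coxph(Surv(hour, mo) ~ temp_fac + CF, data = sim)

# One row per temperature group, with CF held at its mean
mean_CF <- mean(sim$CF, na.rm = TRUE)
new_num <- data.frame(temp_num = c(1, 2, 3), CF = mean_CF)
new_fac <- data.frame(
  temp_fac = factor(c("L", "M", "H"), levels = levels(sim$temp_fac)),
  CF = mean_CF
)

# Predicted survival curves: one column per row of newdata
fit_num <- survfit(mod_num, newdata = new_num)
fit_fac <- survfit(mod_fac, newdata = new_fac)
dim(fit_num$surv)
dim(fit_fac$surv)
```

Calling plot(fit_num, col = 1:3) and plot(fit_fac, col = 1:3) then draws the three curves for each model, in the order L, M, H.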

Kevin
