If your variable `hercoc` has only two levels, then there is no difference between entering it as numeric or as a factor. However, if it has three or more levels, there is a difference. You haven't provided any example data, so I am assuming that `hercoc` is numeric.

Using a more concrete example:

```
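library(survival)  # provides coxph(), Surv() and the lung data set
attach(lung)       # lets the calls below refer to columns such as sex directly
head(lung)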
# inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
#1 3 306 2 74 1 1 90 100 1175 NA
#2 3 455 2 68 1 0 90 90 1225 15
#3 3 1010 1 56 1 0 90 90 NA 15
#4 5 210 2 57 1 1 90 60 1150 11
```

The variable `sex` has two levels but is coded as a numeric variable here (1 for male, 2 for female).

```
summary(as.factor(sex))
#   1   2
# 138  90
```

Only one coefficient is fitted, whether `sex` enters the model as numeric or as a factor:

```
coxph(Surv(time, status) ~ sex)
# Call:
# coxph(formula = Surv(time, status) ~ sex)
#       coef exp(coef) se(coef)     z      p
# sex -0.531     0.588    0.167 -3.18 0.0015
# Likelihood ratio test=10.6 on 1 df, p=0.00111 n= 228, number of events= 165

coxph(Surv(time, status) ~ as.factor(sex))
# Call:
# coxph(formula = Surv(time, status) ~ as.factor(sex))
#                   coef exp(coef) se(coef)     z      p
# as.factor(sex)2 -0.531     0.588    0.167 -3.18 0.0015
# Likelihood ratio test=10.6 on 1 df, p=0.00111 n= 228, number of events= 165
```
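One way to see why the two fits agree: with two levels, the 0/1 dummy column R builds for the factor is just the 1/2 numeric column shifted by a constant. A Cox model has no intercept, and a constant added to every individual's linear predictor cancels out of the partial likelihood, so the slope is unchanged. A minimal base-R sketch, using a small made-up vector `x` rather than the real data:

```r
# Hypothetical two-level variable coded 1/2, like `sex` in `lung`
x <- c(1, 2, 2, 1, 2)

# Design matrices R builds for the numeric and the factor version
mm_num    <- model.matrix(~ x)
mm_factor <- model.matrix(~ factor(x))

print(mm_num)
print(mm_factor)

# The factor dummy (0/1) equals the numeric column (1/2) minus 1,
# so beta * dummy = beta * x - beta: the same constant shift for everyone,
# which cancels in the Cox partial likelihood.
all.equal(unname(mm_factor[, 2]), unname(mm_num[, 2]) - 1)
```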

The variable `ph.ecog` has four levels (0 to 3), plus one missing value:

```
summary(as.factor(ph.ecog))
# 0 1 2 3 NA's
# 63 113 50 1 1
```

When you fit the survival model against `ph.ecog`, it really does make a difference whether the variable enters as numeric or as a factor. Treated as numeric, only a single coefficient is fitted: for a given individual, the `ph.ecog` value is multiplied by this coefficient before entering the `coxph` calculation. This implicitly assumes that each one-step increase in `ph.ecog` has the same effect on the log hazard.

```
coxph(Surv(time, status) ~ ph.ecog)
# Call:
# coxph(formula = Surv(time, status) ~ ph.ecog)
#          coef exp(coef) se(coef)   z       p
# ph.ecog 0.476      1.61    0.113 4.2 2.7e-05
# Likelihood ratio test=17.6 on 1 df, p=2.77e-05 n= 227, number of events= 164
#    (1 observation deleted due to missingness)
```
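With numeric coding the model forces a constant multiplicative step: each one-unit increase in `ph.ecog` multiplies the hazard by `exp(coef)`, so the implied hazard ratio of level k against level 0 is `exp(coef * k)`. A small sketch plugging in the rounded coefficient 0.476 from the output above:

```r
# Rounded coefficient for numeric ph.ecog, copied from the coxph output above
beta <- 0.476
ecog_levels <- 0:3

# Implied hazard ratio of each level relative to ph.ecog = 0
hr <- exp(beta * ecog_levels)
round(hr, 2)

# Every one-step increase multiplies the hazard by the same factor, exp(beta)
ratios <- hr[-1] / hr[-length(hr)]
round(ratios, 2)
```

These implied ratios can be compared with the `exp(coef)` column of the factor fit, where each level gets its own estimate.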

Treated as a factor, however, three different coefficients are fitted, one for each non-reference level (levels 1, 2 and 3 each get a coefficient, and level 0 is the reference), and for a given individual you look up the coefficient corresponding to their level of the `ph.ecog` factor.

```
coxph(Surv(time, status) ~ as.factor(ph.ecog))
# Call:
# coxph(formula = Surv(time, status) ~ as.factor(ph.ecog))
#                      coef exp(coef) se(coef)    z       p
# as.factor(ph.ecog)1 0.369      1.45    0.199 1.86 6.3e-02
# as.factor(ph.ecog)2 0.916      2.50    0.225 4.08 4.5e-05
# as.factor(ph.ecog)3 2.208      9.10    1.026 2.15 3.1e-02
# Likelihood ratio test=18.4 on 3 df, p=0.000356 n= 227, number of events= 164
#    (1 observation deleted due to missingness)
```
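The "look up the coefficient" step is exactly what the factor's dummy columns do. A sketch with five hypothetical patients, multiplying R's dummy-coded design matrix by the rounded coefficients from the output above (level 0 is the reference and contributes 0):

```r
# Hypothetical ph.ecog values for five patients
ecog <- c(0, 1, 2, 3, 1)

# Dummy columns for levels 1, 2 and 3; drop the intercept because coxph has none
X <- model.matrix(~ factor(ecog))[, -1]
print(X)

# Rounded coefficients copied from the factor coxph output above
beta <- c(0.369, 0.916, 2.208)

# Linear predictor: each patient picks up the coefficient of their own level
lp <- drop(X %*% beta)
lp
exp(lp)  # hazard ratio relative to ph.ecog = 0
```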

For the details of how the coefficients enter the survival model, look in a good generalised linear model book (it is hard to explain properly in a short answer).
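As a compact summary (not a substitute for the textbook treatment): in the Cox model the coefficients enter through the exponent of the hazard. Writing $h_0(t)$ for the baseline hazard, $x$ for `ph.ecog`, and $I(\cdot)$ for an indicator that is 1 when its condition holds and 0 otherwise, the two codings give

$$
\text{numeric:}\quad h(t \mid x) = h_0(t)\,\exp(\beta x), \qquad x \in \{0, 1, 2, 3\}
$$

$$
\text{factor:}\quad h(t \mid x) = h_0(t)\,\exp\big(\beta_1 I(x = 1) + \beta_2 I(x = 2) + \beta_3 I(x = 3)\big)
$$

The numeric model is the special case $\beta_k = k\beta$ of the factor model, which is why the two coincide when there are only two levels.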

In R, the factor data type should be used for categorical data. For example, if you were doing survival analysis comparing three different treatments, you should pass the treatment variable as a factor because the data are categorical. If the treatments were coded as numbers (say 1, 2 and 3) and you did not convert them, R would treat them as continuous, which can lead to misinterpretation of the results.
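A sketch with hypothetical treatment codes (`trt` and its labels are made up for illustration). Left numeric, the codes get a single slope, which implicitly assumes treatment 3's effect relative to treatment 1 is twice treatment 2's, a claim that means nothing for arbitrary labels. As a factor, each non-reference treatment gets its own coefficient:

```r
# Hypothetical treatment codes: 1 = drug A, 2 = drug B, 3 = placebo
trt <- c(1, 2, 3, 1, 3, 2)

# Numeric coding: a single column, i.e. one slope across arbitrary labels
X_num <- model.matrix(~ trt)[, -1, drop = FALSE]

# Factor coding: one dummy column per non-reference treatment
trt_f <- factor(trt, levels = 1:3, labels = c("drugA", "drugB", "placebo"))
X_fac <- model.matrix(~ trt_f)[, -1]

ncol(X_num)
ncol(X_fac)
colnames(X_fac)
```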

On the other hand, if the treatment was a single drug given at different concentrations, you should not convert the variable to a factor, because the concentrations are continuous.
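For the concentration case, a minimal sketch with made-up doses: kept numeric, the model uses the actual spacing between doses and fits one per-unit coefficient; converted to a factor, the doses become unordered categories and the model spends one coefficient per extra dose level.

```r
# Hypothetical concentrations for six patients
dose <- c(0.5, 1, 2, 4, 1, 0.5)

# Numeric: one column holding the dose itself, so exp(coef) is the hazard ratio per unit dose
X_dose <- model.matrix(~ dose)[, -1, drop = FALSE]

# Factor: four distinct doses, reference level dropped, three dummy columns
X_dose_f <- model.matrix(~ factor(dose))[, -1]

ncol(X_dose)
ncol(X_dose_f)
levels(factor(dose))
```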

At least, that's my understanding; others, please chime in.