Using survival analysis models in R for prediction
7.8 years ago
hdy ▴ 160

I am learning survival analysis in R, especially the Cox proportional hazards model. I read a paper that used 80% of the samples as a training set and the remaining 20% as a test set.

To quote the paper:

On the training set, we first performed a pre-selection step to keep the top significant features correlated with overall survival (univariate Cox model, likelihood ratio test, P< 0.05). ...We used two computational methods to train the models: (i) Cox: the Cox proportional hazards model with LASSO for feature selection ...We then applied the models thereby obtained to the test set for prediction, and calculated the C-index using the R package survcomp.

I do not understand how they applied the fitted Cox model to the test set. For the training set, I can simply call the coxph function, but the returned results are "coef, exp(coef), se(coef), z, p" and the likelihood ratio test p-value. How can I treat this as a model and use it on the 20% test set?

machine-learning R model survival • 18k views

Could you give the reference, please?


The paper is "Assessing the clinical utility of cancer genomic and proteomic data across tumor types", published in Nature Biotechnology. Thanks!

7.7 years ago
sarajbc ▴ 50

You can try to do something like this:

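library(survival)   # Surv(), coxph()
library(survcomp)   # concordance.index()

# Fit the model on the training data (after feature selection; I believe the paper you mentioned uses LASSO, and R has a good package for this: glmnet)
# Columns are referenced by name, so "." means every column other than Survival and Status
cox_model <- coxph(Surv(Survival, Status) ~ ., data = training_data)

# Compute a risk score (the linear predictor) for each sample in the validation data
pred_validation <- predict(cox_model, newdata = validation_data, type = "lp")

# Determine concordance; the returned object is a list, and the C-index is its c.index element
cindex_validation <- concordance.index(pred_validation, surv.time = validation_data$Survival, surv.event = validation_data$Status, method = "noether")
cindex_validation$c.index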
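
To make the "how do I treat the coefficients as a model" part concrete, here is a minimal end-to-end sketch. It rests on my own assumptions, not the paper's setup: it uses the `lung` dataset bundled with the survival package, three arbitrarily chosen covariates, a random 80/20 split, and `concordance()` from survival (version 3.x) instead of survcomp, so that only one package is needed. The point is that applying the model to new samples means computing `sum(coef * x)` for each sample. `predict()` does the same thing, apart from a constant shift due to centering, which does not change the ranking of patients.

```r
library(survival)

set.seed(1)

# Example data (an assumption for illustration): lung cancer data from the survival package
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Random 80/20 split into training and test sets
n         <- nrow(dat)
train_idx <- sample(n, size = floor(0.8 * n))
train     <- dat[train_idx, ]
test      <- dat[-train_idx, ]

# Fit the Cox model on the training set only
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = train)

# Apply the model to the test set: one risk score (linear predictor) per sample
lp_test <- predict(fit, newdata = test, type = "lp")

# The same score computed by hand from the coefficients, without centering
X_test    <- as.matrix(test[, names(coef(fit))])
lp_manual <- drop(X_test %*% coef(fit))

# The two versions differ only by a constant offset
offset_spread <- sd(lp_test - lp_manual)

# C-index on the test set; reverse = TRUE because a higher risk score
# should go with a shorter survival time
cidx <- concordance(Surv(time, status) ~ lp_test, data = test, reverse = TRUE)$concordance
cidx
```

With a LASSO model from glmnet the idea is the same: fit on the training set, compute the linear predictor on the test set, then pass it to whichever concordance function you use.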


See more here

Hope it helps


Hi @sarajbc, apologies for raising this old thread again. I am struggling with a similar problem: I want to predict patient survival from methylation data. Can you please help me understand what glmnet Cox regression actually predicts? For my datasets, all of my predictions are negative values, and I have no clue what they actually mean.
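
On the negative values: as far as I know, the default prediction from a Cox model is the linear predictor, i.e. a log relative hazard, not a survival time. glmnet's `predict()` with `type = "link"` is on that scale, and `type = "response"` gives the relative risk. A negative value therefore means a hazard below the reference level. The absolute level depends on how the covariates are scaled or centered, so only the differences between patients (or their ranking) are really interpretable. The sketch below illustrates this with `coxph` on the bundled `lung` data (my choice of example, not your methylation data), where `type = "risk"` is simply `exp()` of `type = "lp"`.

```r
library(survival)

# Example model (an assumption for illustration): two covariates from the lung data
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Linear predictor: log relative hazard, can be negative or positive
lp <- predict(fit, type = "lp")

# Relative risk: exp(lp), always positive
risk <- predict(fit, type = "risk")

# A negative linear predictor corresponds to a relative risk below 1
same_scale    <- isTRUE(all.equal(unname(risk), unname(exp(lp))))
sign_matches  <- all((lp < 0) == (risk < 1))

# Shifting all scores by a constant leaves the ranking of patients unchanged
ranking_kept  <- identical(order(lp), order(lp - 10))
```

So predictions that are all negative are not a problem in themselves for computing a C-index, since that only uses the ordering of the scores.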