Question: Using survival analysis models in R for prediction on a test set
6.2 years ago by
United States
hdy130 wrote:

I am learning survival analysis in R, especially the Cox proportional hazards model. I read a paper that uses 80% of the samples as a training set and the remaining 20% as a test set.


As quoted "On the training set, we first performed a pre-selection step to keep the top significant features correlated with overall survival (univariate Cox model, likelihood ratio test, P < 0.05). ... We used two computational methods to train the models: (i) Cox: the Cox proportional hazards model with LASSO for feature selection ... We then applied the models thereby obtained to the test set for prediction, and calculated the C-index using the R package survcomp."


I do not understand how they applied the fitted Cox models to the test set. For the training set, I can simply call the coxph function, but the returned results are "coef, exp(coef), se(coef), z, p" and the likelihood ratio test p-value. How can I treat this as a model and use it on the 20% test set?

modified 4.7 years ago by openabstract0 • written 6.2 years ago by hdy130

Could you give the reference, please?

written 6.2 years ago by russhh5.5k

The paper is "Assessing the clinical utility of cancer genomic and proteomic data across tumor types", published in Nature Biotechnology. Thanks!

written 6.2 years ago by hdy130
6.2 years ago by
sarajbc50 wrote:

You can try to do something like this:

# Fit the model on the training data (after feature selection; I believe the paper you mentioned uses LASSO, and R has a good package for this: glmnet)
library(survival)   # Surv, coxph
library(survcomp)   # concordance.index (Bioconductor package)

# Refer to the columns by name inside Surv(); they are looked up in training_data
cox_model = coxph(Surv(Survival, Status) ~ ., data = training_data)

# Predict risk scores (linear predictors) for the validation data
pred_validation = predict(cox_model, newdata = validation_data)

# Determine concordance; the C-index itself is in cindex_validation$c.index
cindex_validation = concordance.index(pred_validation, surv.time = validation_data$Survival,
                                      surv.event = validation_data$Status, method = "noether")
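
For the pre-selection and LASSO steps quoted from the paper, a minimal sketch with glmnet might look like the following. It is only an illustration under stated assumptions, not necessarily what the authors did: the data are simulated, the names (x, time, status, f1 to f20) are made up, and survival::concordance is swapped in for survcomp's concordance.index to keep the dependencies small.

```r
library(survival)
library(glmnet)

set.seed(1)
n <- 200; p <- 20

# Simulated feature matrix (hypothetical features f1..f20)
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("f", 1:p)))

# Simulated outcome: only f1 and f2 affect the hazard; independent censoring
event_time <- rexp(n, rate = exp(0.8 * x[, "f1"] - 0.8 * x[, "f2"]))
cens_time  <- rexp(n, rate = 0.5)
time   <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)   # 1 = event, 0 = censored

# 80/20 split into training and test sets
train <- sample(n, size = round(0.8 * n))

# Pre-selection on the training set: univariate Cox, likelihood ratio test, P < 0.05
pvals <- apply(x[train, ], 2, function(col) {
  fit <- coxph(Surv(time[train], status[train]) ~ col)
  summary(fit)$logtest[["pvalue"]]
})
keep <- names(pvals)[pvals < 0.05]
stopifnot(length(keep) >= 2)   # glmnet needs at least two columns

# LASSO-penalised Cox model; the penalty is chosen by cross-validation
y_train <- cbind(time = time[train], status = status[train])
cv_fit  <- cv.glmnet(x[train, keep], y_train, family = "cox")

# Linear predictors (log relative hazards) for the held-out 20%
lp_test <- predict(cv_fit, newx = x[-train, keep], s = "lambda.min", type = "link")

# C-index on the test set; reverse = TRUE because a higher score means shorter survival
test_df <- data.frame(time = time[-train], status = status[-train],
                      lp = as.numeric(lp_test))
cidx <- concordance(Surv(time, status) ~ lp, data = test_df, reverse = TRUE)$concordance
cidx
```

Note that predict(..., type = "link") returns linear predictors, i.e. log relative hazards, so the values can be negative (even all of them). Only their ordering matters for the C-index; type = "response" returns exp() of these values, which is the relative risk.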


See more here:


Hope it helps

modified 6.2 years ago • written 6.2 years ago by sarajbc50

Hi @sarajbc, apologies for reviving this old thread. I am struggling with a similar problem: I want to predict patient survival from methylation data. Could you please help me understand what glmnet Cox regression actually predicts? For my datasets, all of my predictions are negative, and I have no idea what these values mean.

modified 17 months ago • written 17 months ago by Researcher60