**Context:** I'm trying to build an ensemble survival neural network with a custom loss function. It stacks three base models: a Random Survival Forest (RSF), a Gradient Boosting Survival Model (GBSM), and a Cox proportional hazards model (CoxPH). From each base model I obtain the survival probability at 4 timepoints plus a risk score, and I use these outputs as features to train the neural network with a custom loss function. The base models are trained on 22 base features, and from their outputs I generate the 15 meta-features (3 models × 4 survival probabilities + 1 risk score each) used to train the neural network. The total sample size is 2918, split 70% for training, 20% for testing, and 10% for validation.
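For clarity, this is roughly how the 15 meta-features are produced: each fitted base model contributes 4 survival probabilities plus 1 risk score (a minimal sketch assuming the base models come from scikit-survival; the timepoints and variable names below are placeholders, not my exact code).

```python
# Sketch of the meta-feature construction (placeholder names and timepoints).
import numpy as np

TIMEPOINTS = [365, 730, 1095, 1460]  # the 4 evaluation times (placeholders)

def meta_features(model, X, timepoints=TIMEPOINTS):
    """Survival probability at each timepoint plus one risk score per base model."""
    surv_fns = model.predict_survival_function(X)            # scikit-survival API
    probs = np.vstack([fn(timepoints) for fn in surv_fns])   # (n_samples, 4)
    risk = model.predict(X).reshape(-1, 1)                    # higher = higher risk
    return np.hstack([probs, risk])                           # (n_samples, 5)

# rsf, gbsm, cox = fitted base models trained on the 22 raw features (X_base):
# X_meta = np.hstack([meta_features(m, X_base) for m in (rsf, gbsm, cox)])  # (n, 15)
```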
**Problem:** In survival analysis the standard accuracy metric is the concordance index (C-index). In my case the ensemble's C-index is similar to, or somewhat lower than, that of at least one of the base models. Ensembling is supposed to increase accuracy, so why is this happening in my case?
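To make the comparison concrete, I score every model on the same validation split roughly like this (a sketch; the `risk_*` arrays are assumed names for each model's validation-set risk scores, and `val_data` holds the validation `time`/`event` columns):

```python
# Compare the C-index of each base model against the ensemble (sketch).
from lifelines.utils import concordance_index

# lifelines expects higher scores for longer survival, hence the negated risks
for name, risk in [("RSF", risk_rsf), ("GBSM", risk_gbsm),
                   ("CoxPH", risk_cox), ("Ensemble NN", risk_nn)]:
    ci = concordance_index(val_data["time"], -risk, val_data["event"])
    print(f"{name}: C-index = {ci:.3f}")
```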
**Code:** This is the code I used to construct the neural network. I can provide the code used for the base models if needed.
```python
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras import regularizers
from lifelines.utils import concordance_index
def cox_loss(y_true, y_pred):
    # Extract the event indicator (column 0 of y_true)
    event = tf.cast(y_true[:, 0], dtype=tf.float32)  # 1 if event occurred, 0 otherwise
    risk = y_pred[:, 0]  # Predicted risk scores
    # Log of the cumulative sum of exponentiated risks (the risk-set sum).
    # Note: the reverse cumulative sum matches the Cox risk set only if the
    # samples in the batch are sorted by ascending survival time.
    exp_risk = tf.math.exp(risk)
    log_cumsum_risk = tf.math.log(tf.math.cumsum(exp_risk, reverse=True))
    # Negative Cox partial log-likelihood, averaged over the batch
    loss = -tf.reduce_mean(event * (risk - log_cumsum_risk))
    return loss
# training data from base models
train_data = pd.read_csv("E:/meta_ensemble_model/code/csv/survival_probabilities_train.csv")
# Load the validation data from base models
val_data = pd.read_csv("E:/meta_ensemble_model/code/csv/survival_probabilities_validation.csv")
# Extract features (survival probabilities) and labels (event and time) for training
X_train = train_data.drop(columns=['patient_id', 'event', 'time', 'age'])
y_train = train_data[['event', 'time']]
# Extract features and labels for validation
X_val = val_data.drop(columns=['patient_id', 'event', 'time', 'age'])
y_val = val_data[['event', 'time']]
# Convert y_train and y_val to numpy arrays for the custom loss function
y_train_np = y_train.to_numpy()
y_val_np = y_val.to_numpy()
# input layer and model architecture with regularization
input_layer = Input(shape=(X_train.shape[1],))
hidden_layer_1 = Dense(15, activation='relu', kernel_regularizer=regularizers.l2(0.0001))(input_layer)
dropout_1 = Dropout(0.3)(hidden_layer_1)
hidden_layer_2 = Dense(10, activation='relu', kernel_regularizer=regularizers.l2(0.0001))(dropout_1)
dropout_2 = Dropout(0.3)(hidden_layer_2)
hidden_layer_3 = Dense(5, activation='relu', kernel_regularizer=regularizers.l2(0.0001))(dropout_2)
dropout_3 = Dropout(0.3)(hidden_layer_3)
hidden_layer_4 = Dense(2, activation='relu', kernel_regularizer=regularizers.l2(0.0001))(dropout_3)
dropout_4 = Dropout(0.3)(hidden_layer_4)
output_layer = Dense(1, activation='linear')(dropout_4)
model = Model(inputs=input_layer, outputs=output_layer)
# Compile the model with the custom Cox loss function
model.compile(optimizer=Adam(learning_rate=0.000008), loss=cox_loss)
# Define the file path for saving the best checkpoint
checkpoint_filepath = "E:/meta_ensemble_model/code/saved_models/cox_nn_model_checkpoint.h5"
# Define a checkpoint callback that saves the best model and prints verbose messages
checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor='val_loss',
    verbose=1,
    save_best_only=True,
    mode='min'
)
# early stopping callback to prevent overfitting and restore the best model weights
early_stopping = EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)
# Train the model with the checkpoint and early stopping callbacks
history = model.fit(
    X_train, y_train_np,
    validation_data=(X_val, y_val_np),
    batch_size=5,
    epochs=2000,
    callbacks=[checkpoint_callback, early_stopping],
    verbose=1
)
# Evaluate the model on the validation data
val_loss = model.evaluate(X_val, y_val_np, verbose=1)
print(f"Validation Loss (Cox Loss): {val_loss}")
# Predict risk scores on the validation set
risk_scores_val = model.predict(X_val).flatten()
# Compute the concordance index
c_index = concordance_index(y_val['time'], -risk_scores_val, y_val['event'])
print(f"Validation Concordance Index: {c_index}")
# save the final best model
model.save("E:/meta_ensemble_model/code/saved_models/cox_nn_model.h5")
# Load the saved model for inference
loaded_model = tf.keras.models.load_model(
    "E:/meta_ensemble_model/code/saved_models/cox_nn_model.h5",
    custom_objects={'cox_loss': cox_loss}
)
# Predict risk scores using the loaded model to verify successful loading
risk_scores_test = loaded_model.predict(X_val).flatten()
print("Risk scores predicted successfully.")
A lot of these choices make no sense to me. Why combine a random forest and gradient-boosted trees and, on top of that, a Cox regression? What is the point of feeding an MLP with probabilities emitted by three different models?
The three models I used as base models are very widely used for survival analysis. What I'm trying to accomplish is an ensemble built from these well-established models: RSF and GBSM can capture the non-linear relationships in my data, so my idea was to overcome the trade-offs of each individual model by combining them with a meta-learner.
I'm new to data science, so please feel free to point out any mistakes and share any tips. Thank you.
IMHO, this is the wrong approach. You should start from a single model and tune its features and parameters to understand its limits and strengths. Only if you suspect those limits could be addressed by another model should you stack another layer. Trade-offs can also be managed with regularization or feature transformations, for example.
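As a concrete starting point, I mean something along these lines: tune one model with cross-validated C-index before adding anything on top (just a sketch; `X_train_raw` and `y_train_structured` stand for your 22 raw features and the scikit-survival structured target of (event, time)).

```python
# Sketch: tune a single Random Survival Forest by cross-validated C-index.
from sklearn.model_selection import GridSearchCV
from sksurv.ensemble import RandomSurvivalForest

param_grid = {"n_estimators": [200, 500], "min_samples_leaf": [5, 10, 20]}
rsf = RandomSurvivalForest(random_state=0)
# scikit-survival estimators use Harrell's C-index as their default score
search = GridSearchCV(rsf, param_grid, cv=5)
search.fit(X_train_raw, y_train_structured)
print(search.best_params_, search.best_score_)
```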
I don't know exactly how your dataset is structured or which features you're using, but did you do any feature engineering? How many of the features are actually relevant? Are there transformations you could apply to them? I'd point you to the scikit-survival user guide (https://scikit-survival.readthedocs.io/en/stable/user_guide/index.html), which explains the theory alongside the code.
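And by feature transformations I mean something like this (a sketch, not your data; `numeric_cols`, `categorical_cols`, `X_raw`, and `y_structured` are placeholders):

```python
# Sketch: scale numeric features and one-hot encode categoricals before a Cox model.
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sksurv.linear_model import CoxPHSurvivalAnalysis

preprocess = make_column_transformer(
    (StandardScaler(), numeric_cols),
    (OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    sparse_threshold=0,  # force a dense matrix for the Cox model
)
pipe = make_pipeline(preprocess, CoxPHSurvivalAnalysis(alpha=0.01))  # small ridge penalty
pipe.fit(X_raw, y_structured)
print("C-index:", pipe.score(X_raw, y_structured))
```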
P.S. It's very likely the code you posted came from a GPT-like model. Just my two cents: use those tools first to understand the theory behind the models, rather than going straight to implementation.