Ensemble Neural Network - Stacking ensemble neural network C-index is similar to or lower than the base models

**Context:** I'm trying to build a stacking ensemble survival neural network with a custom loss function, using three base models: a Random Survival Forest (RSF), a Gradient Boosting Survival Model (GBSM), and a Cox proportional hazards model (CoxPH). From each base model I obtain survival probabilities at 4 timepoints plus a risk score, and I use these as features to train the neural network with a custom loss function. The dataset has 22 base features, from which the base models generate 15 meta-features (3 models × (4 survival probabilities + 1 risk score)) that are then used to train the neural network. The total sample size is 2918, split three ways: 70% for training, 20% for testing, and 10% for validation.
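
For reference, here is a minimal sketch of how such meta-features could be generated with scikit-survival; this is an assumption rather than my actual base-model code, and the timepoints and the names `X_base` / `y_surv` are placeholders:

    import numpy as np
    from sksurv.ensemble import RandomSurvivalForest, GradientBoostingSurvivalAnalysis
    from sksurv.linear_model import CoxPHSurvivalAnalysis

    time_points = [12, 24, 36, 60]  # hypothetical evaluation timepoints

    def meta_features(model, X):
        # Survival probability at each timepoint (4 columns) plus risk score (1 column)
        surv_fns = model.predict_survival_function(X)
        probs = np.array([[fn(t) for t in time_points] for fn in surv_fns])
        risk = model.predict(X).reshape(-1, 1)
        return np.hstack([probs, risk])

    models = [RandomSurvivalForest(random_state=0),
              GradientBoostingSurvivalAnalysis(random_state=0),
              CoxPHSurvivalAnalysis()]
    # 3 models x 5 columns each = 15 meta-features
    X_meta = np.hstack([meta_features(m.fit(X_base, y_surv), X_base) for m in models])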

**Problem:** In survival analysis the standard accuracy metric is the concordance index (C-index). In my case, the ensemble's C-index is similar to, or somewhat lower than, that of at least one of the base models. Ensembling is said to increase accuracy, so why is this happening in my case?
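
For anyone unfamiliar with the metric: the C-index is the fraction of comparable patient pairs that the model orders correctly. A toy illustration with made-up numbers, using lifelines (which the code below also relies on):

    from lifelines.utils import concordance_index

    # Toy example: higher predicted score should mean longer survival
    times  = [2, 4, 6, 8]          # observed times
    events = [1, 1, 0, 1]          # 1 = event occurred, 0 = censored
    scores = [1.0, 2.0, 3.0, 4.0]  # made-up predictions
    print(concordance_index(times, scores, events))  # 1.0: every comparable pair ordered correctly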

**Code:** This is the code I used to construct the neural network; I can provide the code used for the base models if needed.

    import pandas as pd
    import tensorflow as tf
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, Dense, Dropout
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
    from tensorflow.keras import regularizers
    from lifelines.utils import concordance_index  # used below to compute the C-index

    def cox_loss(y_true, y_pred):
        # y_true columns: [event, time]; y_pred column: predicted risk score
        event = tf.cast(y_true[:, 0], dtype=tf.float32)  # 1 if event occurred, 0 otherwise
        time = y_true[:, 1]
        risk = y_pred[:, 0]  # Predicted risk scores

        # The risk set of subject i is everyone with time >= time_i, so sort the
        # batch by time (ascending); the reverse cumulative sum below then runs
        # exactly over each subject's risk set
        order = tf.argsort(time)
        event = tf.gather(event, order)
        risk = tf.gather(risk, order)

        # Log of the cumulative sum of exponentiated risks over the risk set
        log_cumsum_risk = tf.math.log(tf.math.cumsum(tf.math.exp(risk), reverse=True))

        # Negative Cox partial log-likelihood (only uncensored subjects contribute)
        loss = -tf.reduce_mean(event * (risk - log_cumsum_risk))
        return loss
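
A quick sanity check of the loss on a tiny hand-made batch (values made up purely for illustration):

    # Toy batch: 3 subjects, columns [event, time]
    y_true_toy = tf.constant([[1.0, 5.0], [0.0, 3.0], [1.0, 8.0]])
    y_pred_toy = tf.constant([[0.2], [-0.1], [0.5]])
    print(float(cox_loss(y_true_toy, y_pred_toy)))  # a finite scalar; lower is better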



    # Load the training data from the base models
    train_data = pd.read_csv("E:/meta_ensemble_model/code/csv/survival_probabilities_train.csv")

    # Load the validation data from the base models
    val_data = pd.read_csv("E:/meta_ensemble_model/code/csv/survival_probabilities_validation.csv")

    # Extract features (survival probabilities and risk scores) and labels (event and time) for training
    X_train = train_data.drop(columns=['patient_id', 'event', 'time', 'age'])
    y_train = train_data[['event', 'time']]

    # Extract features and labels for validation
    X_val = val_data.drop(columns=['patient_id', 'event', 'time', 'age'])
    y_val = val_data[['event', 'time']]

    # Convert y_train and y_val to NumPy arrays for the custom loss function
    y_train_np = y_train.to_numpy()
    y_val_np = y_val.to_numpy()

    # Input layer and model architecture with L2 regularization and dropout
    input_layer = Input(shape=(X_train.shape[1],))
    hidden_layer_1 = Dense(15, activation='relu', kernel_regularizer=regularizers.l2(0.0001))(input_layer)
    dropout_1 = Dropout(0.3)(hidden_layer_1)
    hidden_layer_2 = Dense(10, activation='relu', kernel_regularizer=regularizers.l2(0.0001))(dropout_1)
    dropout_2 = Dropout(0.3)(hidden_layer_2)
    hidden_layer_3 = Dense(5, activation='relu', kernel_regularizer=regularizers.l2(0.0001))(dropout_2)
    dropout_3 = Dropout(0.3)(hidden_layer_3)
    hidden_layer_4 = Dense(2, activation='relu', kernel_regularizer=regularizers.l2(0.0001))(dropout_3)
    dropout_4 = Dropout(0.3)(hidden_layer_4)
    output_layer = Dense(1, activation='linear')(dropout_4)

    # Build the model from the input and output layers
    model = Model(inputs=input_layer, outputs=output_layer)

    # Compile the model with the custom Cox loss function
    model.compile(optimizer=Adam(learning_rate=0.000008), loss=cox_loss)

    # Define the file path for saving the best checkpoint
    checkpoint_filepath = "E:/meta_ensemble_model/code/saved_models/cox_nn_model_checkpoint.h5"

    # Define a checkpoint callback that saves the best model and prints verbose messages
    checkpoint_callback = ModelCheckpoint(
        filepath=checkpoint_filepath,
        monitor='val_loss',
        verbose=1,
        save_best_only=True,
        mode='min'
    )

    # Early-stopping callback to prevent overfitting and restore the best weights
    early_stopping = EarlyStopping(monitor='val_loss', patience=50, restore_best_weights=True)

    # Train the model with the checkpoint and early-stopping callbacks
    history = model.fit(
        X_train, y_train_np,
        validation_data=(X_val, y_val_np),
        batch_size=5,
        epochs=2000,
        callbacks=[checkpoint_callback, early_stopping],
        verbose=1
    )

    # Evaluate the model on the validation data
    val_loss = model.evaluate(X_val, y_val_np, verbose=1)
    print(f"Validation Loss (Cox Loss): {val_loss}")

    # Predict risk scores on the validation set
    risk_scores_val = model.predict(X_val).flatten()

    # Compute the concordance index 
    c_index = concordance_index(y_val['time'], -risk_scores_val, y_val['event'])
    print(f"Validation Concordance Index: {c_index}")

    # Save the final best model
    model.save("E:/meta_ensemble_model/code/saved_models/cox_nn_model.h5")

    # Load the saved model for inference
    loaded_model = tf.keras.models.load_model(
        "E:/meta_ensemble_model/code/saved_models/cox_nn_model.h5",
        custom_objects={'cox_loss': cox_loss}
    )

    # Predict risk scores using the loaded model to verify successful loading
    risk_scores_test = loaded_model.predict(X_val).flatten()
    print("Risk scores predicted successfully.")

**Comment:**

Several of these choices make little sense. Why combine a random forest and gradient-boosted trees, and on top of that a Cox regression? What is the point of feeding an MLP with probabilities emitted by three different models?

**Reply (OP):**

The three models I used as base models are widely used for survival analysis. What I'm trying to accomplish here is an ensemble built from these well-established models. RSF and GBSM can capture the non-linear patterns in my data, so my idea is to overcome each individual model's trade-offs by combining them through a meta-learner.

I'm new to data science, so please feel free to point out any mistakes and share any tips. Thank you.

**Reply:**

IMHO, this is the wrong approach. You should start from a single model and tune its features and parameters to understand its limits and strengths. Only if you suspect those limits could be filled by another model should you stack another layer. Trade-offs can also be managed with regularization or feature transformations, for example.

I don't know exactly how your dataset is structured or which features you are using, but did you do any feature engineering? How many of the features are actually relevant for your dataset? Are there transformations you could apply to them? I'd point you to the scikit-survival user guide (https://scikit-survival.readthedocs.io/en/stable/user_guide/index.html), which explains the theory together with code. A single-model baseline along those lines is sketched below.
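
As an illustration, such a baseline might look like this; the variable names (X_train_base, y_train_surv, X_val_base, y_val_surv) and the alpha value are assumptions, not your actual setup:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sksurv.linear_model import CoxPHSurvivalAnalysis
    from sksurv.metrics import concordance_index_censored

    # Scale the base features, then fit a single regularized Cox model
    pipe = make_pipeline(StandardScaler(), CoxPHSurvivalAnalysis(alpha=0.1))
    pipe.fit(X_train_base, y_train_surv)  # y is a structured array of (event, time)

    # concordance_index_censored expects higher scores to mean higher risk,
    # which matches the Cox model's linear predictor
    risk = pipe.predict(X_val_base)
    c_index = concordance_index_censored(y_val_surv['event'], y_val_surv['time'], risk)[0]
    print(f"CoxPH baseline C-index: {c_index:.3f}")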

PS: it's very likely that the code you've posted here came from a GPT-like model. Just my two cents: use those tools first to understand the theory behind the models, rather than going straight into implementation.
