Interpreting partial dependence plots
0
0
Entering edit mode
3.0 years ago
fana ▴ 40

I have created a model predicting an outcome with two classes ("active" and "no history") including 23 features-genes using VST transformed RNA-seq data (134 samples in the training dataset). I have used caret package in R and using glm model. I am trying to interpret my model using pdp package.

I have attached a pdp plot for gene A (5th most important gene in my model), pdp plot with ICE curves and a boxplot showing the expression of this gene for "active" and "no history" status. Looking at the pdp plot I understand that the higher the expression of gene A the higher the probability for predicting "active" status. However, I see the opposite in the boxplot. I observe this phenomenon for 3 more features in my model however the range for the probability is 0.28-0.3 which is almost a flat line if I plot the pdp from 0-1 probability. These other 3 genes are lower in the list of feature importance.

I have checked that features do not correlate with each other.

I am wondering why this happens I assume that the other features in the model interact some how with gene A. Is this a valid assumption?

Creating model

library(caret) ctrl.continuous <- trainControl(method = "repeatedcv", number = 10, repeats = 5, savePredictions = "final") model<-train(train_VST_clin[,-24],train_VST_clin[,outcomeName], method='glm', trControl=ctrl.continuous)

PDP plot

library(pdp) pdp_plot <- partial(model, pred.var = "gene A", prob=TRUE, plot = TRUE, rug=TRUE, which.class="active")

PDP, ICE, boxplot

machinelearning caret partialdependenceplot pdp R • 556 views
ADD COMMENT

Login before adding your answer.

Traffic: 1824 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6