How should I draw a ROC curve to generalize the performance of a model?
ssko ▴ 20 · 9 days ago

Let's say I run a classification model on 100 different data splits, and for each split I collect the predicted probabilities on the test set. How should I draw a ROC curve?

  1. Should I pool all the probabilities and draw a single curve, or
  2. Should I calculate the ROC/AUC for each data split and then report the average AUC with its standard deviation?
auc plot roc performance classification
Mensur Dlakic ★ 28k · 9 days ago

Option #1. By the way, 100 splits is probably overkill even if you have a small dataset, and especially if you have a large one.
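
A minimal sketch of option #1 with scikit-learn, assuming you have already collected the test-set labels and predicted positive-class probabilities from every split (the random stand-ins below exist only so the snippet runs on its own):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Stand-ins for the per-split results you collected; replace with your own.
rng = np.random.default_rng(0)
y_true_per_split = [rng.integers(0, 2, 50) for _ in range(100)]
y_prob_per_split = [np.clip(t + rng.normal(0, 0.5, 50), 0, 1)
                    for t in y_true_per_split]

# Pool every test-set prediction into one vector and draw a single curve.
y_true = np.concatenate(y_true_per_split)
y_prob = np.concatenate(y_prob_per_split)
fpr, tpr, _ = roc_curve(y_true, y_prob)

plt.plot(fpr, tpr, label=f"pooled AUC = {roc_auc_score(y_true, y_prob):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```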


To clarify: first we split the dataset into a training set and a test set, train a specific algorithm on the training set, and evaluate it on the test set. I want to repeat this process on 100 different data splits, so that the result does not depend on the model's performance on any single split — roughly as in the sketch below.
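
In code, the loop I have in mind looks something like this (a sketch; make_classification and LogisticRegression are placeholders for the actual data and model):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data; swap in your own feature matrix and labels.
X, y = make_classification(n_samples=500, random_state=0)

y_true_per_split, y_prob_per_split = [], []
for seed in range(100):  # one iteration per random split
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    y_true_per_split.append(y_te)
    y_prob_per_split.append(model.predict_proba(X_te)[:, 1])  # P(class 1)
```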


I understood what you meant, but I am not convinced that you have the execution part covered.

The way this is normally done is called cross-validation, or CV for short. For a reasonably large dataset, say tens to hundreds of thousands of data points, there is no need to do CV with more than 5-10 folds. Only if you have a dataset where the number of data points is in the hundreds is it justified to do 100-fold CV, or even leave-one-out CV (also described at the link above).
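
For example, with scikit-learn a 5-fold CV gives every sample exactly one out-of-fold probability, which you can then pool into a single ROC curve (a sketch with placeholder data and model):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Placeholder data and classifier; substitute your own.
X, y = make_classification(n_samples=500, random_state=0)

# Each sample is predicted exactly once, by a model that never saw it in training.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_prob = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                           cv=cv, method="predict_proba")[:, 1]

fpr, tpr, _ = roc_curve(y, y_prob)
plt.plot(fpr, tpr, label=f"5-fold CV AUC = {roc_auc_score(y, y_prob):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```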

