Question

How to get an average ROC curve after 10-fold cross-validation in R

14 months ago

Sara • 0

I have done a 10-fold cross-validation, and I have a list of 10 results, each including sensitivity, specificity, threshold, AUC, etc. I would like to make a final ROC curve plot as an average ROC from this list, and also find the optimal cutoff (top-left corner) with its sensitivity and specificity. Can anyone help with this?

Can I take the mean of the sensitivity, specificity, etc. across the 10 lists? If yes, how? Thanks.

Tags: r, cross-validation, curve, roc

GenoMax · Answer 1 · 2023-02-05

14 months ago

Mensur Dlakic ★ 27k

You can take the mean of the 10 values for each of those quantities (except for the ROC curve itself), but the means won't necessarily be accurate. They will be close, though.

A proper way of doing this is to make predictions on the validation data in each of the 10 folds and combine them into a single out-of-fold sample. Then calculate the values and an ROC curve for that sample. You can compare them with the mean values to see how similar they are. Except for the AUC score, I suspect they will be very similar.
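A minimal sketch of the out-of-fold approach, written in Python with only the standard library: given one pooled vector of true labels and out-of-fold scores, it builds the ROC curve, its AUC (trapezoid rule), and the cutoff whose point lies closest to the top-left corner (0, 1). The `labels`/`scores` values are made-up toy data, and "call positive when score >= cutoff" is an assumption about how your model is thresholded. In R, the pROC package's `roc()`, `auc()` and `coords(..., "best", best.method = "closest.topleft")` cover the same steps.

```python
# Sketch only: ROC curve, AUC and "top-left" cutoff from ONE pooled out-of-fold sample.
# Assumes labels are 1 (positive) / 0 (negative) and a higher score means "more likely positive".

def roc_points(labels, scores):
    """Return (threshold, FPR, TPR) points, starting from (inf, 0, 0)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative labels")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(float("inf"), 0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # consume every sample tied at this score so ties move the curve together
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((thr, fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def closest_topleft(points):
    """Cutoff whose (FPR, TPR) point is nearest to (0, 1); returns (cutoff, specificity, sensitivity)."""
    thr, fpr, tpr = min(points[1:], key=lambda p: p[1] ** 2 + (1 - p[2]) ** 2)
    return thr, 1 - fpr, tpr

# Toy pooled out-of-fold data (hypothetical)
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1]
pts = roc_points(labels, scores)
print(auc(pts))              # 0.75
print(closest_topleft(pts))  # (0.6, 0.75, 0.75)
```

Because every sample is scored exactly once, by a model that never saw it, the single curve from `pts` plays the role of the "average" ROC the question asks for.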

Thanks. I would like to add some information to get further help.

1) My original cutoff, sensitivity and specificity are:

# cutoff: 0.882   # sensitivity: 0.767   # specificity: 0.781

2) From 10-fold cross-validation:

# cutoff, specificity, sensitivity
# fold1: 0.858 (0.769, 0.747)
# fold2: 0.904 (0.771, 0.731)
# fold3: 0.810 (0.753, 0.773)
# fold4: 0.835 (0.778, 0.758)
# fold5: 0.939 (0.783, 0.722)
# fold6: 0.818 (0.748, 0.774)
# fold7: 0.916 (0.778, 0.736)
# fold8: 1.081 (0.821, 0.691)
# fold9: 0.896 (0.781, 0.742)
# fold10: 0.774 (0.744, 0.771)

3) After taking the mean:

# mean(0.858,0.904,0.810,0.835,0.939,0.818,0.916,1.081,0.896,0.774)  #  **cutoff: 0.858** 
# mean(0.769,0.771,0.753,0.778,0.783,0.748,0.778,0.821,0.781,0.744)  # **specificity: 0.769**
# mean(0.747,0.731,0.773,0.758,0.722,0.774,0.736,0.691,0.742,0.771)  # **sensitivity: 0.747**

Does this make sense, or is it wrong?

Reply written by Sara, updated by GenoMax (14 months ago)

I have no idea what "cutoff" means in this context, nor what you mean by "original" values.

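I don't know how you got those values for the means. I get 0.8831 for cutoff, 0.7726 for specificity and 0.7445 for sensitivity. Your mean values just repeat what is in the first row (fold 1). If you ran those lines as written (without the leading `#`), that would explain it: R's `mean()` treats only its first argument as the data, so `mean(0.858, 0.904, 0.810)` returns 0.858. Wrap the ten values in `c()` first, e.g. `mean(c(0.858, 0.904, 0.810))`.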
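The means quoted above can be checked with a short sketch (Python, standard library only) using the per-fold values from the question. The only point it illustrates is that all ten values are passed to the mean as a single collection.

```python
from statistics import mean

# Per-fold values copied from the question
cutoff = [0.858, 0.904, 0.810, 0.835, 0.939, 0.818, 0.916, 1.081, 0.896, 0.774]
specificity = [0.769, 0.771, 0.753, 0.778, 0.783, 0.748, 0.778, 0.821, 0.781, 0.744]
sensitivity = [0.747, 0.731, 0.773, 0.758, 0.722, 0.774, 0.736, 0.691, 0.742, 0.771]

for name, values in [("cutoff", cutoff), ("specificity", specificity), ("sensitivity", sensitivity)]:
    # all ten fold values go in as ONE list argument
    print(f"{name}: {mean(values):.4f}")
# cutoff: 0.8831
# specificity: 0.7726
# sensitivity: 0.7445
```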

Either way, calculating this from an out-of-fold sample is the way to go. You are already making predictions for the validation data, or else you wouldn't have these values. So just put all validation predictions together into one data frame: in the end it will have as many rows as your training data, but every prediction in it was made out of fold. Then calculate the final ROC curve, cutoff, sensitivity and specificity on that complete dataset, without any averaging.
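A minimal sketch of the pooling step (Python, standard library only, with 3 toy folds instead of 10). The `folds` data, the `(sample_id, label, score)` layout, and the 0.5 cutoff are all hypothetical; in a real workflow each fold's list would hold the predictions for that held-out fold from the model trained on the other folds. The pooled set can then go into any ROC routine (pROC in R, for example) to get one final curve and cutoff.

```python
# Hypothetical per-fold validation predictions: (sample_id, true_label, score)
folds = [
    [(0, 1, 0.9), (3, 0, 0.2), (6, 1, 0.4)],
    [(1, 0, 0.6), (4, 1, 0.8), (7, 0, 0.3)],
    [(2, 1, 0.7), (5, 0, 0.1), (8, 0, 0.55)],
]

def pool_out_of_fold(folds):
    """Concatenate per-fold predictions into one out-of-fold set, ordered by sample id."""
    pooled = [row for fold in folds for row in fold]
    ids = [row[0] for row in pooled]
    if len(ids) != len(set(ids)):
        raise ValueError("a sample was predicted in more than one fold")
    return sorted(pooled)

def sens_spec(pooled, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is called positive."""
    tp = sum(1 for _, y, s in pooled if y == 1 and s >= cutoff)
    fn = sum(1 for _, y, s in pooled if y == 1 and s < cutoff)
    tn = sum(1 for _, y, s in pooled if y == 0 and s < cutoff)
    fp = sum(1 for _, y, s in pooled if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

pooled = pool_out_of_fold(folds)
print(len(pooled))             # 9, same as the number of training samples
print(sens_spec(pooled, 0.5))  # (0.75, 0.6)
```

The duplicate-id check is there because each sample should appear in exactly one validation fold; if it fails, the folds overlap and the pooled estimate is no longer out of fold.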