2.3 years ago
txtbookir
Hi everyone. I know there are similar topics in this field, but mine is a bit different. After running an ML model on a relatively large dataset of cancer gene expression levels (700 samples, 40 features), I got an ROC curve that is not smooth. A quick search turned up the following suggested solution:
You're using thresholded predictions to generate the ROC-curve. You should instead use the original confidence values, otherwise you will get only 1 intermediary point on the curve.
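To see what that suggestion means, here is a minimal sketch on hypothetical toy labels and scores (not the cancer data). `roc_curve` is given once the hard 0/1 labels obtained by thresholding the scores at 0.5, and once the raw scores themselves:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical toy data: true labels and a model's positive-class scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

# Thresholding at 0.5 collapses the scores into hard 0/1 predictions
hard = (scores >= 0.5).astype(int)

fpr_h, tpr_h, _ = roc_curve(y_true, hard)    # only one threshold in between
fpr_s, tpr_s, _ = roc_curve(y_true, scores)  # one point per useful threshold

print(len(fpr_h))                     # 3: (0,0), a single middle point, (1,1)
print(len(fpr_s) > len(fpr_h))        # True
print(roc_auc_score(y_true, hard))    # 0.75
print(roc_auc_score(y_true, scores))  # 0.875
```

With hard labels the "curve" is just two straight segments, and the AUC computed from it can differ from the AUC of the underlying scores.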
My Python code for RF (as an example) is as follows:
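from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier(n_estimators=200, max_features=5)
rf_clf.fit(train_data, train_label)
# predict() returns hard class labels, not probabilities
y_pred_rf = rf_clf.predict(test_data)
plot_results(test_label, y_pred_rf, "Random Forest")  # custom helper, not shown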
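The distinction between `predict` and `predict_proba` is what matters here. The sketch below uses synthetic data from `make_classification` with the same shape as described (700 × 40) as a stand-in, since the real expression data are not available; the split and `random_state` values are arbitrary assumptions. It checks that `predict` is simply the class with the highest predicted probability, so all score information is discarded:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the expression matrix (assumption, not real data)
X, y = make_classification(n_samples=700, n_features=40, random_state=0)
train_data, test_data, train_label, test_label = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

rf_clf = RandomForestClassifier(n_estimators=200, max_features=5, random_state=0)
rf_clf.fit(train_data, train_label)

labels = rf_clf.predict(test_data)       # hard labels, one per sample
proba = rf_clf.predict_proba(test_data)  # shape (n_test, n_classes)
scores = proba[:, 1]                     # probability of rf_clf.classes_[1]

# predict() picks the column with the largest probability
assert np.array_equal(labels, rf_clf.classes_[np.argmax(proba, axis=1)])
print(np.unique(labels))         # [0 1]: only two distinct values
print(len(np.unique(scores)))    # many distinct values, usable as ROC thresholds
```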
Then I plot the ROC curve:
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# y_pred_rf holds the hard labels from rf_clf.predict()
fpr_rf, tpr_rf, _ = roc_curve(test_label, y_pred_rf)
auc_rf = auc(fpr_rf, tpr_rf)
plt.figure(figsize=(10, 12), dpi=200)
plt.plot([0, 1], [0, 1], 'k--')  # chance diagonal
plt.plot(fpr_rf, tpr_rf, label='Random Forest (area = {:.3f})'.format(auc_rf))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
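Applying the suggestion to the code above would mean passing the positive-class probability from `predict_proba` to `roc_curve` instead of `y_pred_rf`. A minimal sketch follows, again on synthetic 700 × 40 data as an assumed stand-in, and assuming binary 0/1 labels so that column 1 of `predict_proba` is the positive class (`rf_clf.classes_[1]`); with string labels you would also need `roc_curve(..., pos_label=...)`:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 700 x 40 expression data (assumption)
X, y = make_classification(n_samples=700, n_features=40, random_state=0)
train_data, test_data, train_label, test_label = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

rf_clf = RandomForestClassifier(n_estimators=200, max_features=5, random_state=0)
rf_clf.fit(train_data, train_label)

# Continuous score: probability of the positive class, not the hard label
y_score_rf = rf_clf.predict_proba(test_data)[:, 1]

fpr_rf, tpr_rf, _ = roc_curve(test_label, y_score_rf)
auc_rf = auc(fpr_rf, tpr_rf)

plt.figure(figsize=(10, 12), dpi=200)
plt.plot([0, 1], [0, 1], 'k--')  # chance diagonal
plt.plot(fpr_rf, tpr_rf, label='Random Forest (area = {:.3f})'.format(auc_rf))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
```

Only the line computing the score changes; the rest of the plotting code can stay as it is. The curve gains one point per distinct useful threshold, so it looks smoother as the test set grows.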
Thanks for any suggestions.