Question

Discrepancies in RFECV (Recursive Feature Elimination with Cross-Validation) grid scores



3.0 years ago

ivnnvi • 0

I would like to know why the grid scores that RFECV (Recursive Feature Elimination with Cross-Validation) reports for n features do not match the score I get when I run RFE myself and then train a model with the same number of cross-validation folds.

For instance, the RFECV grid scores tell me that with the top 1 feature I get an F1 score of 0.60. However, when I 1) run RFE to select only 1 feature (which should be the same feature as in RFECV), 2) train the same model that I passed to RFECV on that top RFE feature, and 3) use CV to get the F1 score, the result is not the same as the RFECV grid score. The only time the scores match is for the top n features that RFECV finally selects.
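A minimal sketch of steps 1) to 3) in scikit-learn, assuming the poster used `RFE`, `RandomForestClassifier` and `cross_val_score`. The toy data from `make_classification`, the 5-fold `StratifiedKFold`, the forest settings and the helper name `rfe_then_cv` are hypothetical stand-ins, not the poster's actual code:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-ins for the poster's data, model and CV settings
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0)


def rfe_then_cv(n_features):
    """Steps 1-3: RFE once on all rows, then CV on the kept columns."""
    # 1) RFE without CV: fitted once on the full dataset, including the
    #    rows that later act as test folds
    rfe = RFE(model, n_features_to_select=n_features, step=1).fit(X, y)
    # 2) + 3) cross-validate the same model on this fixed column subset
    scores = cross_val_score(model, X[:, rfe.support_], y, cv=cv,
                             scoring="f1")
    return scores.mean()


for n in (1, 3, 6):
    print(n, round(rfe_then_cv(n), 3))
```

Here the selected columns are fixed before cross-validation starts, so every fold is scored on the same feature subset.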

Could it be that RFECV is not the same as running RFE for n features, then training the model on the top n RFE features with CV, and then computing the F1 score?

To check my understanding of how RFECV works, I did the following:
1) Perform RFE without CV and select a fixed number of features, for example 1, 10 or 20.
2) Train a Random Forest on these top 1/10/20 features using CV and compute the F1 score.
3) Compare this F1 score with the one reported by RFECV for the same number of features.

If I got the same numbers, I would know how RFECV works. However, the F1 scores do not match. I also made sure that the cross-validation, random states, seeds, number of folds and performance metric are consistent between RFECV and RFE + RF + CV.

RFECV RFE CV
