Question: Is reusing a subset of features in another layer of classification wrong or over-fitting?
Floydian_slip wrote:

Hi, I am using machine learning code written by somebody else. It runs a first level of classification using 10 features, then selects the instances that were not classified with certainty (their output value falls within a certain range) and reruns them with a smaller subset of those 10 features and with classifiers not used before (e.g., an SVM, but a different model of it). It then does this one more time with yet another subset and different classification models, so some of the features are used in 3 different rounds. What are the pitfalls of this approach, if any? Is it over-fitting in any way (not in the traditional sense, of course)? Would this approach be criticized if we launched with it?
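To make the pipeline concrete, it looks roughly like the sketch below (scikit-learn; the particular classifiers, feature-subset indices and the 0.4-0.6 "uncertain" band are placeholders I made up, not the actual code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def staged_predict(X_train, y_train, X_test, all_feats, subset_a, subset_b,
                   low=0.4, high=0.6):
    """Three-round classification: re-score only the uncertain instances."""
    # Round 1: all 10 features, first model.
    clf1 = RandomForestClassifier(n_estimators=200, random_state=0)
    clf1.fit(X_train[:, all_feats], y_train)
    proba = clf1.predict_proba(X_test[:, all_feats])[:, 1]

    # Round 2: instances whose score fell in the uncertain band are rerun
    # with a smaller feature subset and a model not used before.
    uncertain = (proba > low) & (proba < high)
    if uncertain.any():
        clf2 = SVC(probability=True, random_state=0)
        clf2.fit(X_train[:, subset_a], y_train)
        proba[uncertain] = clf2.predict_proba(X_test[uncertain][:, subset_a])[:, 1]

    # Round 3: same idea with another subset and another model.
    uncertain = (proba > low) & (proba < high)
    if uncertain.any():
        clf3 = LogisticRegression(max_iter=1000)
        clf3.fit(X_train[:, subset_b], y_train)
        proba[uncertain] = clf3.predict_proba(X_test[uncertain][:, subset_b])[:, 1]

    return (proba >= 0.5).astype(int)
```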

Thanks!

Mensur Dlakic wrote:

There is nothing that sounds like overfitting here, as long as you use the same folds for each classifier. But it does sound unnecessary to do it this way. It is essentially what boosting does by differentially weighting, in the next iteration, the instances that were misclassified in the previous one. I presume the reason they use different features is to create non-overlapping expertise between the individual classifiers, which is a good idea in general. Still, gradient boosted trees do all of that automatically, including feature selection, so you could save yourself some time, and probably end up with a better classifier, by simply going with one of the classifiers that uses (extreme) gradient boosted trees. As much as I like SVMs - for historic and practical reasons - I haven't had a single project in the past 10 years (out of at least hundreds) where SVMs outperformed boosted trees, either in training speed or in classification/regression performance. Unless you have a small dataset where proper classifier calibration is essential, I can't imagine that your case would be any different.
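As a minimal sketch of what I mean, something along these lines is usually all it takes (scikit-learn's histogram-based gradient boosted trees shown here as one option; XGBoost or LightGBM would be used the same way, and the dataset and parameters are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder dataset with 10 features, standing in for your real data.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

clf = HistGradientBoostingClassifier(max_iter=500, learning_rate=0.05,
                                     random_state=0)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Cross-validated AUC of the single boosted-tree model.
scores = cross_val_score(clf, X, y, cv=folds, scoring="roc_auc")
print(scores.mean(), scores.std())
```

The cross-validated AUC from a single model like this gives you a direct baseline to compare the staged approach against.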


Floydian_slip replied:

Thanks, Mensur. I realize that this is not over-fitting in the strict sense of the word. My main concern is: can this approach be criticized or found inappropriate in any way after we go to market with it? Also, SVM was just one example of the classifiers used; others like neural networks and linear regression are used as well. I ran gradient boosting on the initial dataset, but its performance did not reach as high as this 3-step approach does. What could explain that? The difference in performance made me worry that this approach is doing something that could later be torn apart. Any insight will be helpful. Thanks again!

Mensur Dlakic replied:

"I ran gradient boosting on the initial dataset but the performance did not reach as high as this 3-step approach does. What could explain that?"

If you are making an ensemble of three different classifiers and comparing that to gradient boosting alone, it is possible that the former would do better. That's the whole point of ensembling - getting individually inferior classifiers to produce a superior ensemble. That would be my interpretation without knowing the details of your procedure: 1) how many folds; 2) whether you are carrying the same folds through all stages; 3) what classifiers are used; 4) what is the difference between individual classifiers and the ensemble; 5) do you use all data points but give higher weights to the ones that were misclassified? As long as you do it properly, this procedure should not overfit.
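To make point 2) concrete, here is a minimal sketch of what carrying the same folds through all stages can look like (scikit-learn; the classifiers and feature subsets are illustrative):

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Fix the fold assignment once and reuse the same splitter everywhere,
# so out-of-fold predictions from different stages stay comparable.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

def oof_probabilities(clf, X, y):
    # Out-of-fold probabilities with the shared fold object, so no model
    # is ever scored on data it was trained on.
    return cross_val_predict(clf, X, y, cv=folds, method="predict_proba")[:, 1]

# Same folds for every stage/classifier, e.g.:
# p1 = oof_probabilities(HistGradientBoostingClassifier(random_state=0), X, y)
# p2 = oof_probabilities(SVC(probability=True), X[:, subset_a], y)
# p3 = oof_probabilities(LogisticRegression(max_iter=1000), X[:, subset_b], y)
```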
