Scikit-learn feature selection: select on the training set only?
8.9 years ago
hrbrt.sch ▴ 10

Hello,

I'm using scikit-learn for machine learning. I have 800 samples with 2048 features, so I want to reduce the number of features in the hope of getting better accuracy.

It is a multiclass problem (classes 0-5), and each feature vector consists of 1's and 0's: [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0....,0]

I'm using the Random Forest Classifier.

Should I run feature selection on the training data only? And is it enough to use this code:

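from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

clf = RandomForestClassifier(n_estimators=200, warm_start=True, criterion='gini', max_depth=13)
# Fit on the training set; the array returned by transform() is not assigned to anything
clf.fit(X_train, y_train).transform(X_train)

predicted = clf.predict(X_test)
expected = y_test
confusionMatrix = metrics.confusion_matrix(expected, predicted)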

I'm asking because the accuracy didn't get higher. Is everything OK in the code, or am I doing something wrong?

I'll be very grateful for your help.

Machine-learning Python Scikit-Learn • 3.7k views
8.9 years ago

Should I just feature select the training data?

Yes, feature selection should be done on the training set only. Once the important features have been picked based on the training set, you can apply the same selection to the test set.
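A minimal sketch of that workflow, assuming a recent scikit-learn where calling `.transform` directly on an estimator is no longer available and `SelectFromModel` is used instead. The random data only stands in for the 800 x 2048 binary matrix from the question, and the `threshold="median"` setting and the `random_state` values are illustrative assumptions, not something the original post specifies:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data in the question (assumption):
# 800 samples, 2048 binary features, classes 0-5.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(800, 2048))
y = rng.randint(0, 6, size=800)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the selector on the training set only, so the test set
# has no influence on which features are kept.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, max_depth=13, random_state=0),
    threshold="median",  # keep features at or above the median importance
)
selector.fit(X_train, y_train)

# Apply the same, already-fitted selection to both splits.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Train the final classifier on the reduced training set
# and predict on the reduced test set.
clf = RandomForestClassifier(n_estimators=200, max_depth=13, random_state=0)
clf.fit(X_train_sel, y_train)
predicted = clf.predict(X_test_sel)
```

The important difference from the code in the question is that the reduced matrices are kept and actually used for fitting and prediction; there, the result of `transform` was thrown away, so the classifier still saw all 2048 features.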

As for the accuracy, many factors can affect it, for example whether the features are normalized and whether the classes are imbalanced.
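One way to check the class-imbalance point is to count the samples per class and, if the counts are skewed, try class weighting and a metric that is less sensitive to imbalance. This is only a sketch on made-up data: the class proportions, the 64-feature matrix, and the choice of `class_weight="balanced"` with `balanced_accuracy_score` are assumptions for illustration. (Normalization is unlikely to matter here, since the features are already binary and tree-based models are not sensitive to feature scaling.)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Made-up imbalanced data (assumption): 6 classes with unequal frequencies.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(800, 64))
y = rng.choice(6, size=800, p=[0.5, 0.2, 0.1, 0.1, 0.05, 0.05])

# Stratify so that both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Number of training samples in each class 0-5.
class_counts = np.bincount(y_train, minlength=6)
print("samples per class:", class_counts)

# Weight classes inversely to their frequency during training.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

# Balanced accuracy averages the recall over classes, so a rare
# class counts as much as a frequent one.
score = balanced_accuracy_score(y_test, clf.predict(X_test))
print("balanced accuracy:", score)
```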

Hope this helps

Kevin
