random forest advice
7 months ago
marion.ryan ▴ 40

I am using the randomForest package to classify 'norm' versus 'chol' samples with the code below. The output gives a nice ranking of the importance of a panel of genes contributing to the classification of diseased tissue. However, I have been reading up on this and am wondering whether I need separate training and test data sets. I have 11 normal and 18 diseased samples. I am very happy with the intuitive output this is giving, but I want to make sure it is right.

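library(randomForest)
# stringsAsFactors = TRUE keeps Pathology a factor, which classification needs (default is FALSE from R 4.0)
clus2 <- read.csv("PCA_NvC_SVM_sig.csv", sep = ",", header = TRUE, row.names = 1, stringsAsFactors = TRUE)
clus2.rf <- randomForest(Pathology ~ ., data = clus2, importance = TRUE, proximity = TRUE)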

Result:

Call:
 randomForest(formula = Pathology ~ ., data = clus2, importance = TRUE, proximity = TRUE)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 4

        OOB estimate of  error rate: 10.34%
Confusion matrix:
     Chol Norm class.error
Chol   17    1  0.05555556
Norm    2    9  0.18181818
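For reference, the percentages in this output follow directly from the confusion matrix counts. A minimal sketch in base R, with the counts typed in by hand from the printed matrix (the object names `conf`, `class_error` and `oob_error` are just for illustration):

```r
# Confusion matrix copied from the printed output:
# rows are the true classes, columns the out-of-bag predictions.
conf <- matrix(c(17, 1,
                 2,  9),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("Chol", "Norm"),
                               predicted = c("Chol", "Norm")))

# Per-class error: share of each true class that was predicted wrongly.
class_error <- 1 - diag(conf) / rowSums(conf)

# Overall OOB error: all misclassified samples over all samples.
oob_error <- 1 - sum(diag(conf)) / sum(conf)

print(round(class_error, 8))
print(round(100 * oob_error, 2))
```

So the 10.34% corresponds to 3 of the 29 samples being misclassified when predicted only by trees that did not see them.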

Look at variable importance:

Imp <- round(importance(clus2.rf), 2)
# write.csv keeps the header aligned with the row-name (gene) column
write.csv(Imp, "Importance.csv")
varImpPlot(clus2.rf)
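To see which genes rank highest without opening the CSV, the importance matrix can also be sorted directly. This is only a rough sketch on simulated stand-in data, since the real file is not reproduced here: `sim`, the `gene1` to `gene16` columns and the shift added to `gene1` are all made up for illustration. `MeanDecreaseAccuracy` is one of the columns `importance()` returns for a classification forest fitted with `importance = TRUE`.

```r
library(randomForest)

set.seed(1)

# Hypothetical stand-in for clus2: 29 samples (18 Chol, 11 Norm), 16 invented genes.
n_genes <- 16
sim <- as.data.frame(matrix(rnorm(29 * n_genes), nrow = 29,
                            dimnames = list(NULL, paste0("gene", seq_len(n_genes)))))
sim$Pathology <- factor(rep(c("Chol", "Norm"), times = c(18, 11)))

# Make one invented gene informative so the ranking has something to find.
sim$gene1 <- sim$gene1 + ifelse(sim$Pathology == "Chol", 2, 0)

sim.rf <- randomForest(Pathology ~ ., data = sim, importance = TRUE)

# Sort genes by permutation importance (mean decrease in accuracy), largest first.
imp <- importance(sim.rf)
imp_sorted <- imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ]
print(round(head(imp_sorted, 5), 2))
```

With the real data, the same two sorting lines would be applied to `importance(clus2.rf)` instead of the simulated model.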

randomforest geneexpression • 225 views
