I have a dataset of six samples and 1530 genes (features) and want to know the importance of each feature. I'm trying to use the "Rank Features By Importance" approach described in Feature Selection with the Caret R Package.
I'm using the following code:

    rm(list=ls())
    set.seed(12345)
    library(mlbench)
    library(caret)
    options(error=utils::recover)

    #Pastebin link for Data: http://pastebin.com/raw/cg0Kiueq
    mydata.df <- read.table("data.PasteBin.txt", header=TRUE,sep="\t",stringsAsFactors=TRUE)
    dim(mydata.df)

    lvq.control <- trainControl(method="LOOCV")
    lvq.model <- train(ID~., data=mydata.df, method="lvq", trControl=lvq.control ) #FAILS

    importance <- varImp(lvq.model, scale=FALSE)
    print(importance)
    plot(importance)

The data can be downloaded from Pastebin: http://pastebin.com/raw/cg0Kiueq
The program fails to execute with the following error and debug messages:

    Error in seeds[[num_rs + 1L]] : subscript out of bounds

    1: train(ID ~ ., data = mydata.df, method = "lvq", trControl = lvq.control)
    2: train.formula(ID ~ ., data = mydata.df, method = "lvq", trControl = lvq.con
    3: train(x, y, weights = w, ...)
    4: train.default(x, y, weights = w, ...)

I've read in multiple sources that caret raises errors like this unless the response variable is of class factor. However, my response variable ('ID') is indeed a factor:

    > str(mydata.df$ID)
     Factor w/ 2 levels "NONRC","RC": 2 2 1 1 2 1

The details of my R and caret versions are as follows:

    > packageVersion("caret")
    ‘6.0.70’

    R version 3.3.0 (2016-05-03)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

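For anyone who wants to repeat the basic sanity checks without my file, here is a minimal base-R sketch. It builds a small synthetic stand-in with the same shape of response (six samples, two classes). The name `toy.df` and the `gene1`..`gene5` columns are hypothetical and not part of my real data. It only confirms that the response is a factor and that the table is well formed; it does not reproduce the `train()` error itself.

```r
# Hypothetical stand-in for mydata.df: six samples, two classes, five numeric features.
set.seed(12345)
toy.df <- data.frame(
  ID = factor(c("RC", "RC", "NONRC", "NONRC", "RC", "NONRC")),
  matrix(rnorm(6 * 5), nrow = 6,
         dimnames = list(NULL, paste0("gene", 1:5)))
)

# The response should be a factor for a classification model.
is.factor(toy.df$ID)

# Number of samples per class (LOOCV leaves one sample out at a time).
table(toy.df$ID)

# Rows are samples, columns are the response plus the features.
dim(toy.df)

# Check for missing values, which some models cannot handle.
anyNA(toy.df)
```

Running the same four checks on `mydata.df` gives a factor response with three samples per class, which is why I don't think the response type is the cause here.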
Can someone please suggest a remedy?
Thanks in advance