This is my first time implementing the elastic net algorithm on microarray data, and I'm running into some issues.
My dataset `DATA` is a data frame with 72 observations (rows) and 18703 predictors (columns). The class labels are stored in the 18704th column, and each class has several replicate samples:
```r
DATA["classes"] = c( rep("A",10),
                     rep("B",13),
                     rep("C",3),
                     rep("D",10),
                     rep("E",10),
                     rep("F",10),
                     rep("G",10),
                     rep("H",3),
                     rep("I",3))
```
This is how I'm running the elastic net:
```r
library(glmnet)

# Random sampling: 70% of the rows for training, 30% for validation.
# Redraw until all 9 classes appear in both sets.
y = z = 0
while(y != 9 || z != 9){
  train_idx = sample(x = 1:nrow(DATA), size = 0.7 * nrow(DATA))
  train = DATA[train_idx,]
  test = DATA[-train_idx,]
  y = length(unique(train$classes))
  z = length(unique(test$classes))
}
# Run the cross-validated glmnet fit
y = as.factor(train$classes)
cvfit = cv.glmnet(train[,-18704], y, family="multinomial", alpha=0.9)
```
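The rejection loop above keeps redrawing until all nine classes land in both sets. A minimal alternative, sketched here under the assumption that a stratified split is acceptable for this analysis, samples about 70% of the rows within each class, so every class appears in both sets in a single pass. The label vector is rebuilt from the replicate counts listed earlier; `stratified_split` is a hypothetical helper name, not part of glmnet or base R.

```r
# Class labels in the same order as DATA$classes (10+13+3+10+10+10+10+3+3 = 72)
classes <- rep(c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
               times = c(10, 13, 3, 10, 10, 10, 10, 3, 3))

# Hypothetical helper: draw round(prop * n_k) rows of each class k for training.
# Every class with at least 2 rows ends up in both the training and test sets.
stratified_split <- function(labels, prop = 0.7) {
  idx_by_class <- split(seq_along(labels), labels)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- min(max(round(prop * length(idx)), 1), length(idx) - 1)
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(42)
train_idx <- stratified_split(classes)
train_classes <- classes[train_idx]
test_classes  <- classes[-train_idx]
print(table(train_classes))
print(table(test_classes))
```

With the real data, calling `stratified_split(DATA$classes)` and then subsetting `DATA[train_idx, ]` and `DATA[-train_idx, ]` would replace the `while` loop.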
However, I get the following error, and I don't understand what it means:
```
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  (list) object cannot be coerced to type 'double'
In addition: Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  one multinomial or binomial class has fewer than 8 observations; dangerous ground
```
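For context on the error: glmnet expects `x` to be a numeric matrix, and this message is what R raises when a multi-row data frame (internally a list of columns) is coerced to double. The toy data frame below is hypothetical and only reproduces that coercion step in base R; it does not call glmnet itself.

```r
# Hypothetical toy expression data: 3 samples x 2 genes, stored as a data frame
expr_df <- data.frame(g1 = c(1.2, 3.4, 0.5), g2 = c(2.2, 0.1, 4.0))

# Coercing a multi-row data frame directly to double fails,
# producing the same "cannot be coerced to type 'double'" message
err <- tryCatch(as.double(expr_df), error = function(e) conditionMessage(e))
print(err)

# Converting to a numeric matrix first gives a valid glmnet-style x
expr_mat <- as.matrix(expr_df)
print(is.matrix(expr_mat) && is.numeric(expr_mat))
```

Applied to the code in the question, the usual change would be passing `as.matrix(train[, -18704])` as the `x` argument of `cv.glmnet`.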
Is this because the classes have unequal numbers of replicates (some have 10, others only 3), or is it something else?
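On the warning specifically: its text says at least one class has fewer than 8 observations. The sketch below, assuming a 70% split taken within each class, counts how many rows each class would contribute to the training set and flags the classes below that level. The threshold of 8 is taken from the warning message itself.

```r
# Replicate counts per class, as listed for DATA$classes
class_sizes <- c(A = 10, B = 13, C = 3, D = 10, E = 10,
                 F = 10, G = 10, H = 3, I = 3)
print(sum(class_sizes))  # should equal the 72 rows of DATA

# Approximate training counts for a 70% split within each class
train_sizes <- round(0.7 * class_sizes)

# Classes below the 8-observation level mentioned in the warning
below_threshold <- names(train_sizes)[train_sizes < 8]
print(train_sizes)
print(below_threshold)
```

With these replicate numbers, only class B reaches 8 training samples, so the warning is expected. It is reported separately from the coercion error and does not by itself stop the fit.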
Also, while reading the documentation of `cv.glmnet`, I couldn't understand what the `nfolds` argument stands for, so an explanation would be very helpful.
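As a rough picture of what `nfolds` controls: `cv.glmnet` performs K-fold cross-validation with K = `nfolds` (10 by default). The training rows are split into K groups; each group is held out once while the model is fit on the remaining rows over the whole lambda path, and the held-out errors are averaged to choose lambda. The base-R sketch below only mimics the fold-assignment step for a hypothetical 50-row training set with K = 5; it does not fit any model.

```r
n <- 50       # hypothetical number of training rows
nfolds <- 5   # K in K-fold cross-validation

set.seed(1)
# Give every row a fold id in 1..nfolds, with fold sizes as equal as possible
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Each fold is held out once; the model would be fit on the other folds
for (k in seq_len(nfolds)) {
  held_out <- which(foldid == k)
  cat(sprintf("fold %d: fit on %d rows, evaluate on %d rows\n",
              k, n - length(held_out), length(held_out)))
}
```

`cv.glmnet` also accepts a vector like this through its `foldid` argument, which fixes the folds instead of drawing them at random. Setting `nfolds` equal to the number of training rows corresponds to leave-one-out cross-validation.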
Thanks in advance.