I'm trying to perform lasso regression with cross-validation on an RNA-seq dataset to find a combination of RNAs that most accurately predicts disease status. However, I am not sure whether my approach is the best one and would like some advice on how to proceed. I also have a question, at the bottom of this post, about how to view the coefficients of my lasso regression model.
Some background information: my dataset has 192 samples in two classes (healthy and disease). With so few samples, I think cross-validation is more appropriate for evaluating my model than a single train-test split. I would also like to use a fairly small number of variables/genes to predict disease status, which is why I'm using lasso regression as the machine learning method.
To set up the cross-validation, I used the caret package:
library(caret)
myControl <- trainControl(method = "cv", number = 10, summaryFunction = twoClassSummary, classProbs = TRUE, verboseIter = TRUE)
To train and evaluate the model, I used the caret and glmnet packages:
model <- train(disease_status ~ ., data = my_dataset, method = "glmnet", metric = "ROC", trControl = myControl)
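One detail about this call: with method = "glmnet", caret tunes alpha as well as lambda by default, so the fit is an elastic net unless alpha is fixed at 1. Below is a minimal sketch of pinning alpha to 1 for a pure lasso. The simulated data (a stand-in for my_dataset), the lambda grid, the seed, and the names toy, ctrl, lasso_grid and lasso_fit are all assumptions made for illustration.

```r
library(caret)
library(glmnet)

set.seed(1)

# Simulated stand-in for the RNA-seq data: 192 samples, 50 "genes" (assumption)
n <- 192
p <- 50
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("gene", seq_len(p))))

# In this toy example only gene1 and gene2 carry signal
prob <- plogis(2 * x[, "gene1"] - 2 * x[, "gene2"])
toy <- data.frame(
  disease_status = factor(ifelse(runif(n) < prob, "disease", "healthy")),
  x
)

ctrl <- trainControl(method = "cv", number = 10,
                     summaryFunction = twoClassSummary, classProbs = TRUE)

# alpha = 1 is the lasso penalty; the lambda values are an arbitrary grid
lasso_grid <- expand.grid(alpha = 1,
                          lambda = 10^seq(-3, 0, length.out = 20))

lasso_fit <- train(disease_status ~ ., data = toy, method = "glmnet",
                   metric = "ROC", tuneGrid = lasso_grid, trControl = ctrl)

# Tuning values chosen by cross-validated ROC
lasso_fit$bestTune
```

With the grid fixed this way, every candidate model is a lasso, and only lambda is selected by cross-validation.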
By printing model, I can see that the best model, at an alpha of 1.0 (and a lambda of 0.01334), gives an ROC of 0.82. However, I don't know how to print the predictors (genes) that my lasso regression model uses to get this result.
coef(model) just returns NULL. Can anyone help me with these two questions: whether lasso regression with CV is a good fit for my problem, and how to print the predictors selected by the lasso?
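For the coefficient part of the question, here is a minimal sketch of one possible approach. A caret train object keeps the underlying glmnet fit in model$finalModel and the selected tuning values in model$bestTune, so coef() can be called on the glmnet object at the chosen lambda. The simulated data is an assumption standing in for my_dataset, and the names toy, ctrl, cf and selected are illustrative only.

```r
library(caret)
library(glmnet)

set.seed(42)

# Simulated stand-in for the real data: 192 samples, 30 "genes" (assumption)
n <- 192
p <- 30
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
prob <- plogis(2 * x[, "gene1"] - 2 * x[, "gene2"])
toy <- data.frame(
  disease_status = factor(ifelse(runif(n) < prob, "disease", "healthy")),
  x
)

ctrl <- trainControl(method = "cv", number = 10,
                     summaryFunction = twoClassSummary, classProbs = TRUE)
model <- train(disease_status ~ ., data = toy, method = "glmnet",
               metric = "ROC", trControl = ctrl)

# Ask the stored glmnet fit for its coefficients at the tuned lambda;
# the result is a sparse one-column matrix (intercept plus one row per gene)
cf <- coef(model$finalModel, s = model$bestTune$lambda)

# Keep only the non-zero rows, i.e. the terms the penalised model retained
cf_mat <- as.matrix(cf)
selected <- cf_mat[cf_mat[, 1] != 0, , drop = FALSE]
selected
```

The row names of selected are the retained genes plus the intercept. The result is a strict lasso selection only if the tuned alpha is 1.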