I have performed WGCNA on my count files, which gave me a module significantly related to tumor state. This module consists of 1700 genes, which I filtered down to 95 genes by log2 fold change and baseMean expression, and by excluding non-coding genes. Then I ran a binomial univariate regression analysis on them with a p-value cut-off of 0.005. The dependent variable was tumor state (cancer vs. normal), and the predictors were the 95 genes. This analysis excluded just one gene and returned 94 genes. Then I used the expression data of those 94 genes to build a lasso regression model to find out which genes best predict tumor state. Finally, I calculated the AUC, which was 1. Now I have two questions: 1- Why did the univariate regression exclude only one gene? 2- Is AUC = 1 normal after performing lasso regression, and why has it happened?
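For reference, the univariate screening step described above might look like the following minimal sketch. It runs on simulated data, not on the real expression table: the gene names, the sample size, and the shift added to `gene1` are all assumptions made for illustration.

```r
# Simulated stand-in for the expression table: 40 samples, 5 hypothetical genes.
set.seed(1)
n <- 40
result <- rep(c(0, 1), each = n / 2)        # 0 = normal, 1 = cancer
expr <- as.data.frame(matrix(rnorm(n * 5), nrow = n))
colnames(expr) <- paste0("gene", 1:5)
expr$gene1 <- expr$gene1 + 1.5 * result     # gene1 is made to differ between groups

# Fit one binomial GLM per gene and keep the p-value of the gene coefficient.
p_values <- sapply(colnames(expr), function(g) {
  fit <- glm(result ~ expr[[g]], family = binomial)
  coef(summary(fit))[2, "Pr(>|z|)"]
})

# Apply the same cut-off as in the post.
selected <- names(p_values)[p_values < 0.005]
```

Here `selected` holds the names of the genes that pass the cut-off. Each gene is tested on its own, so this step says nothing about how the genes behave together in a multivariable model.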
This is my code:
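library(caret)   # createDataPartition
library(glmnet)  # cv.glmnet, glmnet, assess.glmnet
set.seed(2)
data <- as.data.frame(data)
# Stratified 80/20 split on the outcome
index <- createDataPartition(data$result, p = 0.8, list = FALSE, times = 1)
train_data <- data[index, ]
test_data <- data[-index, ]
train_data[] <- lapply(train_data, as.numeric)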
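# Predictor matrix without the intercept column, and the outcome vector
x <- model.matrix(result ~ ., data = train_data)[, -1]
y <- train_data[, 'result']
# Choose lambda by 10-fold cross-validation on the training set
cv.model <- cv.glmnet(x = x,
                      y = y,
                      alpha = 1,
                      family = 'binomial',
                      nfolds = 10)
lambda_min <- cv.model$lambda.min
# Refit at the selected lambda (nfolds is a cv.glmnet argument, so it is not passed here)
lasso.model <- glmnet(x = x,
                      y = y,
                      alpha = 1,
                      family = 'binomial',
                      lambda = lambda_min)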
test_data[] <- lapply(test_data, as.numeric)
x2 <- model.matrix(result ~ ., data = test_data)[, -1]
y2 <- test_data[, 'result']
# AUC on the held-out test set
auc_test <- assess.glmnet(lasso.model,
                          newx = x2,
                          newy = y2)$auc
# AUC on the data the model was fitted to
auc_training <- assess.glmnet(lasso.model,
                              newx = x,
                              newy = y)$auc
auc_training
I would expect auc_training to be perfect or near perfect. What is the value returned for auc_test?
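To make the comparison concrete, here is a minimal sketch that reports both values side by side. It uses simulated data and the same glmnet calls as the code in the question. The data dimensions, the effect size, and the variable names `x_all`, `y_all`, and `train_idx` are assumptions for illustration only.

```r
library(glmnet)

# Simulated data: 200 samples, 20 hypothetical genes, only gene1 carries signal.
set.seed(2)
n <- 200
p <- 20
x_all <- matrix(rnorm(n * p), nrow = n,
                dimnames = list(NULL, paste0("gene", 1:p)))
y_all <- rbinom(n, 1, plogis(2 * x_all[, 1]))

# Simple random 80/20 split (the question uses caret::createDataPartition instead).
train_idx <- sample(n, size = 0.8 * n)
x_train <- x_all[train_idx, ]
y_train <- y_all[train_idx]
x_test <- x_all[-train_idx, ]
y_test <- y_all[-train_idx]

# Pick lambda by cross-validation on the training set, then refit at that lambda.
cv_fit <- cv.glmnet(x_train, y_train, alpha = 1, family = "binomial", nfolds = 10)
fit <- glmnet(x_train, y_train, alpha = 1, family = "binomial",
              lambda = cv_fit$lambda.min)

# Training AUC is computed on data the model has seen; test AUC on held-out data.
auc_training <- assess.glmnet(fit, newx = x_train, newy = y_train)$auc
auc_test <- assess.glmnet(fit, newx = x_test, newy = y_test)$auc
print(c(training = as.numeric(auc_training), test = as.numeric(auc_test)))
```

The held-out value is the one to judge the model by. Note that if the genes were selected using all samples before the split, even the test AUC can be optimistic, because the test samples have already influenced which predictors the model is given.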