I have an RNA-seq dataset with 8k genes and I want to predict a dichotomous disease status. I'd like to use penalized maximum likelihood estimation to fit a model that uses all the genes in my dataset, and bootstrapping to internally validate its predictive ability. I plan to use the code below to fit the model and run the bootstrap, but I am unsure whether this approach is correct. In addition, the first line of code throws an error that I do not know how to solve.
library(rms)   # lrm, pentrace, validate, calibrate
library(pROC)  # assuming roc() further down comes from pROC
LRM <- lrm(disease_status ~ ., data=df, x=TRUE, y=TRUE, maxit=1E6)
# Error: singular information matrix in lrm.fit (rank= 119). Offending variable(s): gene 1, gene 2, etc.
# I have read that this may be caused by collinearity between genes, but I don't know whether that is true here or how to fix it.
pentrace(LRM, seq(0, 0.5, by=0.02))
LRM_PEN <- update(LRM, penalty=...) # depends on the pentrace result, which I cannot fill in yet because of the error above
validate(LRM_PEN, B=1000)
plot(calibrate(LRM_PEN, B=1000), xlim=c(0,1), ylim=c(0,1))
ROC <- roc(df$disease_status, LRM_PEN$linear.predictors)
plot(ROC)
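To make the intended workflow concrete, here is a minimal sketch of the same steps on simulated data. It is only an illustration under assumptions: the toy data, the names `sim_df`, `form`, `fit`, `fit_pen`, the penalty grid and `B = 200` are all made up, and `roc()` is assumed to come from the pROC package. The toy data has far fewer predictors than samples, so the unpenalized fit that `pentrace` starts from does not hit the singularity error; the sketch does not show how to handle 8k genes.

```r
library(rms)   # lrm, pentrace, validate, calibrate
library(pROC)  # roc (assumed source of roc() in the question)

set.seed(1)
n <- 300
# Hypothetical toy data: 5 "genes", only the first two related to the outcome
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("gene", 1:5)))
p_true <- plogis(0.8 * X[, 1] - 0.6 * X[, 2])
sim_df <- data.frame(disease_status = rbinom(n, 1, p_true), X)

# Explicit formula: disease_status ~ gene1 + ... + gene5
form <- reformulate(paste0("gene", 1:5), response = "disease_status")

# Unpenalized fit; x = TRUE, y = TRUE keep the design matrix and response
# in the fit object so that the resampling functions can reuse them
fit <- lrm(form, data = sim_df, x = TRUE, y = TRUE)

# Try a grid of penalties; pen$penalty is the value pentrace selects
pen <- pentrace(fit, seq(0, 5, by = 0.25))
fit_pen <- update(fit, penalty = pen$penalty)

# Bootstrap internal validation and calibration of the penalized fit
val <- validate(fit_pen, B = 200)
print(val)
cal <- calibrate(fit_pen, B = 200)
plot(cal, xlim = c(0, 1), ylim = c(0, 1))

# Apparent ROC curve: computed on the same data used for fitting,
# so it is not corrected for optimism
roc_obj <- roc(sim_df$disease_status, fit_pen$linear.predictors)
plot(roc_obj)
```

Note that in this sketch the penalty is chosen once on the full data and then held fixed during the bootstrap, so the penalty selection step itself is not part of the validation.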
My questions are: (1) is this a suitable way to fit and internally validate my model, and (2) how can I resolve the error above?
Any help would be much appreciated.