I have single-cell RNA-seq samples of cancer cell populations taken either at diagnosis or at relapse. There are therefore two states, and I would like to characterize the relapse state by the expression of certain genes. I have previously identified a set of genes that might be characteristic of relapse.
To test this, I plan to fit a logistic regression with the state (relapse or diagnosis) as the target variable and the expression of the selected genes (about 200 genes) as the predictors. However, I am having difficulty checking the assumptions behind a logistic model, in particular the absence of multicollinearity among the predictors and a linear relationship between each predictor and the logit (log-odds) of the outcome.
Are there any particular packages or functions that can be used to check these assumptions?