I am running a logistic regression. The outcome is a clinical variable, and the model has two predictors (gene expression and hormone level) plus their interaction term. There is one hormone but many genes, so I run a separate regression for each gene.
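A minimal sketch of the per-gene loop, assuming the data are held in NumPy arrays: `expr` (samples × genes), `hormone`, and a 0/1 outcome `y`. These names and the simulated data are hypothetical. The fit is a plain IRLS maximum-likelihood estimate written out by hand only so the example needs nothing beyond NumPy; in practice a library routine such as statsmodels' `Logit` or R's `glm(..., family = binomial)` would do the same job.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Logistic-regression MLE by iteratively reweighted least squares (IRLS).
    X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-10, 1 - 1e-10)
        w = p * (1.0 - p)                  # IRLS weights
        z = eta + (y - p) / w              # working response
        XtW = X.T * w                      # X'W without forming diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def design(gene, hormone):
    """Columns: intercept, gene, hormone, gene x hormone interaction."""
    return np.column_stack([np.ones_like(gene), gene, hormone, gene * hormone])

def fit_all_genes(expr, hormone, y):
    """One model per gene; expr has shape (n_samples, n_genes)."""
    return np.array([fit_logistic(design(expr[:, j], hormone), y)
                     for j in range(expr.shape[1])])

# Hypothetical simulated data: only gene 0 is associated with the outcome.
rng = np.random.default_rng(0)
n, n_genes = 2000, 5
expr = rng.normal(size=(n, n_genes))
hormone = rng.normal(size=n)
true_beta = np.array([-0.5, 1.0, 0.5, 0.8])
p_true = 1.0 / (1.0 + np.exp(-design(expr[:, 0], hormone) @ true_beta))
y = (rng.random(n) < p_true).astype(float)

coefs = fit_all_genes(expr, hormone, y)   # shape (n_genes, 4)
print(np.round(coefs, 2))
```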
As far as I know, the most important statistical assumption to check in logistic regression is linearity: the logit of the probability, log(p/(1-p)), should be linearly related to each of the predictors. This can be checked with a plot.
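Written out for one gene, with $g_i$ the expression of that gene and $h_i$ the hormone level for sample $i$, each regression fits the first model below. Because of the interaction, linearity here means that at any fixed hormone level the log-odds change linearly in $g$ (with slope $\beta_1 + \beta_3 h$), and likewise in $h$ at fixed $g$. One possible way to turn the visual check into a formal test, offered as an assumption rather than the only option, is to add a curvature term such as $\gamma g_i^2$ and compare the maximized log-likelihoods $\ell_0$ and $\ell_1$ of the two fits:

$$
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 g_i + \beta_2 h_i + \beta_3 g_i h_i
$$

$$
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 g_i + \beta_2 h_i + \beta_3 g_i h_i + \gamma g_i^2,
\qquad
\mathrm{LR} = 2(\ell_1 - \ell_0) \;\overset{\text{approx.}}{\sim}\; \chi^2_1 \ \text{under}\ H_0\colon \gamma = 0
$$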
But for a large number of genes this is not feasible: even if I keep only the significant ones, there are about 100 of them.
What would be an efficient way to check the assumption?
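One way to make the linearity check scale is to fit each gene's model with and without a quadratic gene term and compare the two with a likelihood-ratio test (1 df). This sketch assumes a quadratic is an adequate stand-in for "curvature"; a restricted cubic spline, or the Box–Tidwell term g·log(g) (which needs positive g), are common alternatives. With about 100 genes the p-values are then adjusted with Benjamini–Hochberg, so only the flagged genes need to be plotted. Function names and the simulated data are hypothetical.

```python
import math
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Logistic-regression MLE by IRLS; returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-10, 1 - 1e-10)
        w = p * (1.0 - p)
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ (eta + (y - p) / w))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-10, 1 - 1e-10)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

def linearity_lrt(gene, hormone, y):
    """LR test of H0: no quadratic gene term (1 df). Returns the p-value."""
    X0 = np.column_stack([np.ones_like(gene), gene, hormone, gene * hormone])
    X1 = np.column_stack([X0, gene ** 2])       # add the curvature term
    _, ll0 = fit_logistic(X0, y)
    _, ll1 = fit_logistic(X1, y)
    lr = max(2.0 * (ll1 - ll0), 0.0)
    return math.erfc(math.sqrt(lr / 2.0))       # upper tail of chi-square(1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

# Hypothetical simulated data: gene 0 has a curved (quadratic) effect on the logit.
rng = np.random.default_rng(1)
n, n_genes = 1000, 5
expr = rng.normal(size=(n, n_genes))
hormone = rng.normal(size=n)
g = expr[:, 0]
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * g + 0.5 * hormone + 0.8 * g ** 2)))
y = (rng.random(n) < p_true).astype(float)

pvals = np.array([linearity_lrt(expr[:, j], hormone, y) for j in range(n_genes)])
qvals = benjamini_hochberg(pvals)
flagged = np.flatnonzero(qvals < 0.05)    # genes whose linear-logit fit looks doubtful
print(flagged)
```

A significant test only says the quadratic fits better; it is still worth plotting the flagged genes to see what the departure looks like. Conversely, a non-significant test does not prove linearity, especially with small samples.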
The same goes for checking for outliers. Checking for outliers in the hormone is easy, but there are 100 genes. If there are outliers that very strongly influence the results, so that the graph looks something like [figure not reproduced],
the reliability of the model is also in question. So how can this be checked across all the genes?
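For influence, a similar screen is to compute Cook's distance for every observation in every gene's model, rank genes by their largest value, and look at plots only for the top few. The sketch uses the usual GLM form D_i = r_i² h_i / (k (1 − h_i)²), with r_i the Pearson residual, h_i the leverage and k the number of coefficients, which is what R's `cooks.distance()` computes for a binomial `glm`. The helper names, the planted outlier and showing the top three genes are assumptions for illustration. There is no universal cutoff, so the ranking is meant to decide which genes to inspect, not to delete points automatically.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Logistic-regression MLE by IRLS; returns (coefficients, fitted probabilities)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-10, 1 - 1e-10)
        w = p * (1.0 - p)
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ (eta + (y - p) / w))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-10, 1 - 1e-10)
    return beta, p

def cooks_distance(X, y):
    """Leverages and Cook's distances for a logistic regression (dispersion = 1)."""
    _, p = fit_logistic(X, y)
    w = p * (1.0 - p)
    XtWX_inv = np.linalg.inv((X.T * w) @ X)
    # Diagonal of W^1/2 X (X'WX)^-1 X' W^1/2 without forming the n x n hat matrix.
    h = w * np.einsum("ij,jk,ik->i", X, XtWX_inv, X)
    r = (y - p) / np.sqrt(w)                     # Pearson residuals
    k = X.shape[1]
    return h, r ** 2 * h / (k * (1.0 - h) ** 2)

# Hypothetical data: no real gene effects; one extreme expression value planted in gene 1.
rng = np.random.default_rng(2)
n, n_genes = 500, 5
expr = rng.normal(size=(n, n_genes))
hormone = rng.normal(size=n)
y = (rng.random(n) < 0.4).astype(float)
expr[0, 1], y[0] = 8.0, 1.0

max_cook = np.empty(n_genes)
worst_obs = np.empty(n_genes, dtype=int)
for j in range(n_genes):
    gj = expr[:, j]
    X = np.column_stack([np.ones(n), gj, hormone, gj * hormone])
    _, d = cooks_distance(X, y)
    max_cook[j], worst_obs[j] = d.max(), d.argmax()

# Genes whose fit depends most on a single sample: plot these first.
for j in np.argsort(max_cook)[::-1][:3]:
    print(f"gene {j}: max Cook's D = {max_cook[j]:.3f} (sample {worst_obs[j]})")
```

A natural follow-up for the top-ranked genes is to refit without the most influential sample and see whether the interaction coefficient (the last column) changes materially.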