I am investigating the effect of statins on the expression of approximately 50 000 genes by comparing two groups: patients treated with statins and patients not treated with statins.
I chose the following model: gene expression ~ age + gender + smoking status. Unfortunately, after adjustment, the QQ plot of the p-values obtained from comparing the two groups (Wilcoxon signed rank test) indicates a bias. I have found criteria that could help improve the model (AIC, Mallows' Cp, R-squared), but they are all based on the expression of a single gene.
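To put a number on the deviation seen in the QQ plot, one common summary is the inflation factor lambda: the median of the observed test statistics divided by the median expected under the null. The sketch below is only an illustration under stated assumptions: the function name `inflation_factor` is hypothetical, the p-values are assumed to be two-sided, and each one is converted to a 1-df chi-square statistic. A value close to 1 suggests little inflation, while a value clearly above 1 is consistent with the bias described above.

```python
from statistics import NormalDist, median


def inflation_factor(p_values):
    """Median-based inflation factor (lambda) for a list of two-sided p-values.

    Hypothetical helper for illustration; not part of any package.
    """
    nd = NormalDist()
    # Convert each two-sided p-value to a 1-df chi-square statistic.
    # inv_cdf(p / 2) is the lower-tail z-score; squaring it gives chi-square.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values if 0.0 < p <= 1.0]
    # Median of a 1-df chi-square distribution (about 0.455).
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated (uniform) null distribution.
    uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
    print(round(inflation_factor(uniform_p), 3))
```

In practice the list passed in would be the roughly 50 000 per-gene p-values, and lambda can be recomputed after each change to the model to see whether the adjustment helps across all genes at once.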
The method "surrogate variable analysis" (SVA) aims to remove the effect of factors that are unknown, unmeasured, or not taken into account in a model, and it uses the expression of all genes. However, I am having difficulty getting it to work on my data.
Do you know of a method for evaluating a model across a whole set of genes, and of other methods similar to SVA?