Extracting Residuals from Differential Expression Model
2
0
Entering edit mode
4.8 years ago

Code:

# Creation of robust model
design <-model.matrix(~Class+ PC1 + PC2 + PC3)
fit.robust <- lmFit(EL_bc_norm_0, design, method = "robust", maxit=500)
fit.robust.eBayes <- eBayes(fit.robust)


I am trying to extract the residuals out of the above model (named design) relating to each gene and subject. But because there isn't a "y" argument in the model.matrix() function, I am unable to use resid(). From what I understand (my background in statistics is weak) model.matrix() is actually finding the parameter estimates, which will later be use by topTable() to find group differences.

One thread says there no reason to examine residuals for microarray data (https://support.bioconductor.org/p/19622/) but nonetheless, I've been asked to do so.

Looking at another guide (http://jtleek.com/genstats/inst/doc/02_11_many-regressions.html) the traditional non-robust fitted model contains "residuals" but they seem to be absent in my fitted model (All I have is df.residual).

I can use "residuals(fit.robust, EL_bc_norm_0)". This appears to work but I am a bit hesitant about what I am doing.

Thank you!

statistics residuals limma • 2.1k views
2
Entering edit mode
7 months ago
Gordon Smyth ★ 7.0k

The help page ?residuals.MArrayLM tells you to extract residuals by

residuals(fit.robust, EL_bc_norm_0)


Alternatively you could compute residuals yourself by

ResidMatrix <- as.matrix(EL_bc_norm_0) - fit.robust\$fitted.values


But beware, it is hard to think of anything useful that you might do with the residuals, except perhaps look for outliers. There is no assumption in limma that the residuals for different genes come from the same normal distribution. And, even if the limma normality assumptions were perfectly satisfied, the residuals from a robust regression would still not be normally distributed, not even for one gene in isolation. You might like to read my comments about different types of robustness in the discussion of this paper: Phipson B, Lee S, Majewski IJ, Alexander WS, Smyth GK 2016. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963.

1
Entering edit mode
7 months ago
Surbhi ▴ 10

I think the linear model functions you are talking about are slightly different. The one you are using in your code (lmFit) is from limma and as far as I know, residuals() or residuals.MArrayLM() are appropriate ways to extract residuals when using lmFit(). The function lm.Fit() mentioned in the link you posted is from basic R library (stats) and naturally the data structure of results vary.