6.0 years ago by

Salt Lake City, UT

If you have:

- 'y' as the gene expression matrix (genes in rows, samples in columns)
- 'mod' as the model matrix you sent to sva (the full model)
- 'svs' as svobj$sv where svobj is the output from the sva function

then you can use the function below to get a "cleaned" version of the matrix: the fitted effects of the surrogate variables are subtracted, while the effects of the variables in 'mod' are left in the data.

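cleanY = function(y, mod, svs) {
  # Combined design: model columns first, then the surrogate variables
  X = cbind(mod, svs)
  # Least-squares fit of every gene on X; beta is ncol(X) x genes
  Hat = solve(t(X) %*% X) %*% t(X)
  beta = (Hat %*% t(y))
  rm(Hat)
  gc()
  P = ncol(mod)
  # Subtract only the surrogate-variable part of the fit.
  # drop = FALSE keeps matrix shape when there is a single surrogate variable.
  return(y - t(X[, -c(1:P), drop = FALSE] %*% beta[-c(1:P), , drop = FALSE]))
}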
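
As a rough sanity check, here is a small sketch on simulated data. Everything in it other than cleanY is made up for illustration: 'group', 'n_genes' and 'n_samples' are hypothetical names, and 'svs' is just random noise standing in for svobj$sv (it is not output from sva). The idea is that if you refit the same design on the cleaned matrix, the surrogate variable coefficients should be essentially zero and the model coefficients should be unchanged.

```r
set.seed(1)

# Same computation as cleanY above, repeated so this sketch runs on its own
cleanY = function(y, mod, svs) {
  X = cbind(mod, svs)
  Hat = solve(t(X) %*% X) %*% t(X)
  beta = (Hat %*% t(y))
  P = ncol(mod)
  return(y - t(X[, -c(1:P), drop = FALSE] %*% beta[-c(1:P), , drop = FALSE]))
}

n_genes = 50
n_samples = 12

# Hypothetical two-group design (assumption, not from the original post)
group = factor(rep(c("ctrl", "trt"), each = n_samples / 2))
mod = model.matrix(~ group)                    # full model: intercept + group
svs = matrix(rnorm(n_samples * 2), ncol = 2)   # stand-in for svobj$sv

# Simulated expression matrix: genes in rows, samples in columns
y = matrix(rnorm(n_genes * n_samples), nrow = n_genes)
y = y + outer(rnorm(n_genes), svs[, 1])        # unwanted variation along SV 1

y_clean = cleanY(y, mod, svs)

# Refit the combined design on raw and cleaned data
X = cbind(mod, svs)
P = ncol(mod)
coef_raw = lm.fit(X, t(y))$coefficients
coef_clean = lm.fit(X, t(y_clean))$coefficients

# Surrogate variable coefficients after cleaning (should be ~0)
sv_coef_clean = coef_clean[-(1:P), , drop = FALSE]
# Model coefficients should match before and after cleaning
model_coef_same = all.equal(coef_raw[1:P, ], coef_clean[1:P, ])

print(max(abs(sv_coef_clean)))
print(model_coef_same)
```

With real data you would pass the same 'mod' you gave to sva and svobj$sv in place of the simulated 'svs'.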

I modified this from a post by Andrew Jaffe here:

http://permalink.gmane.org/gmane.science.biology.informatics.conductor/42857