Remove Surrogate Variables and Batch
3
3
Entering edit mode
8.2 years ago
tucanj ▴ 100

Is there a way to output a gene expression matrix corrected for batch (I know I can use ComBat) and surrogate variables from SVA? I would like to visualize the results of a limma differential expression however they are confounded and thus not amenable to visualization.

R sva ComBat • 8.1k views
11
Entering edit mode
8.2 years ago
brentp 24k

If you have:

• y as the gene expresion matrix
• mod as the model matrix you sent to sva (the full model)
• svs as svobj\$sv where svobj is the output from the sva function

then you can use the function below to get a "cleaned" version of the matrix with the surrogate variables removed.

cleanY = function(y, mod, svs) {
X = cbind(mod, svs)
Hat = solve(t(X) %*% X) %*% t(X)
beta = (Hat %*% t(y))
rm(Hat)
gc()
P = ncol(mod)
return(y - t(as.matrix(X[,-c(1:P)]) %*% beta[-c(1:P),]))
}


I modified this from a post by Andrew Jaffe here.

0
Entering edit mode

The joys of linear algebra :)

4
Entering edit mode
6.3 years ago
jtleek ▴ 40

It is not recommended that you perform differential expression analysis after data cleaning - the resulting degrees of freedom will be wrong. The best thing to do is include the svs as covariates in downstream linear models in limma etc. as adjustment models.

• Jeff (co-developer of sva)
1
Entering edit mode

Dear Jeff, is it correct to perform a Gene regulatory network (GRN) analysis after the data cleaning with the function mentioned before?

3
Entering edit mode
6.3 years ago
ddiez ★ 1.9k

With Jeff's answer in mind, if you just want to remove the effect of batch variables for visualisation (e.g. heatmap, PCA), the limma function removeBatchEffect() can be used for that purpose.

0
Entering edit mode

Hello, could you be more clear? why just for visualization? are you saying that the Limma corrected dataset cannot be used for other purposes?

0
Entering edit mode

Take a look at ?limma::removeBatchEffect for details of other uses of the processed dataset. For differential expression it is better to use the original dataset and add batch as covariate, as indicated in @jtleek's answer.