Unfortunately removeBatchEffects
uses contr.sum
in accounting for batch. That means that it effectually corrects everything to the average of all the batches.
Here is the removeBatchEffects
code:
function (x, batch = NULL, batch2 = NULL, covariates = NULL,
design = matrix(1, ncol(x), 1), ...)
{
if (is.null(batch) && is.null(batch2) && is.null(covariates))
return(as.matrix(x))
if (!is.null(batch)) {
batch <- as.factor(batch)
contrasts(batch) <- contr.sum(levels(batch)) # <--------- this line tells it to average batches
batch <- model.matrix(~batch)[, -1, drop = FALSE]
}
if (!is.null(batch2)) {
batch2 <- as.factor(batch2)
contrasts(batch2) <- contr.sum(levels(batch2))
batch2 <- model.matrix(~batch2)[, -1, drop = FALSE]
}
if (!is.null(covariates))
covariates <- as.matrix(covariates)
X.batch <- cbind(batch, batch2, covariates)
fit <- lmFit(x, cbind(design, X.batch), ...)
beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
beta[is.na(beta)] <- 0
as.matrix(x) - beta %*% t(X.batch)
}
In order to correct the average of one batch to the average of the other, we need to use the default contrast scheme instead. This is as simple as removing the marked line in the above:
new_batch_effects_removeal <- function(m, batch, design, matrix(1, ncol(x), 1), ...) {
if (!is.null(batch)) {
batch <- as.factor(batch)
batch <- model.matrix(~batch)[, -1, drop = FALSE]
}
fit <- lmFit(x, cbind(design, batch), ...)
beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
beta[is.na(beta)] <- 0
as.matrix(x) - beta %*% t(batch)
}
In this simplified version I've removed refrences to batch2
and covariates
. It will take the alphanumerically first batch (i.e. A before B, or 0 before 1, 1 before 2) as the reference and correct other batches to that.
I guess that you could work through the code of
removeBatchEffects
to find a way to do this