4 months ago by

Sheffield, UK

Unfortunately `removeBatchEffects`

uses `contr.sum`

in accounting for batch. That means that it effectually corrects everything to the average of all the batches.

Here is the `removeBatchEffects`

code:

```
function (x, batch = NULL, batch2 = NULL, covariates = NULL,
design = matrix(1, ncol(x), 1), ...)
{
if (is.null(batch) && is.null(batch2) && is.null(covariates))
return(as.matrix(x))
if (!is.null(batch)) {
batch <- as.factor(batch)
contrasts(batch) <- contr.sum(levels(batch)) # <--------- this line tells it to average batches
batch <- model.matrix(~batch)[, -1, drop = FALSE]
}
if (!is.null(batch2)) {
batch2 <- as.factor(batch2)
contrasts(batch2) <- contr.sum(levels(batch2))
batch2 <- model.matrix(~batch2)[, -1, drop = FALSE]
}
if (!is.null(covariates))
covariates <- as.matrix(covariates)
X.batch <- cbind(batch, batch2, covariates)
fit <- lmFit(x, cbind(design, X.batch), ...)
beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
beta[is.na(beta)] <- 0
as.matrix(x) - beta %*% t(X.batch)
}
```

In order to correct the average of one batch to the average of the other, we need to use the default contrast scheme instead. This is as simple as removing the marked line in the above:

```
new_batch_effects_removeal <- function(m, batch, design, matrix(1, ncol(x), 1), ...) {
if (!is.null(batch)) {
batch <- as.factor(batch)
batch <- model.matrix(~batch)[, -1, drop = FALSE]
}
fit <- lmFit(x, cbind(design, batch), ...)
beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
beta[is.na(beta)] <- 0
as.matrix(x) - beta %*% t(batch)
}
```

In this simplified version I've removed refrences to `batch2`

and `covariates`

. It will take the alphanumerically first batch (i.e. A before B, or 0 before 1, 1 before 2) as the reference and correct other batches to that.

I guess that you could work through the code of

`removeBatchEffects`

to find a way to do this68k