Question

Batch correct batch 2 while leaving batch 1 unchanged.

1

Entering edit mode

5.0 years ago

i.sudbery 21k

Lets say I have trained a module on a data set, the data here is rlog values for several 'omics datatypes, and has been batch corrected within that dataset using removeBatchEffects from limma.

Now lets say I want to test this model on a different dataset. Ideally I should remove the study specific effects before I start. But if you put the new data, together with the old data into removeBatchEffects my intution is that the data from study 1 will also be changed, and study 1 will no longer correspond to the data used to train the model.

Is there a way to run removeBatchEffects so as to leave one batch unchanged? If I code study 1 as the reference level in the formula will that achieve this? Does anyone know of an alternative approach?

limma machine-learning batch-effect • 1.5k views

ADD COMMENT • link updated 14 months ago by Ram 45k • written 5.0 years ago by i.sudbery 21k

0

Entering edit mode

I guess that you could work through the code of removeBatchEffects to find a way to do this

ADD REPLY • link 5.0 years ago by Kevin Blighe 89k

score 1 · Answer 1 · 2020-07-31

Unfortunately removeBatchEffects uses contr.sum in accounting for batch. That means that it effectually corrects everything to the average of all the batches.

Here is the removeBatchEffects code:

function (x, batch = NULL, batch2 = NULL, covariates = NULL, 
    design = matrix(1, ncol(x), 1), ...) 
{
    if (is.null(batch) && is.null(batch2) && is.null(covariates)) 
        return(as.matrix(x))
    if (!is.null(batch)) {
        batch <- as.factor(batch)        
        contrasts(batch) <- contr.sum(levels(batch))   # <--------- this line tells it to average batches
        batch <- model.matrix(~batch)[, -1, drop = FALSE]
    }
    if (!is.null(batch2)) {
        batch2 <- as.factor(batch2)
        contrasts(batch2) <- contr.sum(levels(batch2))
        batch2 <- model.matrix(~batch2)[, -1, drop = FALSE]
    }
    if (!is.null(covariates)) 
        covariates <- as.matrix(covariates)
    X.batch <- cbind(batch, batch2, covariates)
    fit <- lmFit(x, cbind(design, X.batch), ...)
    beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
    beta[is.na(beta)] <- 0
    as.matrix(x) - beta %*% t(X.batch)
}

In order to correct the average of one batch to the average of the other, we need to use the default contrast scheme instead. This is as simple as removing the marked line in the above:

new_batch_effects_removeal <- function(m, batch, design, matrix(1, ncol(x), 1), ...) {
   if (!is.null(batch)) {
        batch <- as.factor(batch)
        batch <- model.matrix(~batch)[, -1, drop = FALSE]
    }
    fit <- lmFit(x, cbind(design, batch), ...)
    beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
    beta[is.na(beta)] <- 0
    as.matrix(x) - beta %*% t(batch)
}

In this simplified version I've removed refrences to batch2 and covariates. It will take the alphanumerically first batch (i.e. A before B, or 0 before 1, 1 before 2) as the reference and correct other batches to that.