Batch correct batch 2 while leaving batch 1 unchanged.
3.7 years ago

Let's say I have trained a model on a dataset. The data are rlog values for several 'omics data types, and they have been batch corrected within that dataset using removeBatchEffect from limma.

Now let's say I want to test this model on a different dataset. Ideally I should remove the study-specific effects before I start. But if I put the new data together with the old data into removeBatchEffect, my intuition is that the data from study 1 will also be changed, so study 1 will no longer correspond to the data used to train the model.

Is there a way to run removeBatchEffect so as to leave one batch unchanged? If I code study 1 as the reference level in the formula, will that achieve this? Does anyone know of an alternative approach?

limma machine-learning batch-effect • 1.0k views
I guess that you could work through the code of removeBatchEffect to find a way to do this.

3.7 years ago

Unfortunately, removeBatchEffect uses contr.sum (sum-to-zero contrasts) when accounting for batch. That means it effectively corrects every batch towards the average of all the batches, so no batch is left unchanged.
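To make the difference concrete, here is a small base R sketch (a toy two-batch factor with made-up labels "A" and "B", not taken from the question) comparing the batch column that the two coding schemes produce. Because the correction subtracts the fitted coefficient times this column, a sample coded 0 is not modified.

```r
# Toy batch factor; the labels "A" and "B" are illustrative only
batch <- factor(c("A", "A", "B", "B"))

# Sum-to-zero coding, as set inside removeBatchEffect
b_sum <- batch
contrasts(b_sum) <- contr.sum(levels(b_sum))
X_sum <- model.matrix(~b_sum)[, -1, drop = FALSE]

# Treatment coding, R's default for unordered factors
X_trt <- model.matrix(~batch)[, -1, drop = FALSE]

print(X_sum)  # batch A samples are coded +1, batch B samples -1
print(X_trt)  # batch A samples are coded 0, batch B samples 1
```

Under sum-to-zero coding every sample has a non-zero entry, so every batch is shifted. Under treatment coding the first level gets zeros, so that batch stays as it was.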

Here is the removeBatchEffect code:

function (x, batch = NULL, batch2 = NULL, covariates = NULL, 
    design = matrix(1, ncol(x), 1), ...) 
{
    if (is.null(batch) && is.null(batch2) && is.null(covariates)) 
        return(as.matrix(x))
    if (!is.null(batch)) {
        batch <- as.factor(batch)        
        contrasts(batch) <- contr.sum(levels(batch))   # <--------- this line tells it to average batches
        batch <- model.matrix(~batch)[, -1, drop = FALSE]
    }
    if (!is.null(batch2)) {
        batch2 <- as.factor(batch2)
        contrasts(batch2) <- contr.sum(levels(batch2))
        batch2 <- model.matrix(~batch2)[, -1, drop = FALSE]
    }
    if (!is.null(covariates)) 
        covariates <- as.matrix(covariates)
    X.batch <- cbind(batch, batch2, covariates)
    fit <- lmFit(x, cbind(design, X.batch), ...)
    beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
    beta[is.na(beta)] <- 0
    as.matrix(x) - beta %*% t(X.batch)
}

In order to correct the average of one batch to the average of the other, we need to use R's default contrast scheme (treatment contrasts, contr.treatment) instead. This is as simple as removing the marked line from the function above:

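library(limma)  # provides lmFit

new_batch_effects_removal <- function(x, batch, design = matrix(1, ncol(x), 1), ...) {
    # No contr.sum here: with the default treatment contrasts the first
    # factor level is the reference batch and gets a zero row in the batch matrix
    batch <- as.factor(batch)
    batch <- model.matrix(~batch)[, -1, drop = FALSE]
    # Fit design + batch, then keep only the batch coefficients
    fit <- lmFit(x, cbind(design, batch), ...)
    beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
    beta[is.na(beta)] <- 0
    # Subtract the fitted batch effect; reference-batch samples are unchanged
    as.matrix(x) - beta %*% t(batch)
}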

In this simplified version I've removed references to batch2 and covariates. It takes the first factor level as the reference and corrects the other batches to it. By default that is the alphanumerically first batch (i.e. A before B, 0 before 1, 1 before 2).
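As a rough check of that behaviour, the sketch below repeats the same arithmetic on simulated data. It makes several assumptions: the data and the labels "study1" and "study2" are invented, the design is intercept-only, and base R's lm.fit stands in for limma's lmFit so that the example runs without limma. It also shows one way to choose the reference batch explicitly by setting the factor levels, rather than relying on alphanumeric order.

```r
set.seed(1)

# Toy data: 5 features x 6 samples; the last three samples get a batch shift
x <- matrix(rnorm(30), nrow = 5)
x[, 4:6] <- x[, 4:6] + 2
study <- c("study1", "study1", "study1", "study2", "study2", "study2")

# List the training study first so that it becomes the reference level
batch <- factor(study, levels = c("study1", "study2"))
B <- model.matrix(~batch)[, -1, drop = FALSE]
X <- cbind(1, B)  # intercept-only design plus the batch column

# Per-feature least squares (base R stand-in for limma's lmFit)
coefs <- t(lm.fit(X, t(x))$coefficients)  # features x coefficients
beta <- coefs[, -1, drop = FALSE]
corrected <- x - beta %*% t(B)

# Largest change in the reference batch; zero means it was left untouched
max(abs(corrected[, 1:3] - x[, 1:3]))

# After correction the study2 feature means should match the study1 means
max(abs(rowMeans(corrected[, 4:6]) - rowMeans(x[, 1:3])))
```

With a real design matrix containing biological covariates the same logic should apply, but the batch coefficients are then estimated jointly with those covariates, so it is worth rechecking that the reference study comes out identical to the training data.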
