Question: Batch correct batch 2 while leaving batch 1 unchanged.
1
gravatar for i.sudbery
4 months ago by
i.sudbery9.8k
Sheffield, UK
i.sudbery9.8k wrote:

Lets say I have trained a module on a data set, the data here is rlog values for several 'omics datatypes, and has been batch corrected within that dataset using removeBatchEffects from limma.

Now lets say I want to test this model on a different dataset. Ideally I should remove the study specific effects before I start. But if you put the new data, together with the old data into removeBatchEffects my intution is that the data from study 1 will also be changed, and study 1 will no longer correspond to the data used to train the model.

Is there a way to run removeBatchEffects so as to leave one batch unchanged? If I code study 1 as the reference level in the formula will that achieve this? Does anyone know of an alternative approach?

ADD COMMENTlink modified 4 months ago • written 4 months ago by i.sudbery9.8k

I guess that you could work through the code of removeBatchEffects to find a way to do this

ADD REPLYlink written 4 months ago by Kevin Blighe68k
1
gravatar for i.sudbery
4 months ago by
i.sudbery9.8k
Sheffield, UK
i.sudbery9.8k wrote:

Unfortunately removeBatchEffects uses contr.sum in accounting for batch. That means that it effectually corrects everything to the average of all the batches.

Here is the removeBatchEffects code:

function (x, batch = NULL, batch2 = NULL, covariates = NULL, 
    design = matrix(1, ncol(x), 1), ...) 
{
    if (is.null(batch) && is.null(batch2) && is.null(covariates)) 
        return(as.matrix(x))
    if (!is.null(batch)) {
        batch <- as.factor(batch)        
        contrasts(batch) <- contr.sum(levels(batch))   # <--------- this line tells it to average batches
        batch <- model.matrix(~batch)[, -1, drop = FALSE]
    }
    if (!is.null(batch2)) {
        batch2 <- as.factor(batch2)
        contrasts(batch2) <- contr.sum(levels(batch2))
        batch2 <- model.matrix(~batch2)[, -1, drop = FALSE]
    }
    if (!is.null(covariates)) 
        covariates <- as.matrix(covariates)
    X.batch <- cbind(batch, batch2, covariates)
    fit <- lmFit(x, cbind(design, X.batch), ...)
    beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
    beta[is.na(beta)] <- 0
    as.matrix(x) - beta %*% t(X.batch)
}

In order to correct the average of one batch to the average of the other, we need to use the default contrast scheme instead. This is as simple as removing the marked line in the above:

new_batch_effects_removeal <- function(m, batch, design, matrix(1, ncol(x), 1), ...) {
   if (!is.null(batch)) {
        batch <- as.factor(batch)
        batch <- model.matrix(~batch)[, -1, drop = FALSE]
    }
    fit <- lmFit(x, cbind(design, batch), ...)
    beta <- fit$coefficients[, -(1:ncol(design)), drop = FALSE]
    beta[is.na(beta)] <- 0
    as.matrix(x) - beta %*% t(batch)
}

In this simplified version I've removed refrences to batch2 and covariates. It will take the alphanumerically first batch (i.e. A before B, or 0 before 1, 1 before 2) as the reference and correct other batches to that.

ADD COMMENTlink written 4 months ago by i.sudbery9.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1482 users visited in the last hour