Can I use removeBatchEffect() before performing DEG analysis in bulk RNA-seq with limma-voom?
1
0
Entering edit mode
10 weeks ago
Sumeet ▴ 10

Hi Everyone

I’m working with bulk RNA-seq data and using the limma-voom pipeline. I want to correct for batch effects after voom transformation, and then perform differential expression analysis (DEG).

dge <- calcNormFactors(dge)  # Normalizes for library size
design <- model.matrix(~ group, data = metadata)  # Only includes group; batch not included here

v <- voom(dge, design)  # Converts to log2-CPM and estimates mean–variance trend

# Remove batch effect after voom using linear model
v$E <- removeBatchEffect(v$E, batch = batch, design = model.matrix(~ group, data = metadata))

# v$E can now be used for visualization (PCA, heatmaps, etc.)

# DEG analysis
design2 <- model.matrix(~ group, data = metadata)
fit <- lmFit(v$E, design2)
fit <- eBayes(fit)
results <- topTable(fit, coef = "groupTreatment", number = Inf)

My questions: Is this pipeline statistically valid for DEG analysis?

Does removeBatchEffect() preserve the group differences after correction?

Is this approach preferred over including batch in the design matrix directly?

I’ve read that removeBatchEffect() used for DEG in some cases so I’d appreciate clarification or recommended best practices.

Thank you, Sumeet

Batchcorrection VOOM DEGanalysis Limma bulkRNA • 733 views
ADD COMMENT
4
Entering edit mode
10 weeks ago
ATpoint 89k

This has been asked many times before. If you browse support.bioconductor.org you find plenty of posts from the limma authors duscussing that. For a condensed version, I suggest to simply read the limma user guide which will recommend to include batch into the design. As voom starts from raw counts you anyway cannot alter them with removeBatchEffect which assumes log2-scale counts.

Basically, v$E is just the logCPMs from limma with a constant prior of 0.5. The magic behind the methods is the precision weights which you cannot easily (or at all) use outside of its intended use in the model. That leaves two options:

Either use voom with batch in the design, or create logCPMs with cpm(x, log=TRUE) from edgeR and then remove the batch effect from these with removeBatchEffect. The latter can then be used for plotting and other downstream analysis. You could test these corrected counts for DGE with limma-trend, even though the limma authors recommend against that in favor of again just including batch into the design and using the uncorrected logCPMs. That is all covered in the user guide though.

ADD COMMENT
2
Entering edit mode

TLDR: You don't need to run removeBatchEffects, because voom will automatically remove the batch effects if you tell it about them.

ADD REPLY
0
Entering edit mode

Hi Ian,

Thank you for your response.

ADD REPLY
0
Entering edit mode

Hi ATpoint and i.sudbery

Thank you for the helpful explanation, I really appreciate the clarity.

Following up:

I used removeBatchEffect() on voom$E using a broad group variable (e.g., Healthy, Disease, Treated, etc.) to generate a batch-corrected expression matrix.

Now, I’d like to reuse this corrected v$E to perform multiple DEG analyses based on other metadata columns that were not included in the original design. For example:

Responder vs Non-Responder (from a Response_Status column)

Inflamed vs Uninflamed (from an Inflammation_Status column)

Comparisons across treatment arms (e.g., Drug A vs Drug B)

My plan is:

Reuse the already batch-corrected v$E matrix.

For each comparison, define a new design matrix based on the relevant metadata.

Run lmFit(v$E, design) and eBayes() to extract differential expression.

Is this a statistically sound approach, or should I explicitly model batch again in each new DEG contrast?

Thanks again for your time and insights!

ADD REPLY
3
Entering edit mode

You should explicitly model batch in each new DEG contrast.

ADD REPLY

Login before adding your answer.

Traffic: 3077 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6