Question

Batch-correction of partially confounded RNA-seq data

3.2 years ago

James Ashmore ★ 3.4k

Say I have RNA-seq data from 3 conditions (A, B, C) in a single batch (X):

  X
A 2
B 2
C 2

I then decide that I want to include 2 more conditions (D, E) from a separate batch (Y):

  Y
D 2
E 2

Now, I want to calculate differential expression between all pairs of conditions.

My initial impression is that there is no way to salvage this design. In order to account for differences due to batch I would need to have included samples from conditions D/E in batch X and conditions A/B/C in batch Y - is that a reasonable conclusion?

Additionally, I wondered whether, if I just included samples from conditions A/B/C in batch Y, I could batch-correct the data based on the subset of shared conditions? In this way, it seems the data would be only partially confounded.

RNA-Seq • 516 views

updated 3.2 years ago by Carlo Yague 8.7k • written 3.2 years ago by James Ashmore ★ 3.4k

score 1 · Answer 1 · 2021-03-02

My initial impression is that there is no way to salvage this design. In order to account for differences due to batch I would need to have included samples from conditions D/E in batch X and conditions A/B/C in batch Y - is that a reasonable conclusion?

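Yes. As it stands, batch is completely confounded with the A/B/C versus D/E split, so a difference between, say, A and D could come from the condition or from the batch. Having samples in both batches is the only way to estimate and correct for the batch effect. Comparisons within a batch (A vs B, D vs E) are not affected.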
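To make the confounding concrete, here is a minimal pure-Python sketch (my own illustration, not taken from any package). It builds the design matrix for an additive `condition + batch` model with treatment coding and computes its rank. The helper names `design_matrix` and `rank`, and the idea of adding a single condition-A sample to batch Y, are assumptions made for the example. If the rank is lower than the number of columns, the batch coefficient cannot be estimated separately from the condition coefficients.

```python
from fractions import Fraction

def design_matrix(samples, conditions, batches):
    """Rows for the additive model ~ condition + batch (treatment coding)."""
    rows = []
    for cond, batch in samples:
        row = [1]  # intercept: reference condition in reference batch
        row += [int(cond == c) for c in conditions[1:]]  # condition dummies
        row += [int(batch == b) for b in batches[1:]]    # batch dummies
        rows.append(row)
    return rows

def rank(rows):
    """Matrix rank via exact Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        # find a row at or below position rk with a non-zero entry in this column
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

conditions = ["A", "B", "C", "D", "E"]
batches = ["X", "Y"]

# Design from the question: A/B/C only in batch X, D/E only in batch Y
original = [(c, "X") for c in "ABC" for _ in range(2)]
original += [(c, "Y") for c in "DE" for _ in range(2)]

# Hypothetical fix: one extra condition-A sample run in batch Y
one_shared = original + [("A", "Y")]

n_cols = len(design_matrix(original, conditions, batches)[0])
rank_original = rank(design_matrix(original, conditions, batches))
rank_one_shared = rank(design_matrix(one_shared, conditions, batches))

print(n_cols, rank_original, rank_one_shared)
```

In the original design the batch column is just the sum of the D and E columns, so the matrix is rank-deficient. One bridging sample is enough to make every coefficient estimable, although a single sample gives a very noisy estimate of the batch effect.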

Additionally, I wondered whether, if I just included samples from conditions A/B/C in batch Y, I could batch-correct the data based on the subset of shared conditions? In this way, it seems the data would be only partially confounded.

At a bare minimum, you need only one sample sequenced in both batches to correct for the batch effect. More is better, of course, but because we usually assume that a batch effect affects all samples similarly (no batch:sample interaction), it is not statistically required to have all samples in both batches. That said, the more shared samples you have, the better the batch-effect correction will be.
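Here is a toy, noise-free Python sketch of why a single bridging sample is enough under the additive assumption. All numbers are made up for one hypothetical gene, and the names `true_mean`, `batch_shift` and `estimate_batch_shift` are mine, not from any library. The batch shift is estimated from the condition present in both batches and then removed from a cross-batch comparison.

```python
# Hypothetical log2 expression of one gene; values are invented for illustration.
true_mean = {"A": 5.0, "B": 6.0, "C": 4.0, "D": 7.0, "E": 3.0}
batch_shift = {"X": 0.0, "Y": 1.5}  # additive batch effect, identical for all conditions

def observe(cond, batch):
    """Noise-free observation: condition mean plus batch shift."""
    return true_mean[cond] + batch_shift[batch]

# Design from the question plus one condition-A sample run in batch Y
samples = [(c, "X") for c in "ABC" for _ in range(2)]
samples += [(c, "Y") for c in "DE" for _ in range(2)]
samples += [("A", "Y")]
data = [(c, b, observe(c, b)) for c, b in samples]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def group_mean(data, cond, batch):
    """Mean expression of one condition within one batch."""
    return mean(y for c, b, y in data if c == cond and b == batch)

def estimate_batch_shift(data, shared_conditions):
    """Average Y-minus-X difference over conditions present in both batches."""
    return mean(group_mean(data, c, "Y") - group_mean(data, c, "X")
                for c in shared_conditions)

shift = estimate_batch_shift(data, ["A"])

# D vs A across batches: the naive difference mixes condition and batch effects
naive_d_vs_a = group_mean(data, "D", "Y") - group_mean(data, "A", "X")
corrected_d_vs_a = naive_d_vs_a - shift

print(shift, naive_d_vs_a, corrected_d_vs_a)
```

With real data you would not subtract by hand like this. The usual approach is to put batch in the design of the differential expression model as a covariate, and the precision of the batch estimate then depends on how many samples bridge the two batches.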