Say I have RNA-seq data from 3 conditions (A, B, C), all processed in a single batch (X):
Batch X: A (n = 2), B (n = 2), C (n = 2)
I then decide that I want to include 2 more conditions (D, E), which come from a separate batch (Y):
Batch Y: D (n = 2), E (n = 2)
Now, I want to calculate differential expression between all pairs of conditions.
My initial impression is that there is no way to salvage this design. To account for differences due to batch, I would have needed to include samples from conditions D/E in batch X and samples from conditions A/B/C in batch Y. Is that a reasonable conclusion?
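To make the confounding concrete, here is a minimal sketch of a rank check on the design matrix. It assumes an additive model of the form condition + batch with treatment (dummy) coding and 2 samples per group; the helper name `design_matrix` is made up for illustration and is not from any RNA-seq package.

```python
import numpy as np

# Layout from the question: (batch, condition), assuming 2 samples per group.
samples = [("X", c) for c in "ABC" for _ in range(2)] + \
          [("Y", c) for c in "DE" for _ in range(2)]

def design_matrix(samples):
    """Hypothetical helper: intercept + condition dummies + batch dummies."""
    conditions = sorted({c for _, c in samples})
    batches = sorted({b for b, _ in samples})
    rows = []
    for b, c in samples:
        row = [1.0]                                              # intercept
        row += [1.0 if c == k else 0.0 for k in conditions[1:]]  # B, C, D, E
        row += [1.0 if b == k else 0.0 for k in batches[1:]]     # batch Y
        rows.append(row)
    return np.array(rows)

design = design_matrix(samples)
n_columns = design.shape[1]
rank = np.linalg.matrix_rank(design)

# The batch Y column equals the sum of the D and E columns, so the matrix
# is rank deficient and the batch effect cannot be separated from D/E.
print(n_columns, rank)  # 6 5
```

With fewer independent columns than coefficients, any contrast between a condition in batch X and a condition in batch Y cannot be distinguished from the batch effect under this model.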
Additionally, I wondered whether, if I had just included samples from conditions A/B/C in batch Y, I could batch-correct the data based on the subset of shared conditions. That way, it seems the data would be only partially confounded.
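As a rough check of this partially confounded layout, the sketch below builds the design matrix for the alternative design and tests its rank. It assumes A/B/C are repeated in batch Y with 2 samples each, an additive condition + batch model, and treatment coding; the helper name `design_matrix` is hypothetical.

```python
import numpy as np

# Alternative layout: A/B/C in both batches, D/E only in batch Y
# (2 samples per group is an assumption for illustration).
samples = [("X", c) for c in "ABC" for _ in range(2)] + \
          [("Y", c) for c in "ABCDE" for _ in range(2)]

def design_matrix(samples):
    """Hypothetical helper: intercept + condition dummies + batch dummies."""
    conditions = sorted({c for _, c in samples})
    batches = sorted({b for b, _ in samples})
    rows = []
    for b, c in samples:
        row = [1.0]                                              # intercept
        row += [1.0 if c == k else 0.0 for k in conditions[1:]]  # B, C, D, E
        row += [1.0 if b == k else 0.0 for k in batches[1:]]     # batch Y
        rows.append(row)
    return np.array(rows)

design = design_matrix(samples)
n_columns = design.shape[1]
rank = np.linalg.matrix_rank(design)

# The shared conditions A/B/C break the dependency between the batch
# column and the D/E columns, so every coefficient is estimable.
print(n_columns, rank)  # 6 6
```

Full rank here only says the additive model is estimable. The batch effect is still estimated solely from A/B/C, so applying it to D/E relies on the assumption that the batch shifts all conditions in the same way.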