I am trying to perform an analysis that I haven't seen others do, and I would like some guidance.
I ran a DEG analysis comparing populations A and B. I accounted for batch effects by including batch as a covariate in the design formula, e.g., ~batch + group.
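For concreteness, here is a minimal sketch of the kind of design I mean. The sample sheet is made up (8 samples, two batches named b1 and b2); only the ~batch + group structure reflects my actual setup.

```r
# Hypothetical sample sheet: 8 samples, 2 batches, groups A and B.
# The batch names and sample counts are illustrative assumptions.
samples <- data.frame(
  batch = factor(rep(c("b1", "b2"), each = 4)),
  group = factor(rep(c("A", "B"), times = 4), levels = c("A", "B"))
)

# Batch enters as a covariate; the last coefficient (groupB) is the
# B-versus-A effect adjusted for batch.
design <- model.matrix(~ batch + group, data = samples)
colnames(design)
```

With A as the reference level, the group coefficient is the batch-adjusted B-versus-A difference that the DEG test is based on.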
I have also run a DEG analysis comparing populations X and Y, where X and Y are sub-populations of B. That is, I've divided B into two separate groups and compared their transcriptomic responses, again incorporating batch effects as above.
The hypothesis I'd like to test is whether the differences between A and B are really a result of differences between A and X. However, I don't have those data.
What I'd like to do is approximate the contribution of the two sub-populations to the first DEG analysis. Are the DEGs between A and B driven primarily by X, by Y, or by both?
Ideally, I'd run a DEG analysis comparing A and X, then A and Y, but I can't account for batch effects because those experiments were run on different days.
Would it be possible to compare counts normalized with limma::removeBatchEffect to estimate the sub-population contributions to the first DEG analysis? My thought was to fit a linear model of the form normalized B ~ normalized X + normalized Y, but I don't know whether this would be acceptable.
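To make the batch-correction step concrete, here is a rough sketch of the call I have in mind. The expression matrix is simulated and the batch/group labels are hypothetical; this only shows the removeBatchEffect step, not the B ~ X + Y model itself.

```r
library(limma)

set.seed(1)
# Hypothetical labels and a simulated log-expression matrix
# (100 genes x 8 samples); none of this is real data.
batch <- factor(rep(c("b1", "b2"), each = 4))
group <- factor(rep(c("A", "B"), times = 4), levels = c("A", "B"))
logexpr <- matrix(rnorm(100 * 8, mean = 5), nrow = 100,
                  dimnames = list(paste0("gene", 1:100), paste0("s", 1:8)))

# Supplying the group design asks limma to preserve the group effect
# while removing the batch effect.
design <- model.matrix(~ group)
corrected <- removeBatchEffect(logexpr, batch = batch, design = design)
dim(corrected)
```

As far as I can tell, removeBatchEffect expects log-expression values (e.g., log-CPM) rather than raw counts, and the limma documentation says it is not intended to be used prior to linear modelling, which is part of why I'm unsure whether this approach is valid.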