I am trying to perform differential expression (DE) analysis on various publicly available GTEx tissue samples. For example, I am comparing 45 kidney samples with 175 liver samples. To account for batch effects, I am considering treating each sample as a separate batch, i.e. 45 + 175 = 220 separate batches, in my model matrix. However, something tells me that this isn't how batch effects work, since the example I saw in the edgeR tutorial dealt with 6 samples and 3 batches (1,2,3,1,2,3).
Is it even possible to account for batch effects in this comparison using edgeR or DESeq2? Or should I just go ahead with a typical DE analysis without worrying about batch effects?
Thank you
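A quick way to see why one batch per sample cannot work is to count degrees of freedom. The snippet below is only a rough sketch in plain Python (not edgeR code); the sample names are made up, and only the counts of 45 kidney and 175 liver samples come from the question.

```python
# Hypothetical sample labels; only the counts (45 kidney, 175 liver) come from the post.
samples = [f"kidney_{i}" for i in range(1, 46)] + [f"liver_{i}" for i in range(1, 176)]

# "Each sample is its own batch": the batch label is simply the sample name.
batch = list(samples)

n_samples = len(samples)
n_batch_levels = len(set(batch))

# A factor with k levels needs k coefficients (intercept + k - 1 dummy columns),
# so the residual degrees of freedom of a batch-only model are n - k.
residual_df = n_samples - n_batch_levels

print(n_samples, n_batch_levels, residual_df)  # prints: 220 220 0
```

With zero residual degrees of freedom there is nothing left to estimate dispersion from, and because every "batch" contains a single tissue, the tissue effect is fully determined by the batch labels and cannot be separated from them.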
Thank you for your answer, WouterDeCoster; this makes more sense. I would like to expand on your example in which both control and treatment patients were sequenced with kit B. I have a feeling that the answer to my question may be yes, but I will ask anyway.
Would you still be able to correct for batch effects if one control patient was sequenced with kit B but two treatment patients were sequenced with kit B?
I am asking this question in relation to creating a model.matrix for edgeR, where the batches would be:
Batch 1 (Kit A): control1, control2, treatment1
Batch 2 (Kit B): control3, treatment2, treatment3
You mean if the conditions are not equally balanced across the batches? That shouldn't be a real problem, provided that your groups are large enough. In your example, some condition-by-batch combinations contain just a single sample (one treatment sample in Kit A, one control sample in Kit B), but for sound statistics your groups should be larger.
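To make the point about unbalanced batches concrete, here is a minimal sketch in plain Python (not edgeR itself) that writes out the design matrix an additive batch + condition model would give for the six samples above and checks its rank. The `matrix_rank` helper and the fully confounded comparison layout are my own illustration, not something from the posts above.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via exact Gaussian elimination (illustrative helper, not a library call)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Columns: intercept, batch == Kit B, condition == treatment
unbalanced = [
    [1, 0, 0],  # control1,   Kit A
    [1, 0, 0],  # control2,   Kit A
    [1, 0, 1],  # treatment1, Kit A
    [1, 1, 0],  # control3,   Kit B
    [1, 1, 1],  # treatment2, Kit B
    [1, 1, 1],  # treatment3, Kit B
]

# Assumed contrast case: all controls in Kit A, all treatments in Kit B.
confounded = [[1, 0, 0]] * 3 + [[1, 1, 1]] * 3

rank_unbalanced = matrix_rank(unbalanced)
rank_confounded = matrix_rank(confounded)
print(rank_unbalanced, len(unbalanced) - rank_unbalanced)  # rank and residual df
print(rank_confounded)
```

The unbalanced layout has full column rank (3), so batch and condition effects can both be estimated, though only 3 residual degrees of freedom remain. In the confounded layout the batch and condition columns are identical, the rank drops to 2, and the two effects cannot be told apart.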
Makes sense, thank you once again for your reply!