RNA-Seq: X samples and X batches
5.2 years ago

I am trying to perform differential expression (DE) analysis on various publicly available GTEx tissue samples. For example, I am comparing 45 kidney samples with 175 liver samples. In an attempt to account for batch effects, I am considering treating each sample as a separate batch, i.e. 45 + 175 = 220 separate batches, in my model matrix. However something tells me that this isn't how batch effects work, since the example I saw in the edgeR tutorial dealt with 6 samples and 3 batches (1,2,3,1,2,3).

Is it even possible to account for batch effects for this comparison using edgeR or DESeq2? Or should I just go forward with a typical DE analysis without worrying about batch effects?

Thank you

RNA-Seq edgeR DESeq2 Batch-Effect • 1.5k views
5.2 years ago

However something tells me that this isn't how batch effects work

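I agree with that something. A batch effect is a group effect: it is shared by several samples that were processed together. You cannot use it for single samples, because with one batch per sample the batch term absorbs all the variation and nothing is left to estimate the difference between tissues. Although I'm not sure I understand your setup entirely, it also seems your batch effect is confounded with the biological question.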
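To make the degrees-of-freedom argument concrete, here is a minimal Python sketch. It assumes an additive design of the form ~ batch + tissue with treatment-coded dummy columns, and it uses a toy set of 6 samples rather than the GTEx data. The helper names (`matrix_rank`, `design_matrix`, `residual_df`) are hypothetical, and nothing here calls edgeR or DESeq2; it only counts how many degrees of freedom remain after fitting the design.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(tissue, batch):
    """Intercept plus dummy columns; the first level of each factor is the reference."""
    batch_levels = sorted(set(batch))[1:]
    tissue_levels = sorted(set(tissue))[1:]
    return [
        [1]
        + [int(b == lvl) for lvl in batch_levels]
        + [int(t == lvl) for lvl in tissue_levels]
        for t, b in zip(tissue, batch)
    ]


def residual_df(tissue, batch):
    """Number of samples minus the rank of the design matrix."""
    X = design_matrix(tissue, batch)
    return len(X) - matrix_rank(X)


tissue = ["kidney"] * 3 + ["liver"] * 3
shared_batches = [1, 2, 3, 1, 2, 3]      # layout from the tutorial example in the question
per_sample_batches = [1, 2, 3, 4, 5, 6]  # every sample is its own batch

print(residual_df(tissue, shared_batches))      # 2
print(residual_df(tissue, per_sample_batches))  # 0
```

With shared batches, two residual degrees of freedom remain for estimating variability. With one batch per sample, none remain, so there is nothing left from which to estimate dispersion or test the tissue effect.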

You can only correct a batch effect if both conditions have samples in each batch. Say that you are comparing patients with healthy controls. A batch effect would be that some samples were sequenced using another sequencing kit B (technical difference). You can correct for this only if you have both patients and control samples sequenced with kit B.

Conversely, if all patients were sequenced with kit A and all controls with kit B, then there is no way to figure out which part of the variability is due to (i) the biological question and which is due to (ii) the technical difference.
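The two kit scenarios can be checked mechanically before fitting any model. The sketch below is a simple illustration that assumes exactly two conditions and an additive batch term; the function names and sample labels are made up for this example. It reports a layout as fully confounded when no batch contains both conditions.

```python
from collections import defaultdict


def conditions_per_batch(condition, batch):
    """Map each batch to the set of conditions observed in it."""
    seen = defaultdict(set)
    for c, b in zip(condition, batch):
        seen[b].add(c)
    return dict(seen)


def is_fully_confounded(condition, batch):
    """True when no batch contains both conditions (two-condition case)."""
    return all(
        len(conds) < 2
        for conds in conditions_per_batch(condition, batch).values()
    )


condition = ["patient"] * 3 + ["control"] * 3

# All patients on kit A, all controls on kit B.
confounded_kits = ["A", "A", "A", "B", "B", "B"]
# Both kits contain patients and controls.
crossed_kits = ["A", "A", "B", "A", "B", "B"]

print(is_fully_confounded(condition, confounded_kits))  # True
print(is_fully_confounded(condition, crossed_kits))     # False
```

For designs with more than two conditions or several batch variables, checking the rank of the full design matrix is the more general test.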


Thank you for your answer WouterDeCoster, this makes more sense. I would like to expand upon your example regarding both control and treatment patients sequenced with kit B. Now I have a feeling that the answer to my question may be yes but I will ask anyway.

Would you still be able to correct for batch effects if one control patient was sequenced with kit B but two treatment patients were sequenced with kit B?

I am asking this question in relation to creating a model.matrix for edgeR where the batches would be represented by: Batch 1 (Kit A): control1, control2, treatment1; Batch 2 (Kit B): control3, treatment2, treatment3


You mean if the conditions are not equally balanced across the batches? That shouldn't be a real problem, provided that your groups are large enough. In your example, some groups contain just a single sample, but for sound statistics your groups should be larger.
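As a quick way to see how small the groups in that example are, the following sketch tabulates samples per kit and condition. The sample assignments are taken from the layout described in the question above; the variable names are only illustrative.

```python
from collections import Counter

# (kit, condition) for: control1, control2, treatment1, control3, treatment2, treatment3
samples = [
    ("A", "control"),
    ("A", "control"),
    ("A", "treatment"),
    ("B", "control"),
    ("B", "treatment"),
    ("B", "treatment"),
]

# Count how many samples fall into each kit-by-condition cell.
cell_counts = Counter(samples)
for (kit, cond), n in sorted(cell_counts.items()):
    print(f"kit {kit}, {cond}: {n}")

smallest_cell = min(cell_counts.values())
print(smallest_cell)  # 1
```

Every kit contains both conditions, so the batch term is not confounded with the treatment, but two of the four cells hold a single sample.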


Makes sense, thank you once again for your reply!

