Question

RNA-Seq: X samples and X batches

0

5.2 years ago

Behram Radmanesh ▴ 40

I am trying to perform differential expression (DE) analysis on various publicly available GTEx tissue samples. For example, I am comparing 45 kidney samples with 175 liver samples. In an attempt to account for batch effects, I am considering treating each sample as a separate batch, i.e. 45 + 175 = 220 separate batches, in my model matrix. However, something tells me that this isn't how batch effects work, since the example I saw in the edgeR tutorial dealt with 6 samples and 3 batches (1,2,3,1,2,3).

Is it even possible to account for batch effects in this comparison using edgeR or DESeq2? Or should I just go forward with a typical DE analysis without worrying about batch effects?

Thank you

RNA-Seq edgeR DESeq2 Batch-Effect • 1.5k views

ADD COMMENT • link updated 2 days ago by Ram 43k • written 5.2 years ago by Behram Radmanesh ▴ 40

score 5 · Accepted Answer · 2019-02-06

5

5.2 years ago

WouterDeCoster 47k

"However something tells me that this isn't how batch effects work"

I agree with that something. A batch effect is a group-level effect: it can only be estimated when several samples share the same batch, so there is no way to use it for single samples. Although I'm not sure I understand your setup entirely, it also seems that your batch effect is confounded with the biological question.
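To see why one batch per sample cannot work, it helps to count model coefficients. The sketch below is plain Python arithmetic, not edgeR or DESeq2 code, and it assumes an additive design of the form `~ batch + tissue` with treatment coding (intercept plus one column per non-reference level), which is the kind of matrix `model.matrix` would build. The helper name `additive_model_columns` and the three-batch comparison are made up for illustration.

```python
def additive_model_columns(n_batches, n_groups):
    """Columns of a treatment-coded design for ~ batch + group:
    intercept + (batches - 1) + (groups - 1)."""
    return 1 + (n_batches - 1) + (n_groups - 1)


# Sample counts from the question: 45 kidney + 175 liver.
n_kidney, n_liver = 45, 175
n_samples = n_kidney + n_liver

# One batch per sample: as many batch levels as samples.
per_sample_columns = additive_model_columns(n_batches=n_samples, n_groups=2)

# Hypothetical alternative: 3 real batches shared by many samples.
shared_batch_columns = additive_model_columns(n_batches=3, n_groups=2)

print(n_samples, per_sample_columns)     # 220 221
print(n_samples - shared_batch_columns)  # 216 (residual df if the design is full rank)
```

With 221 coefficients and only 220 samples, the model has more unknowns than observations, so the tissue effect cannot be separated from the per-sample "batch" terms and nothing is left over to estimate dispersion.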

You can only correct a batch effect if both conditions have samples in each batch. Say that you are comparing patients with healthy controls. A batch effect would be that some samples were sequenced using a different sequencing kit, kit B (a technical difference). You can correct for this only if both patient and control samples were sequenced with kit B.

Conversely, if all patients were sequenced with kit A and all controls with kit B, then there is no way to figure out which part of the variability is due to (i) the biological question and which is due to (ii) the technical difference.
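The difference between the two situations shows up as the rank of the design matrix. The following is a minimal sketch in pure Python, assuming an additive `~ batch + group` design with treatment coding; `design_matrix` and `matrix_rank` are helper names invented here, and the six-sample layouts are made-up examples rather than real data.

```python
from fractions import Fraction


def design_matrix(groups, batches):
    """Treatment-coded design: intercept, batch dummies, group dummies."""
    batch_levels = sorted(set(batches))[1:]  # first level is the reference
    group_levels = sorted(set(groups))[1:]
    return [
        [1]
        + [int(b == level) for level in batch_levels]
        + [int(g == level) for level in group_levels]
        for g, b in zip(groups, batches)
    ]


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


groups = ["patient"] * 3 + ["control"] * 3

# Confounded: every patient on kit A, every control on kit B.
confounded = design_matrix(groups, ["A", "A", "A", "B", "B", "B"])
# Crossed: both kits contain patients and controls.
crossed = design_matrix(groups, ["A", "A", "B", "A", "B", "B"])

confounded_rank = matrix_rank(confounded)
crossed_rank = matrix_rank(crossed)
print(confounded_rank, len(confounded[0]))  # 2 3
print(crossed_rank, len(crossed[0]))        # 3 3
```

In the confounded layout the kit column and the patient column add up to the intercept, so the matrix has rank 2 for 3 coefficients and the kit and condition effects cannot be told apart. In the crossed layout the rank equals the number of coefficients and both effects are estimable.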

0

Thank you for your answer, WouterDeCoster; this makes more sense. I would like to expand upon your example regarding both control and treatment patients sequenced with kit B. I have a feeling that the answer to my question may be yes, but I will ask anyway.

Would you still be able to correct for batch effects if one control patient was sequenced with kit B but two treatment patients were sequenced with kit B?

I am asking this question in relation to creating a model.matrix for edgeR where the batches would be represented by: Batch 1 (Kit A): control1, control2, treatment1; Batch 2 (Kit B): control3, treatment2, treatment3

ADD REPLY • link 5.2 years ago by Behram Radmanesh ▴ 40

1

You mean if the conditions are not equally balanced across the batches? That shouldn't be a real problem, provided that your groups are large enough. In your example, some batch-by-condition groups contain just a single sample; for sound statistics they should be larger.
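A quick cross-tabulation of batch against condition makes both points visible for the six-sample layout from the comment above. This is an illustrative sketch in plain Python, with the kit labels `kitA` and `kitB` chosen here as stand-ins for Batch 1 and Batch 2; in R the same table would come from tabulating the two factors before building the `model.matrix`.

```python
from collections import Counter

# (sample, condition, batch) as laid out in the comment above.
samples = [
    ("control1", "control", "kitA"),
    ("control2", "control", "kitA"),
    ("treatment1", "treatment", "kitA"),
    ("control3", "control", "kitB"),
    ("treatment2", "treatment", "kitB"),
    ("treatment3", "treatment", "kitB"),
]

# Number of samples in each batch-by-condition cell.
cell_counts = Counter((batch, condition) for _, condition, batch in samples)

batches = sorted({batch for _, _, batch in samples})
conditions = sorted({condition for _, condition, _ in samples})

# The batch effect is separable only if each batch holds every condition.
every_batch_has_both = all(
    cell_counts[(b, c)] > 0 for b in batches for c in conditions
)

# Cells backed by a single sample carry very little information.
single_sample_cells = sorted(cell for cell, n in cell_counts.items() if n == 1)

print(every_batch_has_both)  # True
print(single_sample_cells)   # [('kitA', 'treatment'), ('kitB', 'control')]
```

Both kits contain both conditions, so the imbalance (2 vs 1 and 1 vs 2) does not prevent a correction, but two of the four cells rest on one sample each, which is why larger groups are advisable.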