Normalization and batch effects correction in RNA-Seq data
2
1
13 months ago
elb ▴ 210

Hi guys, I have a simple question. I have RNA-Seq data from different batches. As suggested in many posts online, I have pre-normalized my data (using TMM from edgeR), then I have corrected them using ComBat, and then I have re-normalized them (for library size) using DESeq2. My question is: is the second normalisation after ComBat correct? Or, at least, is it not dramatically incorrect?

Best

RNA-Seq Combat • 1.3k views
3
13 months ago

Hey,

Sorry, this is not recommended (by me, and others):

then I have corrected them using ComBat

The way in which you have implemented this batch-correction procedure is not ideal, irrespective of the use of ComBat, because you are normalising your data twice, with 2 different programs.

Kevin

0

Dear Kevin, the experimental design in the posted question you redirected me to is a little bit different from my case, because I don't have nested designs. In any case, people generally suggest pre-normalizing the data in order to remove some high-level variability and then performing batch correction. I agree with you about the way to correct, i.e. basically using the batch as a covariate. My question is whether the normalization after the correction, which is basically a ratio of each gene's counts to the library size of each sample, is wrong, or whether it is expected not to dramatically affect the identification of variable genes across conditions (i.e. DEGs). Thank you a lot for your help!

0

You have accepted the answer from rpolicastro, so I will assume that the problem has been addressed and avoid responding further.

2
13 months ago

A quick note, since this is a common problem: for batch correction you generally need to have multiple conditions per batch. If all your WT samples are in one batch and all your KD samples are in another batch, condition and batch are completely confounded and you can't correct for it (as an example).
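To make the confounding point concrete, here is a minimal base-R sketch that checks whether a design with both batch and condition can be estimated at all. The two sample sheets and the helper `is_full_rank` are hypothetical and only for illustration; the idea is simply that a confounded layout gives a model matrix that is not of full column rank.

```r
# Hypothetical sample sheets; names and values are illustrative only.
confounded <- data.frame(
  condition = factor(c("WT", "WT", "WT", "KD", "KD", "KD")),
  batch     = factor(c("batch_1", "batch_1", "batch_1",
                       "batch_2", "batch_2", "batch_2"))
)
balanced <- data.frame(
  condition = factor(c("WT", "WT", "WT", "KD", "KD", "KD")),
  batch     = factor(c("batch_1", "batch_2", "batch_2",
                       "batch_1", "batch_2", "batch_2"))
)

# Hypothetical helper: batch and condition effects can only be separated
# if the model matrix has full column rank.
is_full_rank <- function(sample_sheet) {
  design <- model.matrix(~ batch + condition, data = sample_sheet)
  qr(design)$rank == ncol(design)
}

# Cross-tabulation: in the confounded sheet each condition sits in one batch only.
print(table(confounded$condition, confounded$batch))

print(is_full_rank(confounded))  # batch and condition cannot be separated
print(is_full_rank(balanced))    # both conditions appear in both batches
```

Running the cross-tabulation on your own sample sheet before any modelling is usually enough to spot the problem: any batch that contains only one condition contributes nothing to separating the two effects.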

With that being said, you can usually add batch as a covariate to the regression formula in edgeR and DESeq2, which is the simpler and more robust option. Your study design would look like the following example for DESeq2:

> df
     condition   batch
WT-1        WT batch_1
WT-2        WT batch_2
WT-3        WT batch_2
KO-1        KO batch_1
KO-2        KO batch_2
KO-3        KO batch_2


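Your regression formula would then be ~ condition + batch, which means your differential expression results for condition will be adjusted for batch. Note that DESeq2's results() reports the last variable in the formula by default, so either write the formula as ~ batch + condition or request the condition comparison explicitly.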
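As a rough sketch of how this might look in practice (not a definitive pipeline): the sample sheet below mirrors the table above, while the count matrix is simulated and purely hypothetical. I have assumed raw, un-normalised counts as input, since that is what DESeq2 expects, and I have put condition last in the formula and also requested the contrast explicitly, so the reported comparison is unambiguous. The DESeq2 part only runs if the package is installed.

```r
# Sample sheet mirroring the table in the post; rows are samples.
df <- data.frame(
  condition = factor(c("WT", "WT", "WT", "KO", "KO", "KO"),
                     levels = c("WT", "KO")),  # WT is the reference level
  batch     = factor(c("batch_1", "batch_2", "batch_2",
                       "batch_1", "batch_2", "batch_2")),
  row.names = c("WT-1", "WT-2", "WT-3", "KO-1", "KO-2", "KO-3")
)

# Inspect the coefficients the model will estimate (base R only).
design <- model.matrix(~ batch + condition, data = df)
print(colnames(design))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  set.seed(1)
  n_genes <- 200
  # Simulated raw counts (hypothetical data): per-gene means on a log2 scale.
  gene_mu <- 2^runif(n_genes, min = 4, max = 10)
  counts <- matrix(
    rnbinom(n_genes * nrow(df), mu = rep(gene_mu, times = nrow(df)), size = 5),
    nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)), rownames(df))
  )

  # Batch enters the model as a covariate; the counts themselves are not modified.
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = df,
                                        design    = ~ batch + condition)
  dds <- DESeq2::DESeq(dds)

  # KO versus WT, adjusted for batch.
  res <- DESeq2::results(dds, contrast = c("condition", "KO", "WT"))
  print(head(res))
}
```

The point of doing it this way is that the data are normalised once, inside DESeq2, and batch is handled by the model rather than by altering the counts beforehand.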