Hi all,
I have RNA-seq data from 54 samples covering 6 mouse brain regions and 3 groups: control (Con, no stress), stress-resilient (Res), and stress-susceptible (Sus).
The 54 samples were sequenced on 5 different days, and most samples from the same brain region were sequenced on the same day. (RNA extraction was performed on the same day for all samples, while library prep and sequencing were performed region by region, so the prep date and the sequencing date are the same.)
So I tried batch correction with ComBat, but it seems to have normalized away the differences between regions as well.
Any suggestions for this case?
This is my ComBat R script:
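library(sva)  # provides ComBat
# ComBat expects genes in rows and samples in columns
datexpr <- read.csv("FPKM.csv", row.names = 1)
trait <- read.csv("trait.csv")
# Sequencing date is used as the batch variable
trait$seqdate <- as.factor(trait$seqdate)
batch <- trait$seqdate
# mod = NULL: no covariate (such as region) is protected during correction
datExpr.combat <- ComBat(dat = as.matrix(datexpr), batch = batch, mod = NULL)
write.csv(datExpr.combat, "Corrected_FPKM.csv")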
And my sample information (54 samples in total, 3 replicates per row):
    Region  State  Seqdate
 1  AMY     Con    191214   x 3 (e.g. AMY_Con1, AMY_Con2, AMY_Con3; same for the rows below)
 2  AMY     Res    191214
 3  AMY     Sus    191214
 4  HIP     Con    191214
 5  HIP     Res    191214
 6  HIP     Sus    191214
 7  CBC     Con    190826
 8  CBC     Res    190826
 9  CBC     Sus    190826
10  NAc     Con    191029
11  NAc     Res    191029
12  NAc     Sus    191029
13  PFC     Con    200317
14  PFC     Res    200317
15  PFC     Sus    200317
16  VTA     Con    200427
17  VTA     Res    200427
18  VTA     Sus    200427
Refer to the following question.
Thank you!
Oh, sorry,
Apologies for the confusion. I edited my post.
RNA extraction was performed on the same day for all samples, while library prep and sequencing were performed region by region. That is, the prep date and the sequencing date are the same.
So I can't correct for the batch effect.
If you prepped all the samples of one tissue type on one day, and another tissue type another day, you screwed up, and math can't fix that. Do the intra-tissue comparisons you can, and next time, plan better.
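To make the intra-tissue suggestion concrete, here is a minimal base-R sketch. It rebuilds a stand-in sample table from the design listed in the question (the `Rep` column and the `region_dates` and `per_region` names are made up for illustration) and builds one `~ State` design per region. It only sets up the designs; the actual differential expression test is left to whatever tool you use.

```r
# Stand-in sample table rebuilt from the design in the question:
# 6 regions x 3 states x 3 replicates = 54 samples.
region_dates <- c(AMY = "191214", HIP = "191214", CBC = "190826",
                  NAc = "191029", PFC = "200317", VTA = "200427")
trait <- expand.grid(Rep = 1:3,
                     State = c("Con", "Res", "Sus"),
                     Region = names(region_dates),
                     stringsAsFactors = FALSE)
trait$Seqdate <- unname(region_dates[trait$Region])
trait$State <- factor(trait$State, levels = c("Con", "Res", "Sus"))

# One design per region: inside a region only State varies.
per_region <- lapply(split(trait, trait$Region), function(d) {
  list(
    n_dates = length(unique(d$Seqdate)),       # 1 = no batch variation inside the region
    design  = model.matrix(~ State, data = d)  # Con is the reference level
  )
})

# Number of sequencing dates and design rank for each region
n_dates <- sapply(per_region, function(x) x$n_dates)
ranks   <- sapply(per_region, function(x) qr(x$design)$rank)
print(rbind(n_dates, ranks))
```

Since every region was processed on a single date, the sequencing date is constant within each subset and cannot bias the Res vs Con or Sus vs Con contrasts there. Each per-region design could then be handed to a differential expression tool, ideally run on counts rather than FPKM.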
Thank you, I will have to consider intra-tissue comparisons.
As swbames2 said, you confounded region with date in your experimental design, so any correction for the date batch effect will also remove region differences. You have 3 choices: 1) redesign your experiment; 2) don't do batch correction; 3) work out a more complicated statistical model to separate the effects of date and region, if they are not 100% confounded. But please remember that every reviewer will question your method.
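A quick way to check how badly region and date are confounded is to cross-tabulate them and look at the rank of a design matrix containing both. The base-R sketch below uses a stand-in sample table rebuilt from the question (the `Rep` column and the object names are made up for illustration); with the real data you would use `trait.csv` directly.

```r
# Stand-in sample table rebuilt from the design in the question.
region_dates <- c(AMY = "191214", HIP = "191214", CBC = "190826",
                  NAc = "191029", PFC = "200317", VTA = "200427")
trait <- expand.grid(Rep = 1:3,
                     State = c("Con", "Res", "Sus"),
                     Region = names(region_dates),
                     stringsAsFactors = FALSE)
trait$Seqdate <- factor(unname(region_dates[trait$Region]))
trait$Region  <- factor(trait$Region)
trait$State   <- factor(trait$State, levels = c("Con", "Res", "Sus"))

# Cross-tabulation: each region appears under exactly one date.
confound_table <- table(trait$Region, trait$Seqdate)
print(confound_table)

# A design with both Region and Seqdate has more columns than its rank,
# so the date effect cannot be estimated separately from region.
design_full <- model.matrix(~ Region + Seqdate + State, data = trait)
rank_full <- qr(design_full)$rank
ncol_full <- ncol(design_full)
cat("columns:", ncol_full, " rank:", rank_full, "\n")
```

In this table the date is fully determined by the region, so the Seqdate columns add nothing that Region does not already explain. The one exception worth noting is that AMY and HIP share a date, so by this table that single cross-region comparison is not confounded with sequencing date.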
Thank you for your reply!