I have RNA-seq data from 54 samples covering 6 mouse brain regions and 3 groups: control (Con, no stress), stress-resilient (Res), and stress-susceptible (Sus).
These 54 samples were sequenced on 5 different days. Most samples from the same brain region were sequenced on the same day. (RNA extraction was performed on the same day for all samples, while library prep and sequencing were performed by region, so the prep date and the sequencing date are the same.)
So I tried to perform batch correction using ComBat, but it seems that the differences between regions were removed as well.
Any suggestions in this case?
This is my ComBat R script:
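library(sva)  # provides ComBat()
# Expression matrix: genes in rows, samples in columns
datexpr <- read.csv("FPKM.csv", row.names = 1)
# Sample sheet with one row per sample, including the sequencing date
trait <- read.csv("trait.csv")
trait$seqdate <- as.factor(trait$seqdate)
batch <- trait$seqdate
# mod = NULL: no biological covariate (region, state) is protected
datExpr.combat <- ComBat(dat = as.matrix(datexpr), batch = batch, mod = NULL)
write.csv(datExpr.combat, "Corrected_FPKM.csv")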
And my sample information (54 samples in total):
    Region  State  Seqdate
1   AMY     Con    191214   x 3 (e.g. AMY_Con1, AMY_Con2, AMY_Con3; same for the rows below)
2   AMY     Res    191214
3   AMY     Sus    191214
4   HIP     Con    191214
5   HIP     Res    191214
6   HIP     Sus    191214
7   CBC     Con    190826
8   CBC     Res    190826
9   CBC     Sus    190826
10  NAc     Con    191029
11  NAc     Res    191029
12  NAc     Sus    191029
13  PFC     Con    200317
14  PFC     Res    200317
15  PFC     Sus    200317
16  VTA     Con    200427
17  VTA     Res    200427
18  VTA     Sus    200427
Refer to the following question: "Batch correction using DESeq2".
Apologies for the confusion. I edited my post.
RNA extraction was performed on the same day for all samples, while library prep and sequencing were performed by region. That is, the sequencing date and the prep date are the same.
So I can't correct for the batch effect.
If you prepped all the samples of one tissue type on one day, and another tissue type another day, you screwed up, and math can't fix that. Do the intra-tissue comparisons you can, and next time, plan better.
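To illustrate what an intra-tissue comparison could look like, here is a minimal base-R sketch for a single region. Everything in it is an assumption made for illustration: the expression values are simulated stand-ins for the FPKM matrix, the sample names follow the AMY_Con1 pattern from the question, and the per-gene one-way ANOVA on log2(FPKM + 1) is only a simple placeholder test. For a real analysis, a count-based tool such as DESeq2 or edgeR run on raw counts would usually be preferred over tests on FPKM.

```r
set.seed(1)

# Hypothetical stand-in for one region (AMY): 100 genes x 9 samples.
state <- factor(rep(c("Con", "Res", "Sus"), each = 3))
samples <- paste0("AMY_", state, rep(1:3, times = 3))
fpkm <- matrix(rexp(100 * 9, rate = 0.1), nrow = 100,
               dimnames = list(paste0("gene", 1:100), samples))
logexpr <- log2(fpkm + 1)

# Within one region every sample shares one sequencing date,
# so the design only needs the state factor and is full rank.
design <- model.matrix(~ state)
stopifnot(qr(design)$rank == ncol(design))

# Per-gene one-way ANOVA across Con / Res / Sus (placeholder test).
pvals <- apply(logexpr, 1, function(y) {
  anova(lm(y ~ state))[["Pr(>F)"]][1]
})

# Benjamini-Hochberg adjustment across genes.
padj <- p.adjust(pvals, method = "BH")
head(sort(padj))
```

Because the date is constant within a region, no batch term is needed here; the same loop would be repeated separately for each of the six regions.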
Thank you, I will have to consider intra-tissue comparisons.
As said by swbames2, you confounded region with date in your experimental design, so any correction for the date batch effect will also remove the region effect. You have 3 choices:
1) re-design your experiment
2) don't do batch correction
3) figure out a more complicated statistical model to separate the effects of date and region, if they are not 100% confounded
But please remember that every reviewer will question your method.
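A quick way to see how far region and date are confounded is to cross-tabulate them and check the rank of a design matrix that contains both. The sketch below uses base R only and rebuilds the sample sheet from the table in the question; the column names (region, state, seqdate) and the assumption of exactly 3 replicates per row are taken from that table, not from the actual trait.csv.

```r
# Rebuild the sample sheet: 6 regions x 3 states x 3 replicates = 54 samples.
region_date <- c(AMY = "191214", HIP = "191214", CBC = "190826",
                 NAc = "191029", PFC = "200317", VTA = "200427")
trait <- expand.grid(rep = 1:3,
                     state = c("Con", "Res", "Sus"),
                     region = names(region_date),
                     stringsAsFactors = FALSE)
trait$seqdate <- factor(region_date[trait$region])
trait$region <- factor(trait$region)
trait$state <- factor(trait$state)

# Cross-tabulation: each region (row) has samples on exactly one date (column).
tab <- table(trait$region, trait$seqdate)
print(tab)

# A design with both region and date is rank deficient, so the two
# effects cannot be estimated separately from these samples.
design <- model.matrix(~ region + state + seqdate, data = trait)
print(c(columns = ncol(design), rank = qr(design)$rank))
```

In this layout only AMY and HIP share a date, so those two regions can be compared without a date effect, while every other between-region contrast is inseparable from date.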
Thank you for your reply!