Hi all, I have read Johnson et al. on ComBat and its usage, but I am unsure about best practices. I have an expression matrix (on the linear scale, not log2 or log2 ratio) in which one expression value is about 3700, above the 75th percentile of 2665. The detection p-value for that value is 0.

After ComBat this value drops to -301, below the 25th percentile of -214.5.

Is that normal?

From Johnson et al. I read that the data have to be normalized before ComBat ('We assume that the data have been normalized and expression values have been estimated for all genes and samples'). How are they meant to be normalized?

So should I use linear, log2, or log2 ratio data? Also from Johnson et al. I read: 'In order to account for this situation, we have presented a very flexible EB framework for adjusting for additive, multiplicative, and exponential (when data have been log transformed) batch effects'

So I understood that the data CAN be log transformed, not that they MUST be log transformed.
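
On the quoted point about log-transformed data, it may help to see why the scale matters: a multiplicative batch effect on the linear scale becomes a constant additive shift after a log2 transform. Below is a minimal NumPy sketch with made-up numbers. The 1.8 factor and the simulated matrix are assumptions for illustration only, and this is not ComBat itself.

```python
import numpy as np

# Simulated linear-scale expression values (genes x samples); illustrative only.
rng = np.random.default_rng(0)
true_signal = rng.uniform(100, 5000, size=(5, 4))

# Hypothetical multiplicative batch effect: batch 2 reads 1.8x higher.
batch_factor = 1.8
batch2 = true_signal * batch_factor

# On the linear scale the batch difference depends on the expression level.
linear_diff = batch2 - true_signal

# On the log2 scale the same effect is one constant offset, log2(1.8).
log_diff = np.log2(batch2) - np.log2(true_signal)

print("linear differences vary:", linear_diff.min(), "to", linear_diff.max())
print("log2 differences are constant:", np.unique(np.round(log_diff, 10)))
```

An adjustment that estimates one shift per gene and batch can capture such an effect on the log2 scale, whereas on the linear scale the shift would have to change with the expression level.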

Thanks

I'd use background-corrected, normalized, log2 data. See also this thread: https://stat.ethz.ch/pipermail/bioconductor/2013-April/052233.html
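
As a rough sketch of the log2 step only (background correction and normalization are assumed to have been done already), something like the following could be run before batch correction. The function name `to_log2` and the `offset` pseudocount of 1 are hypothetical choices for illustration, not something prescribed by Johnson et al. or by ComBat.

```python
import numpy as np

def to_log2(expr, offset=1.0):
    """Log2-transform a linear-scale expression matrix (genes x samples).

    `offset` is an assumed pseudocount that keeps zeros finite after the log.
    """
    expr = np.asarray(expr, dtype=float)
    if np.any(expr < 0):
        # Negative intensities have no real logarithm; handle them first.
        raise ValueError("negative values found: floor or shift them before log2")
    return np.log2(expr + offset)

# Toy matrix reusing the two linear-scale values mentioned in the question.
linear = np.array([[3700.0, 2665.0],
                   [120.0, 0.0]])
log_expr = to_log2(linear)
print(log_expr)
```

The resulting matrix is what would then be passed to the batch-correction step in place of the linear-scale values.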

Hi, sorry, is it possible to run ComBat on RPKM/FPKM data, which are normalised counts but not log2 data?