Question

What Is The Math Proof That The Beta Of A Linear Regression Is Equal To The Log2 Fold Change In Microarrays?

5

10.5 years ago

jerome.lane.34 ▴ 70

Hi,

By reading the scripts and testing, I have found that some programs use the beta of a linear regression (gene_exp ~ pheno) as the log2 fold change (e.g. the topTable function from the limma R package), but I did not find any documentation about this. So maybe I am right, maybe not.

Would it be possible to get some references about this, or could somebody show me the math proof?

microarray • 9.0k views

score 5 · Answer 1 · 2013-10-24

This is only true if the response was log2-transformed prior to running the model.

The easiest way to think it through is probably with a toy example.

 set.seed(123)
 x <- c(rpois(50,100), rpois(50,75))
 y <- rep(factor(c("A", "B")), each=50)

That's 50 observations for each of 2 phenotypes, with the "true" fold difference between B and A being 75/100 = 3/4 ~ 2^-0.415.

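If you fit a linear model, the betas give you the group means: the intercept is the mean of the reference group (A), and the yB coefficient is the difference between the mean of B and the mean of A: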

(betas <- lm(x~y)$coefficients)
# (Intercept)          yB 
#     100.06      -23.68 
(means <- tapply(x,y,mean))
#    A      B 
#100.06  76.38 
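# all.equal() compares with a tolerance; == on floating-point values can fail
all.equal(unname(betas[1] + betas[2]), unname(means[2]))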
# TRUE
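
To see where that relationship comes from, here is a minimal sketch that solves the least-squares problem by hand. The data and the names v, g, X and beta_hat are made up for illustration and are not part of the example above; the sketch assumes R's default treatment coding for factors.

```r
# Hypothetical toy data (not the simulated x and y from above)
v <- c(10, 14, 3, 5)
g <- factor(c("A", "A", "B", "B"))

# Treatment-coded design matrix: a column of 1s plus an indicator for level B
X <- model.matrix(~ g)

# Ordinary least squares via the normal equations: solve (X'X) b = X'v
beta_hat <- drop(solve(crossprod(X), crossprod(X, v)))

print(beta_hat)            # intercept and the gB coefficient
print(tapply(v, g, mean))  # group means, for comparison
```

With these values the group means are 12 (A) and 4 (B), so the intercept comes out as 12 and the gB coefficient as 4 - 12 = -8.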

Obviously, the betas are not the same as the log2 fold change. To get that, you can either transform the ratio of the estimates:

log(sum(betas)/betas[1], 2) 
# -0.3895985

or perform the transformation before fitting the model:

 lm(log(x,2)~y)$coefficients
 # (Intercept)          yB 
 #    6.639029   -0.389812

EDIT

I forgot to explain the math-sy reason why this works, which might not be immediately obvious. As we've seen, when we do a linear regression with a categorical predictor, the beta for a group reflects the difference in mean value between that group and the reference group. If we log-transform the response values first then, of course, we end up with a difference of logs. In the example that's log(75) - log(100) which, thanks to the magic of logs, is the same thing as log(75/100): the log fold difference.
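
A sketch of the algebra behind this, for the two-group case with A as the reference level as in the toy example. The notation ($x_i$ for the raw values, $n_A$ and $n_B$ for the group sizes, $\mathrm{GM}$ for a geometric mean) is introduced here for illustration and is not from the original post.

```latex
\begin{align*}
\log_2 x_i &= \beta_0 + \beta_1\,\mathbb{1}[y_i = B] + \varepsilon_i \\
\hat\beta_0 &= \frac{1}{n_A}\sum_{i \in A} \log_2 x_i, \qquad
\hat\beta_0 + \hat\beta_1 = \frac{1}{n_B}\sum_{i \in B} \log_2 x_i \\
\hat\beta_1 &= \frac{1}{n_B}\sum_{i \in B}\log_2 x_i - \frac{1}{n_A}\sum_{i \in A}\log_2 x_i \\
&= \log_2\frac{\left(\prod_{i \in B} x_i\right)^{1/n_B}}{\left(\prod_{i \in A} x_i\right)^{1/n_A}}
 = \log_2\frac{\mathrm{GM}_B}{\mathrm{GM}_A}
\end{align*}
```

So the slope is exactly a log2 fold change, with the fold change taken between the geometric rather than the arithmetic group means.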

EDIT 2

To clarify the small difference between taking the log of the ratio of the betas and log-transforming the values first: it arises because the log transform also changes the shape of the distribution and therefore the mean value. With the toy example, the mean of the log-transformed x (tapply(log(x,2), y, mean)) is slightly different from the log transform of the mean of x (log(tapply(x,y,mean),2)). The first is the log of each group's geometric mean, the second the log of its arithmetic mean.
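
The gap is easier to see with values chosen to exaggerate it. The following is a minimal sketch on made-up numbers (expr, group and the other names are hypothetical and not from the example above), comparing the slope of the log2-scale model with the two candidate fold changes.

```r
# Hypothetical toy values, chosen so that the log2 values are whole numbers
expr  <- c(8, 32, 2, 32)
group <- factor(rep(c("A", "B"), each = 2))

# Slope from the model fitted on the log2 scale
beta_log <- unname(coef(lm(log2(expr) ~ group))["groupB"])

# Difference between the group means of log2(expr)
mean_log <- tapply(log2(expr), group, mean)
diff_of_mean_logs <- unname(mean_log["B"] - mean_log["A"])

# log2 ratio of the geometric means
geo_mean <- tapply(expr, group, function(v) exp(mean(log(v))))
log_ratio_geo <- unname(log2(geo_mean["B"] / geo_mean["A"]))

# log2 ratio of the arithmetic means, for comparison
arith_mean <- tapply(expr, group, mean)
log_ratio_arith <- unname(log2(arith_mean["B"] / arith_mean["A"]))

print(c(beta = beta_log, mean_logs = diff_of_mean_logs,
        geometric = log_ratio_geo, arithmetic = log_ratio_arith))
```

Here the first three quantities all equal -1, while the log2 ratio of the arithmetic means is log2(17/20), about -0.23. In the Poisson toy example the two versions are much closer because the counts vary little relative to their means.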