Why Are Almost All Protein or mRNA Abundance Measures Log-Transformed?
11.9 years ago
Zhilong Jia ★ 2.2k

Here, the abundance measure can be a concentration, a molecule count, or ppm, not a microarray (chip) intensity.

I don't understand why these values are log-transformed when they are used as the response (y) variable in a classification or regression model.

Could anybody help me? Thank you.

protein mrna statistics normalization • 4.3k views

What are the inputs and targets in the regression?


The inputs are features of the protein or mRNA, and the target is protein abundance.


If you log-transform the target value, you are assuming that a unit increase in a feature value gives a multiplicative increase in the target value. If you don't log-transform it, a unit increase in the feature value has an additive effect. Which is the better assumption? Also, if you don't log-transform, your regression weights will be largely determined by the most abundant transcripts. You have to decide what is appropriate for your regression problem. (David's comments are also important to consider.)
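To make the two assumptions concrete, here is a minimal Python sketch on made-up, noise-free data. The doubling relationship and the helpers `fit_line` and `top_share` are hypothetical and exist only for illustration. On the log2 scale, the fitted slope reads as a fold change per unit of the feature. On the raw scale, it reads as a fixed amount added per unit. The sketch also shows that on the raw scale the single most abundant value contributes most of the total sum of squares that least squares is trying to explain.

```python
import math


def fit_line(x, y):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def top_share(y):
    """Fraction of the total sum of squares contributed by the largest value."""
    mean_y = sum(y) / len(y)
    total = sum((v - mean_y) ** 2 for v in y)
    return (max(y) - mean_y) ** 2 / total


# Hypothetical data: abundance doubles with each unit increase of the feature.
x = list(range(10))
y = [10 * 2 ** xi for xi in x]
log_y = [math.log2(v) for v in y]

# Log2 target: the slope is additive on log2(y), i.e. a multiplicative effect on y.
slope_log, _ = fit_line(x, log_y)
fold_per_unit = 2 ** slope_log

# Raw target: the slope is a fixed amount added to y per unit of x.
slope_raw, _ = fit_line(x, y)

print(f"log2 model: y is multiplied by {fold_per_unit:.2f} per unit of x")
print(f"raw model:  y increases by {slope_raw:.1f} per unit of x")
print(f"share of total SS from the top value, raw:  {top_share(y):.2f}")
print(f"share of total SS from the top value, log2: {top_share(log_y):.2f}")
```

With real, noisy data, the same imbalance means the raw-scale fit is pulled toward whatever best explains the few largest abundances.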


"If you don't log-transform, your regression weights are going to be largely determined by the most abundant transcripts": I think this point is very important! In fact, I want to analyse how much of the variation in y is explained by the features. Thank you.

11.9 years ago
Woa ★ 2.9k

Apart from making the scale symmetric for increases and decreases, I think taking the logarithm makes the distribution closer to Gaussian, which lets you use parametric tests. To some extent, taking the log also takes care of heteroscedasticity (variance that depends on the mean).
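Here is a small Python sketch of both effects. It assumes the noise is multiplicative, which is a common but not universal assumption for abundance data. Three hypothetical group means are given the same fixed up/down factors. The raw-scale standard deviation then grows in proportion to the mean, while the log-scale standard deviation is identical for every group. Separately, a simulated log-normal sample is strongly right-skewed on the raw scale and becomes roughly symmetric after taking logs.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment, population form)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Multiplicative noise: each group's replicates are its mean times the same factors.
factors = [0.8, 1.0, 1.25]
group_means = [10, 100, 1000]
groups = [[m * f for f in factors] for m in group_means]

raw_sd = [statistics.stdev(g) for g in groups]
log_sd = [statistics.stdev([math.log2(v) for v in g]) for g in groups]

# Simulated right-skewed abundances (log-normal); fixed seed for reproducibility.
rng = random.Random(42)
abundances = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]
log_abundances = [math.log(v) for v in abundances]

print("raw SD per group: ", [round(s, 2) for s in raw_sd])
print("log2 SD per group:", [round(s, 3) for s in log_sd])
print(f"skewness raw: {skewness(abundances):.2f}, log: {skewness(log_abundances):.2f}")
```

If the noise were additive instead, the log would not equalize the variances in this way. That is why the answer hedges with "to some extent".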


Thank you for your answer. I ran the regression, and without the log transform the R² is a little higher. So which one should I use? I'm confused.


The R² is higher with the non-log version because of the larger range of the untransformed values; that doesn't mean that you should be using log.


Does this mean the range will be lower with the log? And "that doesn't mean that you should be using log": does that mean I could use the no-log version?

11.9 years ago

One practical reason is that increases and decreases end up on the same scale. 8*2 = 16 is an increase of 8, while 8/2 = 4 is a decrease of 4. On the log2 scale, 3+1 = 4 while 3-1 = 2, a change of 1 unit in both cases. For details about the possible benefits of log-transformation for variance and outliers, read this long but simple tutorial answer written by a statistician (http://www.childrensmercy.org/stats/model/log.aspx).
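The arithmetic above can be written as a tiny Python sketch. It uses log base 2, which is the base the 3+1 = 4 example implies, since log2(8) = 3. Doubling and halving become changes of +1 and -1, whereas on the raw scale they look like +8 and -4.

```python
import math

baseline = 8
up, down = baseline * 2, baseline / 2

# Raw scale: the same two-fold change looks asymmetric (+8 vs -4).
raw_changes = (up - baseline, down - baseline)

# Log2 scale: log2(16) - log2(8) = 4 - 3 = 1 and log2(4) - log2(8) = 2 - 3 = -1.
log2_changes = (math.log2(up) - math.log2(baseline),
                math.log2(down) - math.log2(baseline))

print("raw changes: ", raw_changes)
print("log2 changes:", log2_changes)
```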


When we focus on the difference before and after a change (for example, over-expression compared with the same sample before it), the log is useful. But for regression or classification, where I just need to predict protein abundance, is this transform suitable? Also, clicking the URL http://www.childrensmercy.org/stats/model/log.aspx redirects to the main page (http://www.childrensmercy.org/). How strange! Could you send the text of that page to my e-mail: zhilongjia@gmail.com? Thank you very much.


I think this is the original reference; see page 90 onwards: http://people.stat.sfu.ca/~cschwarz/Stat-650/Notes/PDFbigbook-JMP/JMP-part003.pdf


From the PDF:

1. Is your data bounded below by zero? (Yes)
2. Is your data defined as a ratio? (No)
3. Is the largest value in your data more than three times larger than the smallest value? (Yes)

So yes, I think I should log-transform the y variable. It also answers my other question about which log to choose. Thank you for the material!
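The three questions quoted from the PDF can be written as a small checklist in Python. This is a sketch only, not code from the PDF. The function name `suggest_log_transform` is hypothetical. Requiring the answers to come out Yes/No/Yes is an assumption based on how the questions are answered above. The caller supplies whether the data are a ratio, because that cannot be read off the numbers.

```python
def suggest_log_transform(values, is_ratio):
    """Hypothetical checklist based on the three questions quoted above.

    Returns True when the data are bounded below by zero, are not defined
    as a ratio, and the largest value is more than three times the smallest.
    """
    bounded_below_by_zero = min(values) >= 0
    # Whether a quantity is "defined as a ratio" is domain knowledge,
    # so it is passed in rather than inferred.
    not_a_ratio = not is_ratio
    # With a zero minimum, any positive maximum passes this check;
    # zeros would still need handling before a log can be taken.
    wide_spread = max(values) > 3 * min(values)
    return bounded_below_by_zero and not_a_ratio and wide_spread


abundances = [12.0, 85.0, 430.0, 2900.0]  # hypothetical protein abundances
print(suggest_log_transform(abundances, is_ratio=False))  # True
```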
