Question: Why Are Almost All Protein Or mRNA Abundance Measures Transformed By Log?
6.7 years ago by
Zhilong Jia1.4k
London
Zhilong Jia1.4k wrote:

Here, the abundance measure can be a concentration, a molecule count, or ppm; I am not asking specifically about microarray (chip) data.

Why are these values log-transformed when they are used as the y variable in a classification or regression model?

Could anybody help me? Thank you.

ADD COMMENTlink modified 6.7 years ago by Woa2.7k • written 6.7 years ago by Zhilong Jia1.4k

What are the inputs and targets in the regression?

ADD REPLYlink written 6.7 years ago by Qdjm1.9k

The inputs are some features of the protein or mRNA, and the targets are protein abundances.

ADD REPLYlink modified 6.7 years ago • written 6.7 years ago by Zhilong Jia1.4k

If you log transform the target value, then you are assuming that a unit increase in your feature values gives a multiplicative increase in the target value. If you don't log transform them, then a unit increase in the feature value is additive. Which is the better assumption? Also, if you don't log transform, your regression weights are going to be largely determined by the most abundant transcripts. You have to decide what is appropriate based on your regression problem. (David's comments are also important to consider).

ADD REPLYlink modified 6.7 years ago • written 6.7 years ago by Qdjm1.9k
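
A minimal sketch of the additive-versus-multiplicative point above, using made-up numbers. The feature `x`, the abundances `y`, and the helper `fit_line` are hypothetical and only serve as illustration; this is not anyone's actual model.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: abundance doubles with every unit of the feature.
x = [0, 1, 2, 3, 4, 5]
y = [10 * 2 ** xi for xi in x]  # 10, 20, 40, 80, 160, 320

# Regressing log2(y) on x: the slope is a log2 fold change per unit of x.
a_log, b_log = fit_line(x, [math.log2(v) for v in y])
fold_per_unit = 2 ** b_log  # multiplicative effect per unit of x

# Regressing raw y on x: the slope is an additive change per unit of x,
# which cannot describe a doubling pattern exactly.
a_raw, b_raw = fit_line(x, y)
raw_residuals = [v - (a_raw + b_raw * xi) for xi, v in zip(x, y)]

print(f"log-scale slope {b_log:.3f}, i.e. x{fold_per_unit:.2f} per unit of x")
print(f"raw-scale slope {b_raw:.1f} abundance units per unit of x")
print("raw-scale residuals:", [round(r, 1) for r in raw_residuals])
```

On the raw scale the largest abundances contribute the largest squared residuals, which is one way to see why they tend to dominate the fitted weights.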

"if you don't log transform, your regression weights are going to be largely determined by the most abundant transcripts": I think this point is very important! Actually, I want to analyse how much of the variation in y the features explain. Thank you.

ADD REPLYlink written 6.7 years ago by Zhilong Jia1.4k
6.7 years ago by
Woa2.7k
United States
Woa2.7k wrote:

Apart from making increases and decreases symmetric, I think taking the logarithm makes the distribution roughly Gaussian, so that you can use parametric tests. Taking the log also, to some extent, takes care of heteroscedasticity (non-uniform, mean-dependent variance).

ADD COMMENTlink written 6.7 years ago by Woa2.7k
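
A small simulation can illustrate the heteroscedasticity point. It assumes multiplicative (log-normal) noise, which is a modelling assumption made here for the sketch, not something stated in the thread; the abundance levels and `sigma` are made up.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical abundances at three very different mean levels,
# each with the same multiplicative (log-normal) noise.
true_levels = [10, 1_000, 100_000]
sigma = 0.3  # assumed noise standard deviation on the natural-log scale

raw_sd, log_sd = [], []
for level in true_levels:
    sample = [level * math.exp(random.gauss(0, sigma)) for _ in range(2000)]
    raw_sd.append(statistics.stdev(sample))
    log_sd.append(statistics.stdev([math.log(v) for v in sample]))

for level, r, l in zip(true_levels, raw_sd, log_sd):
    print(f"level {level:>7}: raw SD = {r:10.2f}, log SD = {l:.3f}")
```

Under this assumption the raw-scale standard deviation grows with the mean, while the log-scale standard deviation stays close to `sigma` at every level.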

Thank you for your answer. I fitted a regression, and without the log transform the R-squared is a little higher. So, which one should I use? I'm confused.

ADD REPLYlink written 6.7 years ago by Zhilong Jia1.4k

The R2 is higher with the non-log version because of the larger range of the untransformed values, so the two R2 values are not directly comparable; that doesn't mean that you should be using log.

ADD REPLYlink written 6.7 years ago by Qdjm1.9k
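
One hedged way to compare the two fits is to compute R2 against the same response. The sketch below uses made-up data; `fit_line` and `r_squared` are hypothetical helpers, and the naive `exp` back-transform is just one possible choice (it estimates a median-like value rather than the arithmetic mean).

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, computed on whatever scale is passed in."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical data: roughly doubling abundance with multiplicative noise.
x = list(range(8))
noise = [1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.15, 0.85]
y = [10 * 2 ** xi * f for xi, f in zip(x, noise)]
log_y = [math.log(v) for v in y]

a_raw, b_raw = fit_line(x, y)
pred_raw = [a_raw + b_raw * xi for xi in x]

a_log, b_log = fit_line(x, log_y)
pred_log = [a_log + b_log * xi for xi in x]

r2_raw = r_squared(y, pred_raw)          # raw model, raw scale
r2_log = r_squared(log_y, pred_log)      # log model, log scale
# Log model judged on the raw scale via a naive back-transform.
r2_log_back = r_squared(y, [math.exp(p) for p in pred_log])

print(f"R2 raw model (raw scale):      {r2_raw:.3f}")
print(f"R2 log model (log scale):      {r2_log:.3f}")
print(f"R2 log model (back to raw):    {r2_log_back:.3f}")
```

Only the first and third numbers measure fit to the same quantity, so those are the ones that can be set side by side.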

Does this mean that with log the range will be lower? "that doesn't mean that you should be using log": does that mean I could use the non-log version?

ADD REPLYlink written 6.7 years ago by Zhilong Jia1.4k
6.7 years ago by
David Quigley11k
San Francisco
David Quigley11k wrote:

One practical reason is so that increases and decreases are on the same scale. 8*2 = 16, an increase of 8, while 8/2 = 4, a decrease of 4. On the log2 scale, 8 becomes 3, so doubling gives 3+1 = 4 while halving gives 3-1 = 2, a change of 1 unit in both cases. For details about the possible benefits of log-transformation on variance and outliers, read this long but simple tutorial answer written by a statistician.

ADD COMMENTlink written 6.7 years ago by David Quigley11k
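
The arithmetic in the answer above can be checked directly. This is only a sketch of that one example, with base 2 chosen because the example uses doubling and halving.

```python
import math

baseline = 8.0
doubled = baseline * 2  # 16
halved = baseline / 2   # 4

# Raw scale: the two changes have different sizes.
raw_up = doubled - baseline    # 8.0
raw_down = baseline - halved   # 4.0

# log2 scale: both changes are exactly one unit.
log_up = math.log2(doubled) - math.log2(baseline)    # 4 - 3
log_down = math.log2(baseline) - math.log2(halved)   # 3 - 2

print(raw_up, raw_down)  # 8.0 4.0
print(log_up, log_down)  # 1.0 1.0
```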

When we focus on the difference before and after a change (such as over-expression compared with the same gene's own baseline), the log is useful. But for regression or classification, where we just need to predict the abundance of a protein, is this transform suitable? In addition, clicking the URL http://www.childrensmercy.org/stats/model/log.aspx redirects to the main page (http://www.childrensmercy.org/). How strange! Could you send the text of that page to my e-mail address: zhilongjia@gmail.com? Thank you very much.

ADD REPLYlink modified 6.7 years ago • written 6.7 years ago by Zhilong Jia1.4k

I think this is the original reference; see page 90 onwards: http://people.stat.sfu.ca/~cschwarz/Stat-650/Notes/PDFbigbook-JMP/JMP-part003.pdf

ADD REPLYlink written 6.7 years ago by Woa2.7k

From the PDF:
1. Is your data bounded below by zero? (Yes)
2. Is your data defined as a ratio? (No)
3. Is the largest value in your data more than three times larger than the smallest value? (Yes)
So yes, I think I should log-transform the y variable. In addition, it answers another question of mine: which log I should choose. Thank you for your material!

ADD REPLYlink written 6.7 years ago by Zhilong Jia1.4k
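
The three questions quoted from the PDF can be answered mechanically for a data set. The sketch below only reports the three answers; how to weigh them is left to the referenced notes. The function name `log_checklist`, the `is_ratio` flag, and the sample values are hypothetical, and the observed minimum is used as a rough stand-in for "bounded below by zero", which is really a property of the measurement.

```python
def log_checklist(values, is_ratio):
    """Answer the three checklist questions for a sample of measurements.

    The range question is only meaningful for non-negative data.
    """
    smallest, largest = min(values), max(values)
    return {
        "bounded_below_by_zero": smallest >= 0,
        "defined_as_ratio": is_ratio,
        "max_more_than_3x_min": largest > 3 * smallest,
    }

# Hypothetical protein abundances (arbitrary units), not defined as a ratio.
abundances = [12.0, 85.0, 430.0, 2600.0]
answers = log_checklist(abundances, is_ratio=False)
print(answers)
```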