3.4 years ago by

United States

When it is used, a main rationale for log-transformation is heteroskedasticity. On many platforms (microarrays, etc.) the variance of an expression measurement depends on the expression level. Log-transforming reduces this dependence, so the data are better behaved for statistical testing. As russhh pointed out, the choice of base 2 is just a practical one. Many other transformations can be applied to expression data, and the "best" one likely depends on your measurement platform and your analysis application. For example, see variance-stabilizing transformations such as VST in the DESeq package. Log2 has a long history because it's simple, and in many cases it's an improvement over using raw values for statistical analysis.
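To make the heteroskedasticity point concrete, here is a minimal sketch in Python. It assumes a purely multiplicative (log-normal) noise model, which is only a rough stand-in for real platform noise; the function names, the 20% noise level and the two expression levels are made up for illustration.

```python
import math
import random
import statistics


def simulate(mean_level, n=2000, noise=0.2, seed=0):
    """Simulate n intensities with multiplicative noise around mean_level.

    Assumption: noise is log-normal, so its size grows with the signal.
    """
    rng = random.Random(seed)
    return [mean_level * math.exp(rng.gauss(0.0, noise)) for _ in range(n)]


def spread(values):
    """Return (SD on the raw scale, SD on the log2 scale)."""
    raw_sd = statistics.stdev(values)
    log_sd = statistics.stdev([math.log2(v) for v in values])
    return raw_sd, log_sd


# A lowly expressed and a highly expressed hypothetical gene.
low_raw_sd, low_log_sd = spread(simulate(100.0, seed=1))
high_raw_sd, high_log_sd = spread(simulate(10000.0, seed=2))

print(f"raw SD:  low={low_raw_sd:.1f}  high={high_raw_sd:.1f}")
print(f"log2 SD: low={low_log_sd:.3f}  high={high_log_sd:.3f}")
```

Under this assumed noise model the raw standard deviation grows with the expression level, while the log2 standard deviation stays roughly the same for both genes. Real data rarely follow the model exactly, which is why transformations like VST exist.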

modified 3.4 years ago • written 3.4 years ago by Ahill • **1.8k**

From the previous comments you should now see that gene expression measured on two different platforms, such as microarray and RNA-seq, has different statistical properties. Likewise, mathematical functions, distributions and equations each have their own properties; for example, logarithms in base 2 and base 10 differ in scale. For RNA-seq, the negative binomial distribution is generally considered a good fit for count data when testing for differential expression. In addition, you can log10-transform the count data, then mean-center and scale it, for biological network analysis. For microarray data, you can normalize with the RMA method and then use a t-test or another test for your hypothesis. To circle back: what do you get when you integrate the log2 function, what is its derivative, and what do those properties mean when you apply the function to data that has properties of its own, as you may have done in calculus? Sorry if this is redundant or hard to follow; I was just trying to sum up years of study in this small box.
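As a rough illustration of the log10 transform followed by mean-centering and scaling, here is a small Python sketch for a single gene. The pseudocount of 1 (to avoid log of zero) and the function name are my own assumptions, not something taken from a particular network-analysis package.

```python
import math
import statistics


def log10_center_scale(counts, pseudocount=1.0):
    """log10-transform counts, then mean-center and scale to unit SD.

    Assumption: a pseudocount is added so that zero counts can be logged.
    """
    logged = [math.log10(c + pseudocount) for c in counts]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    if sd == 0:
        # A gene with no variation carries no signal after centering.
        return [0.0 for _ in logged]
    return [(v - mean) / sd for v in logged]


# Hypothetical counts for one gene across four samples.
gene_counts = [0, 9, 99, 999]
z_scores = log10_center_scale(gene_counts)
print([round(z, 3) for z in z_scores])
```

After this step the gene has mean 0 and standard deviation 1 across samples, so genes with very different count ranges can be compared on the same scale.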

Log2 aids in calculating fold change and in distinguishing up-regulated from down-regulated genes between replicates/samples.
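A quick sketch of why log2 is convenient for fold changes, using made-up expression values and a hypothetical helper name: a doubling and a halving become +1 and -1, so up- and down-regulation sit symmetrically around zero.

```python
import math


def log2_fold_change(treated, control):
    """Return log2(treated / control); both values must be positive."""
    return math.log2(treated / control)


# Hypothetical expression values for two genes.
up = log2_fold_change(200.0, 100.0)    # doubled in the treated sample
down = log2_fold_change(50.0, 100.0)   # halved in the treated sample

print(up)    # 1.0
print(down)  # -1.0
```

On the raw ratio scale the same two changes are 2.0 and 0.5, which are not symmetric around "no change" (1.0), so the sign of the log2 value is an easier way to read direction.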

There isn't any theoretical reason to use base 2 instead of any other base. One could reasonably use log10 for fold changes, but in my experience microarray-detectable changes in expression tend to be smaller than 10-fold. You can't use the natural log when presenting data to the bench, unless you want to waste an afternoon. So base 2 makes sense: it's close to the scale of the biologically relevant changes that microarrays can detect, and it's an easily explained choice of base when you're presenting to biologists.
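To show that the base really is just a matter of scale, here is a short Python sketch with made-up fold changes. Values in any two bases differ only by a constant factor, here log2(10), which is about 3.32.

```python
import math

# Hypothetical fold changes; 1.0 is left out to avoid dividing by log10(1) = 0.
fold_changes = [0.25, 0.5, 2.0, 4.0, 10.0]

for fc in fold_changes:
    l2 = math.log2(fc)
    l10 = math.log10(fc)
    ln = math.log(fc)
    # The ratio l2 / l10 is the same for every fold change.
    print(f"fc={fc:>5}: log2={l2:+.3f}  log10={l10:+.3f}  ln={ln:+.3f}  log2/log10={l2 / l10:.4f}")

scale_factor = math.log2(10)  # conversion factor from log10 units to log2 units
```

The statistics don't change with the base; what changes is how readable the numbers are. A 2-fold or 4-fold change is a whole number in log2 but an awkward fraction in log10 or natural log.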
