Question: Why should gene expression data be log2 transformed?
3.4 years ago by
Grunir11210 wrote:

I have noticed that gene expression data (e.g. sequencing data) are usually transformed with log2 rather than, say, log10. Why is the log2 transformation commonly used and not other transformations? I would like to understand the basic theory behind the log2 transformation as it relates to gene expression data.

sequencing genome gene • 12k views
modified 3.4 years ago by Ahill1.8k • written 3.4 years ago by Grunir11210

From the previous comments you should now realize that gene expression measured on two different platforms, such as microarray and RNA-seq, has different properties. Likewise, in mathematics, different functions, distributions and equations have their own properties; base 2 and base 10 logarithms, for instance, put the data on different scales. For count data, the negative binomial distribution is widely used to model the counts when testing for differential expression. In addition, you can log10-transform, mean-center and scale the count data for biological network analysis. For microarray data you can normalize with the RMA method and then use a t-test or another test for your hypothesis. To circle back: as you may have done in calculus, ask what the integral and the derivative of the log2 function are, and which of its properties make it suitable for data that have properties of their own. Sorry if this is redundant or hard to follow, as I was just trying to sum up years of study in this small box.

written 3.4 years ago by theobroma221.1k

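Log2 aids in calculating fold change and in comparing up-regulated vs down-regulated genes between replicates/samples: on the log2 scale a doubling is +1 and a halving is -1, so changes in either direction are symmetric around 0.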
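
To make the fold-change point concrete, here is a minimal sketch in Python. The function name `log2_fold_change` and the expression values are made up for illustration; they are not taken from any package or dataset.

```python
import math

def log2_fold_change(treated, control):
    """Return log2(treated / control) for two positive expression values."""
    return math.log2(treated / control)

# Hypothetical expression values, chosen only for illustration.
up = log2_fold_change(200.0, 100.0)    # expression doubled
down = log2_fold_change(50.0, 100.0)   # expression halved

# The raw ratios (2.0 and 0.5) are not symmetric around 1,
# but the log2 values are symmetric around 0.
raw_up, raw_down = 200.0 / 100.0, 50.0 / 100.0
print(raw_up, raw_down)  # 2.0 0.5
print(up, down)          # 1.0 -1.0
```

On the raw scale all down-regulation is squeezed between 0 and 1 while up-regulation runs from 1 to infinity; after the log2 step both directions get the same range.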

modified 3.4 years ago • written 3.4 years ago by

There isn't any theoretical reason for using base-2 instead of any other base. One could reasonably use log10 for the fold changes. Microarray-detectable changes in expression tend to be smaller than 10-fold in my experience. You can't use the natural log when presenting data to the bench, unless you want to waste an afternoon. So base-2 makes sense as it's close to the biologically-detectable changes that are microarray-discoverable and it's an easily explainable choice of base when you're presenting to biologists.
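
A quick way to see that the base carries no theoretical weight: logarithms in different bases differ only by a constant factor, log2(x) = log10(x) / log10(2). The sketch below checks this on a few made-up fold changes (the values are illustrative, not from real data).

```python
import math

# Hypothetical fold changes, chosen only for illustration.
fold_changes = [0.25, 0.5, 2.0, 4.0, 8.0]

log2_values = [math.log2(fc) for fc in fold_changes]
log10_values = [math.log10(fc) for fc in fold_changes]

# Converting from base 10 to base 2 is a single multiplication.
scale = 1.0 / math.log10(2.0)  # roughly 3.32
rescaled = [v * scale for v in log10_values]

for fc, a, b in zip(fold_changes, log2_values, rescaled):
    print(f"fold change {fc}: log2 = {a:.4f}, rescaled log10 = {b:.4f}")
```

Because the two scales are proportional, any analysis that is unaffected by rescaling (a t-test, for instance) should give the same answer in either base; only the readability of the numbers changes.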

written 3.4 years ago by russhh5.5k
3.4 years ago by
United States
Ahill1.8k wrote:

When it is used, a main rationale for log-transformation is to reduce heteroskedasticity. The variance of expression measurements on many platforms (arrays, etc.) depends on the expression level. By log-transforming, you reduce this dependence and your data become better-behaved for statistical testing. As pointed out by russhh, the choice of base 2 is just a practical one. Many other transformations can be applied to expression data. The "best" one likely depends on your measurement platform and your analysis application. For example, see variance stabilizing transformations like VST in the DESeq package. Log2 has a long history because it's simple, and it's an improvement on using raw values for statistical analysis in many cases.
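
A small simulation can illustrate the heteroskedasticity argument. This is only a sketch under a strong assumption: the noise is purely multiplicative (log-normal), which real array or count data follow only approximately. The helper `simulate`, the noise level `sigma` and the two expression levels are all hypothetical.

```python
import math
import random
import statistics

def simulate(mean_level, n, rng, sigma=0.3):
    """Draw n positive values around mean_level with multiplicative noise.

    Assumption for illustration only: noise is log-normal, so the spread
    on the raw scale grows in proportion to the expression level.
    """
    return [mean_level * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
low = simulate(10.0, 5000, rng)     # a weakly expressed "gene"
high = simulate(1000.0, 5000, rng)  # a strongly expressed "gene"

# Raw scale: the standard deviation depends heavily on expression level.
raw_sd_ratio = statistics.stdev(high) / statistics.stdev(low)

# Log2 scale: the standard deviations become comparable.
log_low = [math.log2(v) for v in low]
log_high = [math.log2(v) for v in high]
log_sd_ratio = statistics.stdev(log_high) / statistics.stdev(log_low)

print(f"raw SD ratio (high/low):  {raw_sd_ratio:.1f}")
print(f"log2 SD ratio (high/low): {log_sd_ratio:.2f}")
```

Under this assumption the raw SD ratio comes out near the ratio of the expression levels, while the log2 SD ratio is close to 1. Data with an additive or Poisson-like noise component will not be stabilized this cleanly, which is one reason dedicated variance stabilizing transformations exist.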

modified 3.4 years ago • written 3.4 years ago by Ahill1.8k