9.7 years ago

M K

Hi All,

I have expression read counts, and the data contain many zeros. Which transformation is better for expression-level data? I have seen some people use log10 and others use log2.

Could you please tell me why log2 is better than log10?

It's not about better or worse; it's just a way to express your values.

e.g. if your gene is 8-fold upregulated, then log2 would be 3 and log10 would be about 0.9

If the majority of your genes have fold changes greater than 10 (which is very unlikely), then represent them with log10, so that the values are displayed on a scale starting from 1.
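To make the arithmetic above concrete, here is a minimal Python sketch. The 8-fold value comes from the example in this answer; the variable names are just illustrative. It also shows that the two scales differ only by a constant factor, which is why neither is "better".

```python
import math

# Fold change taken from the example above (8-fold upregulation).
fold_change = 8.0

log2_fc = math.log2(fold_change)    # 3.0
log10_fc = math.log10(fold_change)  # about 0.903

# The two scales differ only by the constant factor log10(2),
# so switching base rescales every value by the same amount.
ratio = log10_fc / log2_fc

print(log2_fc, round(log10_fc, 3), round(ratio, 3))
```

On the log2 scale one unit is a doubling, and on the log10 scale one unit is a 10-fold change, so the choice is mostly about which unit is easier to read for your data.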

This is a small portion of the read counts that I have, before any transformation. Is it okay to use log2 on it?

Yes. If you are looking for DEGs, DESeq2 would do it in its pipeline before comparing.

Low counts would be discarded.

Just in case someone comes across this later and wants to know more about this than a biologist probably ever should:

It's not that DESeq2 converts the counts to log2 scale, but rather that it fits the data with a model using a log2 link (this is the case for many, many tools). Why? A couple of reasons, really. Firstly, it makes the math much easier. For example, when not using a log2 link, coefficients multiply, meaning they can quickly get very large or very small. This can quickly lead to loss of precision, especially as a coefficient approaches 0, since no one uses infinite-precision math for anything that needs to be quick. On the log2 scale the coefficients simply sum. Further, the range on the log2 scale changes from [0, infinity] to [-infinity, infinity]. This is convenient for optimization, the class of functions that tend to actually be used to perform the maximization in "maximum likelihood estimation" (actually minimization, but that's a different post...). You can do bounded optimization, but it's simpler to have an infinite range.
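A rough Python sketch of the "multiply versus sum" point, for anyone who wants to see it in numbers. This is not how DESeq2 is implemented; the effect sizes below are made-up values chosen only for illustration.

```python
import math

# Hypothetical multiplicative effects (fold changes), for illustration only.
effects = [2.0, 0.5, 4.0, 0.25, 8.0]

# On the natural scale, effects combine by multiplication.
product = math.prod(effects)

# On the log2 scale, the same effects combine by addition.
log2_sum = sum(math.log2(e) for e in effects)

# Precision: multiplying many tiny factors underflows to 0.0 in
# double precision, while the sum of their logs stays finite.
tiny = [1e-20] * 20
underflowed = math.prod(tiny)                 # 1e-400 is not representable
log2_total = sum(math.log2(t) for t in tiny)  # an ordinary negative number

print(product, log2_sum, underflowed, log2_total)
```

Here `2 ** log2_sum` recovers `product`, whereas nothing can be recovered from `underflowed`, which is the loss of precision described above.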

BTW, these are the same reasons we do logistic (or probit) regression. A logit converts the range [0, 1] to [-infinity, +infinity] and has all of the other benefits.
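The logit mapping can be sketched in a few lines of Python as well. The function names `logit` and `inverse_logit` are my own for this illustration, not from any particular package, and this version only handles probabilities strictly between 0 and 1 (the endpoints correspond to the infinite ends of the range).

```python
import math

def logit(p):
    """Map a probability strictly between 0 and 1 to the real line."""
    return math.log(p / (1.0 - p))

def inverse_logit(x):
    """Map a real number back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Probabilities near 0 or 1 are pushed far out on the real line,
# while 0.5 maps to 0.
for p in (0.001, 0.5, 0.999):
    print(p, logit(p))
```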

It's an answer, Devon, a very good answer :)

Actually, I'm making yours an answer and up-voting it :) Mine is more of an overly long aside!

Note: this is an off-topic comment :)

I like the way you help here and on SeqAnswers, Devon (y). You're in Bonn, I'm in Berlin; hope we meet someday :)

It's not that big of a country, so the odds of bumping into each other at some point are pretty high!