Which transformation should we use for expression-level data?
9.4 years ago
M K ▴ 660

Hi All,

I have expression read counts, and there are many zeros in the data. Which transformation should we use for expression-level data? I found that some people use log10 and others use log2.

next-gen RNA-Seq R • 7.2k views
ADD COMMENT
1
9.4 years ago
Manvendra Singh ★ 2.2k

Use DESeq2. It handles zero values in the data and performs log2 transformations as well. Using log10 would not be a good idea.

ADD COMMENT
0

Could you please tell me why log2 is better than log10?

ADD REPLY
1

It's not about better or worse; it's just a way to express your values.

E.g. if your gene is 8-fold upregulated, then log2 would be 3 and log10 would be about 0.9.

If the majority of your genes have fold changes greater than 10 (which is very unlikely), then represent them with log10, so that the values start from 1 on that scale.
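The 8-fold example can be checked with a few lines of Python. This is only a sketch using the standard library, and the variable names are mine. It also shows that the two scales differ by nothing more than a constant factor, log10(2), so the choice affects readability and not the information content.

```python
import math

fold_change = 8.0  # the 8-fold example from this comment

log2_fc = math.log2(fold_change)
log10_fc = math.log10(fold_change)

# The two scales differ only by a constant factor, log10(2).
ratio = log10_fc / log2_fc

print(f"log2:  {log2_fc:.3f}")
print(f"log10: {log10_fc:.3f}")
print(f"log10 / log2 = {ratio:.3f}")
```

On the log2 scale one unit is a doubling, which is why it tends to be the easier scale to read for fold changes.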

ADD REPLY
0

This is a small portion of the read counts I have, before any transformation. Is it okay to use log2 on them?

3910
14
17
112
3115
15
87
785
9
2
34
21
2
56
664
112
1827
570
131
1631
10
2
3570
2
50
20
227
2
2
11
496
163
80
172
546
200
138
ADD REPLY
0

Yes, DESeq2 would do it in its pipeline before comparing, if you are looking for DEGs.

Low counts would be discarded.
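For simply looking at counts that contain zeros (not for differential expression testing, where DESeq2 expects the raw counts), a common workaround is to add a pseudocount before taking the log. The sketch below assumes a pseudocount of 1, which is a conventional but arbitrary choice and not something taken from this thread; the zero in the list is hypothetical, added to show the effect.

```python
import math

# A few of the counts posted above, plus a hypothetical zero.
counts = [3910, 14, 17, 112, 0]

PSEUDOCOUNT = 1  # assumption: a common but arbitrary choice

# log2(0) is undefined, so shift every count before transforming.
log2_counts = [math.log2(c + PSEUDOCOUNT) for c in counts]

for c, v in zip(counts, log2_counts):
    print(f"{c:>5} -> {v:.2f}")
```

With this shift a zero count maps to 0 on the log2 scale instead of raising an error.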

ADD REPLY
3

Just in case someone comes across this later and wants to know more about this than a biologist probably ever should:

It's not that DESeq2 converts the counts to the log2 scale, but rather that it fits the data with a model using a log2 link (this is the case for many, many tools). Why? A couple of reasons, really. Firstly, it makes the math much easier. For example, when not using a log2 link, coefficients multiply, meaning they can quickly get very large or very small. This can quickly lead to loss of precision. This is especially the case as a coefficient approaches 0, since no one uses infinite-precision math for anything that needs to be quick. On the log2 scale the coefficients simply sum. Further, the range on the log2 scale is changed from [0, infinity] to [-infinity, infinity]. This is convenient for optimization (the class of functions that tend to actually be used to perform the maximization (actually, minimization, but that's a different post...) in "maximum likelihood estimation"). You can do bounded optimization, but it's simpler to have an infinite range.
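The precision point can be illustrated with a toy example. This is only a sketch with made-up effect sizes, not anything DESeq2 does internally: multiplying many small factors on the linear scale underflows in double precision, while summing their logs stays comfortably representable.

```python
import math

# Hypothetical multiplicative effects, each shrinking the value 1000-fold.
factors = [1e-3] * 120

# Linear scale: the running product drops below the smallest
# representable double and underflows to exactly 0.0.
product = 1.0
for f in factors:
    product *= f

# Log2 scale: the same effects simply add up and remain finite.
log2_sum = sum(math.log2(f) for f in factors)

print("product on linear scale:", product)
print("sum on log2 scale:      ", log2_sum)
```

The true product here is 1e-360, which a double cannot hold, whereas its log2 is an ordinary number of modest magnitude.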

BTW, these are the same reasons we do logistic (or probit) regression. A logit converts the range [0, 1] to [-infinity, +infinity] and has all of the other benefits.
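As a small illustration of that mapping, here is a sketch of the logit and its inverse using only the standard library; the function names are mine. Probabilities strictly between 0 and 1 are stretched over the whole real line, and the inverse brings them back.

```python
import math

def logit(p: float) -> float:
    """Map a probability strictly between 0 and 1 to the real line."""
    return math.log(p / (1.0 - p))

def inverse_logit(x: float) -> float:
    """Map any real number back into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for p in (0.001, 0.5, 0.999):
    x = logit(p)
    print(f"p={p:<6} logit={x:+.3f} back={inverse_logit(x):.3f}")
```

Values near 0 map to large negative numbers and values near 1 to large positive ones, so an optimizer working on the logit scale never has to respect a boundary.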

ADD REPLY
0

It's an answer, Devon, a very good answer :)

ADD REPLY
0

Actually, I'm making yours an answer and up-voting it :) Mine is more of an overly long aside!

ADD REPLY
0

Note: this is an off-topic comment :)

I like the way you help here and on SeqAnswers, Devon (y). You're in Bonn, I'm in Berlin; I hope we'll meet someday :)

ADD REPLY
0

It's not that big of a country, so the odds of bumping into each other at some point are pretty high!

ADD REPLY
