Normalization methods in RNA-seq read counts data
5.8 years ago
hougiotaejut ▴ 30

Hi

There is a paper whose authors claim to have implemented the "Median", "Quantile", "TMM" and "Total" normalization methods in their R package. But their implementations do not look like the methods they cite. For example, the library size estimation for "Median" normalization in their package takes this form:

[image: library size formula used in the package]

while in the reference the method takes this form:

[image: library size formula from the reference]

These two formulas are different, yet both are called Median normalization, and the first cites the second as its source. How do you tell them apart when someone says they used Median normalization? The same question applies to the other normalization methods too.

differential analysis • 2.1k views

hougiotaejut: Please follow the directions in this post so that your images are rendered inline: How to add images to a Biostars post


Thank you. Yeah I was trying to fix it when I saw they weren't shown.


I just did that for you, but you should pay attention to what genomax said.


Thanks a lot. I was trying to fix it when I saw you had already done it for me.

5.8 years ago

Those two methods are identical; they only look different because the first one splits the m samples into D groups of n_d each. The median in the second case is simply m_id in the first equation.

Having said that, I find the second formulation (not surprisingly, from Anders and Huber) easier to follow, since you really don't care about group information when doing library normalization.


Thanks a lot for your response. I noticed that the first one is a ratio of medians while the second one is a median of ratios, so I am still confused when you say they are the same. I think they give different values when applied to data: if you first take the median of counts across genes for sample i and then divide it by the geometric mean of the medians of all samples, you get a different number than if you first divide each count for sample i and gene j by the geometric mean of that gene's counts across samples and then take the median. I'm not sure whether I have phrased my question properly.
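To make the comparison concrete, here is a minimal sketch in Python with NumPy of the two procedures as I described them in words above. The function names and the toy count matrix are my own and are only for illustration; this is not the package's code, and it assumes the reference formula is the Anders and Huber median of ratios with a per-gene geometric mean as the reference. It also assumes all counts are positive, since a zero count would break the logarithm.

```python
import numpy as np

def ratio_of_medians(counts):
    """Median of each sample across genes, divided by the
    geometric mean of those per-sample medians."""
    med = np.median(counts, axis=0)              # one median per sample
    geo_mean = np.exp(np.mean(np.log(med)))      # geometric mean of the medians
    return med / geo_mean

def median_of_ratios(counts):
    """Divide each count by its gene's geometric mean across samples,
    then take the median of those ratios within each sample."""
    log_counts = np.log(counts)
    log_ref = log_counts.mean(axis=1, keepdims=True)   # per-gene log geometric mean
    return np.exp(np.median(log_counts - log_ref, axis=0))

# Toy data (made up): rows are genes, columns are samples.
counts = np.array([
    [10.0,   20.0],
    [100.0,  100.0],
    [1000.0, 4000.0],
])

print("ratio of medians:", ratio_of_medians(counts))
print("median of ratios:", median_of_ratios(counts))
```

On this toy matrix both samples have the same median count, so the ratio of medians gives equal size factors, while the median of ratios does not. That is the difference I am asking about.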
