Normalization methods in RNA-seq read counts data
4.1 years ago
hougiotaejut ▴ 20

Hi

There is a paper whose authors claim to have implemented the "Median", "Quantile", "TMM" and "Total" normalization methods in their R package. However, their implementations do not look like the methods in the references they cite. For example, the library size estimation for "Median" normalization in their package takes the form:

while in the reference the method takes this form:

These two normalization methods are different, yet both are called Median normalization, and the paper cites the second as the source of the first. How do you tell them apart when someone says they have used Median normalization? The same issue applies to the other normalization methods too.

differential analysis • 1.4k views

hougiotaejut: Please follow the directions in this post so that your images are rendered inline: How to add images to a Biostars post


Thank you. Yes, I was trying to fix it when I saw that they weren't shown.


I just did that for you, but you should pay attention to what genomax said.


Thanks a lot. I was trying to fix it when I saw that you had already done me the favor.

4.1 years ago

Those two methods are identical; they only look different because the first one splits the m samples into D groups of n_d samples each. The median in the second formulation is simply m_id in the first equation.

Having said that, I find the second formulation (not surprisingly, from Anders and Huber) easier to follow, since you really don't care about group information when doing library normalization.
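
For readers who cannot see the images in the question, the Anders and Huber size factor is usually written as below. This is typed from memory of the DESeq paper rather than copied from the question, so treat the notation as an assumption: k_ij is the count for gene i in sample j, m is the number of samples, and the median is taken over genes.

```latex
\hat{s}_j \;=\; \operatorname*{median}_{i}\;
  \frac{k_{ij}}{\left(\prod_{v=1}^{m} k_{iv}\right)^{1/m}}
```

The denominator is the geometric mean of gene i across all m samples, so each sample is compared against a per-gene pseudo-reference before the median is taken.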


Thanks a lot for your response. I noticed that the first one is a ratio of medians while the second one is a median of ratios, so I am still a little confused when you say they are the same. I would expect them to give different values when applied to data. If you first take the median of the counts across genes for sample i and then divide it by the geometric mean of those medians over all samples, you get a different number than if you first divide each count for sample i and gene j by the geometric mean of that gene's counts across samples and then take the median. I'm not sure I have phrased my question properly.
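
One way to check this is to compute both quantities on a small made-up count matrix. The sketch below is only an illustration of the two procedures as described in words in this comment: the function names and the toy counts are hypothetical, the grouping into D groups used by the package is not modeled, and all counts are assumed to be non-zero so that the geometric means are defined.

```python
import numpy as np


def geometric_mean(x, axis):
    """Geometric mean along an axis; assumes strictly positive values."""
    return np.exp(np.mean(np.log(x), axis=axis))


def ratio_of_medians(counts):
    """Per-sample median count divided by the geometric mean of those medians."""
    sample_medians = np.median(counts, axis=0)  # one median per sample (column)
    return sample_medians / geometric_mean(sample_medians, axis=0)


def median_of_ratios(counts):
    """Median over genes of count / (geometric mean of that gene across samples)."""
    gene_geo_means = geometric_mean(counts, axis=1)  # one value per gene (row)
    ratios = counts / gene_geo_means[:, None]
    return np.median(ratios, axis=0)


# Toy data: rows are genes, columns are samples (values are invented).
counts = np.array(
    [
        [10, 20, 30],
        [100, 100, 400],
        [5, 50, 5],
        [60, 30, 90],
        [8, 16, 80],
    ],
    dtype=float,
)

rom = ratio_of_medians(counts)
mor = median_of_ratios(counts)

print("ratio of medians :", np.round(rom, 4))
print("median of ratios :", np.round(mor, 4))
print("identical?       :", np.allclose(rom, mor))
```

On this toy matrix the two sets of factors differ, which is consistent with the point that the order of the median and the division matters in general. It does not say anything about whether the package's grouped formula reduces to the Anders and Huber one, since that formula is not reproduced here.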