Question: What is the relationship between library size and normalization factor?
3.3 years ago, Deepak Tanwar (ETH Zürich, Switzerland) wrote:

I saw this plot

doi.org/10.3389/fgene.2016.00164

Normalization factors for the fruit set RNA-Seq data depending on corresponding library sizes. All three studied normalization methods are carried out with default settings. For all three methods, regression (dashed) lines are estimated from a simple linear regression modeling the relationship between default normalization factors and library sizes. Color key: TMM, RLE, and MRN are respectively colored in green, blue, and red. Key to symbols: Bud, Ant, and Pos stages are respectively drawn with circles, squares, and triangles.

Question: What is the relationship between `library size` and `normalization factor`? What does it mean if the `regression line` has an `R^2` of `0.9`?

3.3 years ago, Santosh Anand wrote:

Q: What is the relationship between library size and normalization factor?

"Indeed, it is known that TMM normalization factors do not take into account library sizes. This fact is illustrated in Figure 1 by an almost horizontal regression line. On the contrary, RLE and MRN factors are closer to each other, and share a positive correlation with the library size."

Q: What does it mean if the regression line has an R^2 of 0.9?

The R^2 of a regression (here a simple linear regression) tells you how well the fitted curve (here a line) fits your data: it is the fraction of the variance in one variable that is explained by the other. If all the data points lie exactly on the line, R^2 = 1 (that is, 100%). You can also think of this in terms of correlation, which measures how well one variable can be predicted linearly from another. In fact, for a simple linear regression, the goodness of fit R^2 is numerically equal to the square of the Pearson correlation coefficient (rho).

R^2 = 0.9 => rho (Pearson correlation) = sqrt(0.9) ≈ 0.95 (taking the positive root, because the slope of the line is positive)

From either number (R^2 or rho), you can conclude that there is a strong linear correlation between the two variables, so one can be predicted quite well from the other (about 90% of its variance is explained). Looking at the line (the red or blue one, say), you can easily see that when one variable increases, the other does too (in mathematical terms, the slope of the line is positive). This direction is conveyed by the sign of rho and of the slope, not by R^2, which cannot be negative for a least-squares line fitted with an intercept.
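
To make the link between R^2 and the Pearson correlation concrete, here is a minimal Python sketch using NumPy. The library sizes and normalization factors are simulated and purely hypothetical (they are not the values from the figure); the point is only that, for a simple linear regression, R^2 computed from the residuals matches the squared Pearson correlation, and that the direction of the relationship comes from the slope or from the sign of the correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 9 library sizes (millions of reads) and factors that
# grow roughly linearly with them, plus some noise.
lib_size = rng.uniform(5, 30, size=9)
norm_factor = 0.04 * lib_size + rng.normal(0, 0.05, size=9)

# Ordinary least-squares fit: norm_factor ~ intercept + slope * lib_size
slope, intercept = np.polyfit(lib_size, norm_factor, deg=1)
fitted = intercept + slope * lib_size

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((norm_factor - fitted) ** 2)
ss_tot = np.sum((norm_factor - norm_factor.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Pearson correlation coefficient between the two variables
r = np.corrcoef(lib_size, norm_factor)[0, 1]

print(f"slope     = {slope:.4f}")
print(f"R^2       = {r_squared:.4f}")
print(f"Pearson r = {r:.4f}, r^2 = {r**2:.4f}")
```

The two printed values of R^2 and r^2 agree up to floating-point error, while the sign of the slope and of r carries the direction.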


Thank you, Santosh Anand, for your reply. I understand what you wrote, but what I intended to ask is: what does this mean?

I understand that there is a very good (linear) correlation between the two variables and that one variable can be predicted from the other. What is the biological interpretation?

2.1 years ago, elie.maza wrote:

Some normalization methods take the library size into account when calculating their normalization factors, and other methods do not. That is the difference between the RLE and MRN methods on the one side, and TMM on the other side. Nevertheless, the edgeR package (which uses TMM) also takes the library size into account when normalizing, but this does not appear in its "normalization factors": edgeR multiplies each library size by its normalization factor to obtain an effective library size.

Finally, the correlation coefficient does not really have a "biological" meaning, only a "statistical" one. Indeed, it only shows that some normalization factors are linked with the library size and others are not.
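
The distinction between factors that absorb the library size and factors that leave it out can be sketched in Python. This is only an illustration under stated assumptions, not the TMM algorithm and not the code of any package: the counts are simulated, the "depth-aware" factors are plain median-of-ratios size factors (the idea behind RLE), and the "depth-free" factors are obtained by dividing those by the library size and rescaling them to multiply to 1, in the spirit of edgeR's convention that effective library size = library size × normalization factor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experiment: every sample shares one expression profile and
# differs only in sequencing depth.
n_genes, n_samples = 2000, 9
base = rng.gamma(shape=2.0, scale=50.0, size=n_genes)
depth = rng.uniform(0.5, 3.0, size=n_samples)
counts = rng.poisson(base[:, None] * depth[None, :])

lib_size = counts.sum(axis=0)

# Depth-aware factors: median of the ratios of each sample to the per-gene
# geometric mean, using only genes with non-zero counts in every sample.
expressed = (counts > 0).all(axis=1)
log_counts = np.log(counts[expressed])
log_geo_mean = log_counts.mean(axis=1, keepdims=True)
size_factor = np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Depth-free factors: divide out the library size, then rescale so that
# the factors have a geometric mean of 1.
depth_free = size_factor / lib_size
depth_free = depth_free / np.exp(np.mean(np.log(depth_free)))

# The library size comes back in when the factors are used for normalization.
effective_lib_size = lib_size * depth_free

r_aware = np.corrcoef(lib_size, size_factor)[0, 1]
print(f"correlation(library size, depth-aware factor) = {r_aware:.3f}")
print(f"depth-free factors range from {depth_free.min():.3f} to {depth_free.max():.3f}")
```

In this simulation the depth-aware factors track the library size almost perfectly, whereas the depth-free factors all stay close to 1, because the samples differ only in depth. The effective library sizes are exactly proportional to the depth-aware factors, so both conventions normalize the counts in the same way even though the factors themselves look very different when plotted against library size.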