Scale() function, is it always necessary?
4.0 years ago
innyus ▴ 40

Hi all!

I am a beginner in R. I am currently working on gene expression data (NanoString). The data have been normalized, housekeeping genes and low-abundance genes have been removed, and the data have been log2 transformed.

I have done DE analysis with limma.

Question: do I need to use scale(data) before running the analysis? I tried it both with and without scaling and the results are pretty similar, but there are some differences. The same goes for PCA: the results are almost the same, but not quite.

Main question: does one always have to use scale() on the data (even data that are already normalized and log2 transformed)? Can scale() ever "damage" the results/data?

gene scaling differential expression • 2.0k views

My take on it is that it's easier to interpret the beta values resulting from a linear model when they're scaled. In linear models it shouldn't affect the results.


The coefficients are comparable when standardized, but their interpretation is not simpler, because the relation to the original measurement unit is lost. For example, a coefficient for a mass measured in kg is interpreted as how much the outcome changes for a 1 kg change in mass, whereas the standardized version is interpreted as how much the outcome changes for a 1 standard deviation change in mass.
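
A minimal R sketch of both points above, using simulated data (the variables here are made up for illustration): scaling the predictor rescales its coefficient from per-kg to per-standard-deviation units, while the t-statistic and p-value stay the same.

    # Simulated data: outcome depends on body mass measured in kg
    set.seed(1)
    mass_kg <- rnorm(100, mean = 70, sd = 10)
    outcome <- 0.5 * mass_kg + rnorm(100)

    fit_raw    <- lm(outcome ~ mass_kg)         # coefficient = change per 1 kg
    fit_scaled <- lm(outcome ~ scale(mass_kg))  # coefficient = change per 1 SD of mass

    summary(fit_raw)$coefficients
    summary(fit_scaled)$coefficients
    # The scaled coefficient equals the raw one multiplied by sd(mass_kg);
    # the t-statistic and p-value for the mass term are identical.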


Yes, that makes sense.


Just a side comment, as I have processed many NanoString datasets: the approach you're using is not incorrect; however, NanoString is a count-based method, so the analysis may be better suited to edgeR or DESeq2, i.e., start from the raw NanoString counts and normalise these via edgeR or DESeq2. One can even specify housekeeper genes in this scenario. nSolver is also a [free] Windows-based GUI that can process NanoString data.
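
For what it's worth, a rough sketch of what that could look like with DESeq2, whose estimateSizeFactors() accepts a controlGenes argument for restricting normalization to the housekeepers. The objects raw_counts (genes x samples matrix of raw NanoString counts), sample_info (sample annotation with a condition column) and hk_genes (housekeeper gene names) are placeholders, not from the original post.

    library(DESeq2)

    # raw_counts:  genes x samples matrix of raw NanoString counts (placeholder)
    # sample_info: data.frame of sample annotations with a 'condition' column (placeholder)
    dds <- DESeqDataSetFromMatrix(countData = raw_counts,
                                  colData   = sample_info,
                                  design    = ~ condition)

    # Estimate size factors from the housekeeper genes only
    dds <- estimateSizeFactors(dds, controlGenes = rownames(dds) %in% hk_genes)

    dds <- DESeq(dds)   # reuses the size factors estimated above
    res <- results(dds)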

4.0 years ago

The scale() function with default parameters standardizes variables: by giving every variable a mean of 0 and a standard deviation of 1, it puts them all on the same scale. Standardization almost never hurts, and it is sometimes necessary, for example when you want to compare the coefficients of a regression model. There are a few situations where you may not want to standardize, such as when a variable doesn't approximately follow a Gaussian distribution, or when standardization would get in the way of interpretation (e.g. of the coefficients of a regression model). Sometimes standardization can actually increase noise: for example, when the measurement noise is relatively stable but the measured values vary a lot, the noise will be larger relative to the low values than to the high values.
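
A quick sketch of what scale() does by default, plus a caveat about matrix orientation; expr below is a placeholder for an expression matrix assumed to be laid out as genes in rows and samples in columns.

    # scale() with default arguments centres each COLUMN to mean 0
    # and divides it by its standard deviation
    m <- matrix(rnorm(20, mean = 5, sd = 3), nrow = 5, ncol = 4)
    s <- scale(m)
    round(colMeans(s), 10)   # ~0 for every column
    apply(s, 2, sd)          # 1 for every column

    # For a genes x samples matrix, scale(expr) would standardize the samples
    # (columns). To standardize each gene instead, transpose and transpose back:
    # expr_z <- t(scale(t(expr)))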
