Question: Scale Data Before Drawing Heatmap Or Use heatmap(..., scale="column") In R?
C Shao130 wrote:

Hi everyone,

The `scale` argument of heatmap confuses me. Scaling the data before drawing the heatmap and calling heatmap(..., scale="XXX", ...) give different results.

For example:

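```r
# Option 1: scale the columns first, then cluster and plot without further scaling
mtscaled <- as.matrix(scale(mtcars))
heatmap(mtscaled, scale = "none")

# Option 2: pass the raw data and let heatmap() scale the columns
xx.1 <- as.matrix(mtcars)
heatmap(xx.1, scale = "column")
```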

These two calls produce different clustering results.

Does anyone have an idea about this? If both ways are reasonable, which one should I choose?

Thanks a lot!

modified 7.5 years ago by Michael Dondrup46k • written 7.5 years ago by C Shao130

The one that shows your data better :)

Michael Dondrup46k wrote:

The difference is that heatmap scales the data after the dendrograms are computed, so the clustering uses the data exactly as you passed it in. The code in heatmap does not call `scale`, but the numeric results are the same. For data on very different scales (e.g. horsepower, number of cylinders), it may be better to scale before clustering, as you did. The column dendrogram you get when scaling before heatmap looks more sensible to me.
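As a rough check of the point above, the orderings that heatmap returns invisibly (`rowInd`, `colInd`) can be compared across the three variants. This is only a sketch on `mtcars`; the variable names are made up for illustration, and `pdf(NULL)` is used just to avoid writing a plot file.

```r
# Null graphics device: the heatmaps are computed but no file is written
pdf(NULL)

x <- as.matrix(mtcars)

# heatmap() invisibly returns the row and column orderings it used
h_none   <- heatmap(x, scale = "none")         # raw data, no scaling
h_column <- heatmap(x, scale = "column")       # raw data, scaled for display
h_pre    <- heatmap(scale(x), scale = "none")  # scaled before clustering

invisible(dev.off())

# scale = "column" does not change the clustering, only the colours
same_rows <- identical(h_none$rowInd, h_column$rowInd)
same_cols <- identical(h_none$colInd, h_column$colInd)

# Scaling beforehand feeds different distances into the clustering
pre_differs <- !identical(h_pre$rowInd, h_none$rowInd) ||
  !identical(h_pre$colInd, h_none$colInd)
```

If `same_rows` and `same_cols` are both `TRUE`, the `scale` argument only affected the colors, which is the behavior described above.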

In case you are interested, this is the code in heatmap that does the scaling:

```r
if (scale == "row") {
    # center each row, then divide by its standard deviation
    x <- sweep(x, 1L, rowMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 1L, sd, na.rm = na.rm)
    x <- sweep(x, 1L, sx, "/", check.margin = FALSE)
} else if (scale == "column") {
    # center each column, then divide by its standard deviation
    x <- sweep(x, 2L, colMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 2L, sd, na.rm = na.rm)
    x <- sweep(x, 2L, sx, "/", check.margin = FALSE)
}
```

You will find that it comes after the code that does the clustering.
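To see that the `sweep`-based code in the excerpt gives the same numbers as `scale`, the column branch can be run by hand on `mtcars` and compared. This is a minimal sketch; `manual` and `builtin` are illustrative names, and the `na.rm` arguments are dropped because `mtcars` has no missing values.

```r
x <- as.matrix(mtcars)

# Column branch of the heatmap() excerpt, applied by hand
centered <- sweep(x, 2L, colMeans(x), check.margin = FALSE)
sx <- apply(centered, 2L, sd)
manual <- sweep(centered, 2L, sx, "/", check.margin = FALSE)

# Same operation via scale(), which also attaches centering/scaling attributes
builtin <- scale(x)

# Compare the numbers only, ignoring the extra attributes added by scale()
same_values <- isTRUE(all.equal(manual, builtin, check.attributes = FALSE))
```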

Edit: One more idea: it might be a good idea to scale both rows and columns before the analysis, which is not possible with heatmap's `scale` argument, since it accepts only one of "row", "column", or "none".

While `scale` centers and scales columns only, it can easily be used to scale both:

```r
x <- scale(x)        # scale and center columns
x <- t(scale(t(x)))  # scale and center rows
```

Thanks for the answer, but I am still confused. At the beginning of the heatmap code there is `scale <- if (symm && missing(scale)) "none" else match.arg(scale)`. Does this command scale the data? And if the clustering does not use the scaled data, what is the point of the scale option in heatmap?

That line only checks that the parameter is set to a valid value. The scaling still affects the graphic output: it improves the color choice, which would otherwise be dominated by the extreme values (try plotting without scaling, everything is red). I agree, it is not easy to see what other benefit scaling after clustering would have. I think it is often better to scale before clustering.
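To illustrate that `match.arg` only validates the argument and never touches the data, here is a toy function with the same set of choices as heatmap's `scale` argument. `check_scale` is a hypothetical helper written for this example, not part of heatmap.

```r
# Hypothetical helper mimicking how heatmap() validates its scale argument
check_scale <- function(scale = c("row", "column", "none")) {
  match.arg(scale)  # returns the matched choice, or raises an error
}

default_choice <- check_scale()       # no argument: the first choice is used
partial_choice <- check_scale("col")  # partial matching is allowed

# An invalid value is rejected with an error, caught here for illustration
invalid_choice <- tryCatch(check_scale("both"), error = function(e) "error")
```

The function returns a character string in every successful case, so any actual scaling has to happen elsewhere in the code.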