Scale Data Before Drawing Heatmap Or Using heatmap(..., scale="column") In R?
10.0 years ago
C Shao ▴ 140

Hi everyone,

The "scale" argument in heatmap confuses me. Scaling the data before drawing the heatmap and calling heatmap(..., scale="XXX", ...) give different results.

For example:

mtscaled <- as.matrix(scale(mtcars))

heatmap(mtscaled, scale='none') 

xx.1 <- as.matrix(mtcars)

heatmap(xx.1, scale='column')

produce different clustering results.

Does anyone have an idea about this? If both ways are reasonable, which one should I choose?

Thanks a lot!

heatmap r • 34k views

The one that shows your data better :)

10.0 years ago

The difference is that heatmap applies the scaling after the dendrogram has been computed, so the clustering itself runs on the unscaled data. The code in heatmap doesn't call scale, but the numeric results are the same. For data on very different scales (e.g. horsepower, number of cylinders) it might be better to scale before clustering, as you did. The column dendrogram you get when scaling before heatmap looks more sensible to me.
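One way to check this yourself, as a minimal sketch assuming the built-in mtcars data from the question: heatmap invisibly returns the row and column orderings (rowInd, colInd), so they can be compared across scale settings. The pdf(NULL) call is only there to keep the plots off screen, and the variable names are just for illustration.

```r
x <- as.matrix(mtcars)

pdf(NULL)  # null device: nothing is drawn to screen or file
h_col  <- heatmap(x, scale = "column")
h_none <- heatmap(x, scale = "none")
h_pre  <- heatmap(scale(x), scale = "none")
invisible(dev.off())

# Same ordering with or without heatmap's scaling:
# the clustering ran on the raw data in both calls
same_rows <- identical(h_col$rowInd, h_none$rowInd)
same_cols <- identical(h_col$colInd, h_none$colInd)

# Scaling beforehand changes the input to dist(), so this ordering may differ
pre_differs <- !identical(h_pre$rowInd, h_col$rowInd)
```

If same_rows and same_cols are both TRUE, the scale argument only changed the colors, not the dendrograms.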

For your interest, this is the code from heatmap, that does the scaling:

if (scale == "row") {
    x <- sweep(x, 1L, rowMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 1L, sd, na.rm = na.rm)
    x <- sweep(x, 1L, sx, "/", check.margin = FALSE)
}
else if (scale == "column") {
    x <- sweep(x, 2L, colMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 2L, sd, na.rm = na.rm)
    x <- sweep(x, 2L, sx, "/", check.margin = FALSE)
}

You will find that it comes after the code that does the clustering.
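To see that the sweep-based code gives the same numbers as scale, here is a small sketch on mtcars (na.rm is left at its default because mtcars has no missing values; the variable names are mine, not heatmap's):

```r
x <- as.matrix(mtcars)

# Column scaling written the way heatmap does it
centered <- sweep(x, 2L, colMeans(x), check.margin = FALSE)  # subtract column means
sx <- apply(centered, 2L, sd)                                # column standard deviations
manual <- sweep(centered, 2L, sx, "/", check.margin = FALSE) # divide by them

# scale() attaches "scaled:center" and "scaled:scale" attributes,
# so compare values only
same_values <- isTRUE(all.equal(manual, scale(x), check.attributes = FALSE))
```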

Edit: One more idea: it might be a good idea to scale both rows and columns before the analysis, which is not possible using heatmap.

While scale centers and scales columns, it can easily be used to scale both:

x <- scale(x) # scale and center columns
x <- t(scale(t(x))) # scale and center rows
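A quick sanity check of the two-step scaling, sketched on mtcars. Note that the second step standardizes the rows exactly, but the columns, having been scaled first, are in general no longer exactly centered at 0 with standard deviation 1 afterwards.

```r
x <- as.matrix(mtcars)

x_both <- scale(x)             # center and scale columns
x_both <- t(scale(t(x_both)))  # then center and scale rows

# Rows are standardized exactly, since that step ran last
row_means <- rowMeans(x_both)
row_sds   <- apply(x_both, 1, sd)

# Columns were standardized first; inspect how far they have drifted
col_means <- colMeans(x_both)
col_sds   <- apply(x_both, 2, sd)
```

Whether that drift matters depends on the data, so it may be worth looking at col_means and col_sds before clustering.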

Thanks for the answer, but I am still confused. At the beginning of the heatmap code there is "scale <- if (symm && missing(scale)) "none" else match.arg(scale)". Does this command scale the data? And if the clustering does not use the scaled data, what is the purpose of the scale option in heatmap?


That code only checks that the parameter is set correctly. The scaling still affects the graphic output: it improves the color mapping, which would otherwise be dominated by the extreme values (try plotting without scaling, everything is red). I agree, it is not easy to see what other benefit scaling after clustering would have. I think it's often better to scale before clustering.
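Both points can be illustrated with a short sketch. pick_scale is a hypothetical stand-in for how heatmap handles its scale argument, not code taken from heatmap; the second part just compares the value ranges per column in mtcars before and after scaling.

```r
# match.arg only validates (and completes) the choice; it never touches the data
pick_scale <- function(scale = c("row", "column", "none")) match.arg(scale)
default_choice <- pick_scale()       # no argument: the first choice is used
partial_choice <- pick_scale("col")  # partial matching resolves to a full name

# Why unscaled colors are dominated by a few columns
x <- as.matrix(mtcars)
raw_spread    <- apply(x, 2, function(v) diff(range(v)))
scaled_spread <- apply(scale(x), 2, function(v) diff(range(v)))
widest <- names(which.max(raw_spread))  # column with the largest raw range
```

Comparing raw_spread with scaled_spread shows how a single color scale over the raw matrix is stretched by the widest columns, while the scaled columns have comparable ranges.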


Thanks very much for this explanation, it really helps.
