Question: Scale Data Before Drawing Heatmap Or Using Heatmap(..., Scale="Column") In R?
C Shao (130) wrote 7.5 years ago:

Hi everyone,

The "scale" argument in heatmap confuses me. Scaling the data before drawing the heatmap and using heatmap(..., scale="XXX", ...) give different results.

For example:

mtscaled <- as.matrix(scale(mtcars))

heatmap(mtscaled, scale='none') 

xx.1 <- as.matrix(mtcars)

heatmap(xx.1, scale='column')

produce different clustering results.

Does anyone have an idea about this? If both ways are reasonable, which one should I choose?

Thanks a lot!

R heatmap • 20k views
modified 7.5 years ago by Michael Dondrup (46k) • written 7.5 years ago by C Shao (130)

The one that shows your data better :)

written 7.5 years ago by Vladimir Chupakhin (520)
Michael Dondrup (46k), Bergen, Norway, wrote 7.5 years ago:

The difference is that heatmap applies the scaling after the dendrogram has been computed, so the clustering is based on the unscaled data. The code in heatmap doesn't call scale, but the numeric results are the same. For data on very different scales (e.g. horsepower, number of cylinders) it might be better to scale before clustering, as you did. The column dendrogram you get when scaling before heatmap looks more sensible to me.

In case you are interested, this is the code in heatmap that does the scaling:

if (scale == "row") {
    x <- sweep(x, 1L, rowMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 1L, sd, na.rm = na.rm)
    x <- sweep(x, 1L, sx, "/", check.margin = FALSE)
}
else if (scale == "column") {
    x <- sweep(x, 2L, colMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 2L, sd, na.rm = na.rm)
    x <- sweep(x, 2L, sx, "/", check.margin = FALSE)
}

You will find that it comes after the code that does the clustering.
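
To check this without reading the source, here is a minimal sketch using mtcars. It repeats the column scaling steps quoted above by hand and compares them with scale(), then clusters the rows of the raw and the scaled matrix with dist and hclust, which are heatmap's default distance and clustering functions. The variable names are only illustrative, and heatmap additionally reorders the dendrogram by row means, so the orders below are not necessarily the ones drawn in the plot.

```r
# Raw and column-scaled versions of the same data
x_raw    <- as.matrix(mtcars)
x_scaled <- scale(x_raw)

# Column scaling done by hand, following the sweep/apply steps from heatmap
x_sweep <- sweep(x_raw, 2L, colMeans(x_raw), check.margin = FALSE)
sx      <- apply(x_sweep, 2L, sd)
x_sweep <- sweep(x_sweep, 2L, sx, "/", check.margin = FALSE)

# TRUE if both ways of scaling give the same numbers
same_values <- isTRUE(all.equal(x_sweep, x_scaled, check.attributes = FALSE))
print(same_values)

# Row clustering on raw vs. scaled data (dist + hclust with default settings)
row_order_raw    <- hclust(dist(x_raw))$order
row_order_scaled <- hclust(dist(x_scaled))$order

# FALSE here means the two inputs lead to different leaf orders
print(identical(row_order_raw, row_order_scaled))
```

If the second comparison prints FALSE, that is the difference you see between your two heatmaps: the values being colored are the same, but the trees were built from different inputs.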

Edit: One more idea: it might be a good idea to scale both rows and columns before the analysis, which is not possible using heatmap.

Although scale only centers and scales columns, it can easily be applied to both:

x <- scale(x) # scale and center columns
x <- t(scale(t(x))) # scale and center rows
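
A quick way to see what the two steps leave you with is to look at the row and column statistics afterwards. This is only a sketch on mtcars, and the variable names are illustrative. The rows are standardized last, so they end up with mean 0 and standard deviation 1; the second step can shift the column statistics again, so it is worth checking them as well.

```r
x <- as.matrix(mtcars)
x <- scale(x)        # scale and center columns
x <- t(scale(t(x)))  # scale and center rows

# Rows were standardized last: means close to 0, standard deviations close to 1
row_means <- rowMeans(x)
row_sds   <- apply(x, 1L, sd)
print(summary(row_means))
print(summary(row_sds))

# Column statistics after the row step, for comparison
col_means <- colMeans(x)
col_sds   <- apply(x, 2L, sd)
print(round(col_means, 3))
print(round(col_sds, 3))
```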
modified 7.5 years ago • written 7.5 years ago by Michael Dondrup (46k)

Thanks for the answer, but I am still confused. At the beginning of the heatmap code there is "scale <- if (symm && missing(scale)) "none" else match.arg(scale)". Does this command scale the data? And if the clustering does not use the scaled data, what is the purpose of the scale option in heatmap?

written 7.5 years ago by C Shao (130)

That code only checks that the parameter is set correctly. The scaling still affects the graphical output: it improves the color choice, which would otherwise be dominated by the extreme values (try plotting without scaling, everything is red). I agree, it is not easy to see what other benefit scaling after clustering would have. I think it's often better to scale before clustering.
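
To see why the colors are dominated by a few columns, it may help to compare the value ranges before and after scaling. A rough sketch on mtcars, with illustrative variable names:

```r
x_raw    <- as.matrix(mtcars)
x_scaled <- scale(x_raw)

# Per-column ranges of the raw data: a few columns span far larger values
raw_ranges <- apply(x_raw, 2L, range)
print(raw_ranges)

# After column scaling every column lives on a comparable scale
scaled_ranges <- apply(x_scaled, 2L, range)
print(round(scaled_ranges, 2))

# Overall spread that the color palette has to cover in each case
raw_spread    <- diff(range(x_raw))
scaled_spread <- diff(range(x_scaled))
print(c(raw = raw_spread, scaled = scaled_spread))
```

With scale='none' on the raw matrix, the palette is stretched over the whole raw spread, so most cells fall into the same few colors.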

written 7.5 years ago by Michael Dondrup (46k)

Thanks very much for this explanation, it really helps.

written 7.5 years ago by C Shao (130)