Question: How to do scaling in ComplexHeatmap
29 days ago by
sophialovechan40 wrote:

Hello everyone,

I am using edgeR and ComplexHeatmap to plot a heatmap of a few ATAC-seq datasets, but I have run into an issue that I can't solve. I would really appreciate any suggestions.

When I use log-transformed CPM values to plot the heatmap, the clustering is not very clear: the heatmap is either all blue or all red.

When I use "mat_scaled = t(scale(t(data)))" to scale my data before plotting, some information no longer shows in the heatmap. I expected that values which are nearly the same in all samples would be shown in the same color across the samples. Unfortunately, after scaling, the similar values are spread into larger differences, which show up as different colors in the heatmap.

                               sample1        sample2        sample3
chr4_185974589_185974741       1.483681       1.472528       1.4296474

after scaling

                               sample1        sample2        sample3
chr4_185974589_185974741       0.761687321    0.37073755     -1.132424873

Thanks.

complexheatmap edger chip-seq R • 215 views
ADD COMMENTlink modified 28 days ago by zx87547.5k • written 29 days ago by sophialovechan40

It might be easier for us to assist you if you posted the example images and the corresponding code. For example, it is not clear if you want to scale columns or rows or both.

ADD REPLYlink written 29 days ago by Friederike4.2k

Here is the code for the scaled heatmap:

library(ComplexHeatmap)
library(circlize)
base_mean = rowMeans(data)
mat_scaled = t(scale(t(data)))
type = gsub("s\\d+_", "", colnames(data))
ha = HeatmapAnnotation(df = data.frame(type = type))
Heatmap(mat_scaled, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)

enter image description here

And the unscaled one:

Heatmap(data, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 5), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)

enter image description here

Thanks.

ADD REPLYlink modified 27 days ago by RamRS22k • written 29 days ago by sophialovechan40

And what exactly is the detail you don't like about the z-score? Which values do you think should be "the same"?

From what I can tell, your code does what you instructed it to do -- you're z-score-transforming the rows of your matrix, i.e. instead of displaying the actual values of data, you're coloring the heatmap based on the distance of each entry to its row's mean.

Btw, I would strongly recommend to not create an object named "type" because that's also the name of a base R function.

ADD REPLYlink written 29 days ago by Friederike4.2k

I got some values changed after scaling like this.

                               sample1        sample2        sample3
chr4_185974589_185974741       1.483681       1.472528       1.4296474

after scaling

                               sample1        sample2        sample3
chr4_185974589_185974741       0.761687321    0.37073755     -1.132424873

Peaks like this one do not change across the samples, but after scaling they show differences.

ADD REPLYlink modified 27 days ago by RamRS22k • written 29 days ago by sophialovechan40

I expected there should be some common regions (similar or same values) across my samples showing as the same color in a cluster.

ADD REPLYlink written 29 days ago by sophialovechan40

Check out the formula below. The z-score is going to drastically reduce the influence of the dynamic range differences between individual rows, therefore small differences in a row with overall small values may get similar z-scores as differences that seem larger to you just because the numbers that are compared to each other live on a different scale. But relatively speaking, the differences from the mean may not be as dramatic (or similarly dramatic in the low-value-ranges).
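
To make the point about dynamic range concrete, here is a minimal base R sketch. The two rows are invented values (they are not from the ATAC-seq data in this thread): one row varies by a few hundredths, the other by several units, and the second is deliberately built as a shifted and stretched copy of the first. Row-wise z-scoring maps both to the same values, so they would get the same colors.

```r
# Toy matrix with invented values: "flat" varies by hundredths,
# "variable" is 100 * (flat - 1.4), so it varies by whole units.
mat <- rbind(
  flat     = c(1.48, 1.47, 1.43),
  variable = c(8, 7, 3)
)
colnames(mat) <- c("sample1", "sample2", "sample3")

# Row-wise z-score, the same transformation as in the question
mat_scaled <- t(scale(t(mat)))

# Spread of each row before and after scaling
raw_range    <- apply(mat, 1, function(x) diff(range(x)))
scaled_range <- apply(mat_scaled, 1, function(x) diff(range(x)))

print(raw_range)             # very different between the two rows
print(round(mat_scaled, 3))  # the two rows are the same after scaling
```

In other words, the z-score only keeps the pattern within a row; the information that one row barely moves while the other moves a lot is gone after scaling.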

ADD REPLYlink written 29 days ago by Friederike4.2k

Hello sophialovechan,

You have added multiple images improperly, hence they show up as links and not as embedded images. Please see How to add images to a Biostars post to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you have used here).

I will make the necessary changes for now.

ADD REPLYlink written 27 days ago by RamRS22k
29 days ago by
Friederike4.2k
United States
Friederike4.2k wrote:

Ah, I see it now. I had ignored the numbers before; I have now formatted them in your original post so that it's more obvious what you're actually asking. Your code does what it should; you may not like the consequences, but that's a different issue.

Using the pheatmap:::scale_rows function may illuminate what's going on:

## the %>% pipe used below comes from magrittr
> library(magrittr)

## this is what the function does
> pheatmap:::scale_rows
function (x) 
{
    m = apply(x, 1, mean, na.rm = T)
    s = apply(x, 1, sd, na.rm = T)
    return((x - m)/s)
}

## and this is the result
> matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3)  %>% pheatmap:::scale_rows()
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

## which is the same as your code
>  matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3)  %>% t %>% scale %>%  t
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

As explained e.g. on Wikipedia, the z-score is calculated by subtracting the (row or column) mean from the given value and dividing the result by the standard deviation.

The mean of your 3 example values is 1.46, the sd is 0.029, so do the math yourself and you can see that the code is doing what it's supposed to be doing.
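
For anyone who wants to do that math in R rather than by hand, a short base R sketch follows. It only uses the three values quoted in the question; the variable names are made up for this example.

```r
# The three log-CPM values quoted in the question
x <- c(sample1 = 1.483681, sample2 = 1.472528, sample3 = 1.4296474)

row_mean <- mean(x)  # about 1.46
row_sd   <- sd(x)    # about 0.029 (sample standard deviation, n - 1 denominator)

# z-score: distance from the row mean in units of the row standard deviation
z <- (x - row_mean) / row_sd
print(round(z, 4))

# Same numbers via scale() on the one-row matrix, as in t(scale(t(data)))
z_scale <- as.numeric(t(scale(t(matrix(x, nrow = 1)))))
print(round(z_scale, 4))
```

The standard deviation is tiny, so dividing by it stretches differences in the second decimal place into z-scores of roughly 0.76, 0.37 and -1.13.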

ADD COMMENTlink modified 29 days ago • written 29 days ago by Friederike4.2k

Thanks for your detailed explanation. I understand that the code is doing what it is supposed to do. So I guess the question I want to ask is whether there is a way to present the data closer to the original values. I guess I should plot the original data in the heatmap, but I can't get the colors to show the differences, as shown in the unscaled heatmap in my previous reply.

ADD REPLYlink written 29 days ago by sophialovechan40

I cannot follow. Which differences do you find worthy of being "shown"? There's a clear difference in the second cluster, for example (one blue, two red).

Just a couple of thoughts:

  • are your values log-transformed? If not, that might help.
  • note that your current legend label "z-score" is wrong for the unscaled heatmap
  • maybe you're looking to adjust the color scheme? Kamil's pheatmap tutorial has a nice section about coloring according to quantiles; the principles should work with ComplexHeatmap, too, I would think
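
A rough sketch of what quantile-based coloring could look like with colorRamp2 follows. This is one possible reading of the suggestion, not code from the tutorial: the toy matrix, the choice of the 5th/50th/95th percentiles, and the names breaks and col_fun are all assumptions for illustration.

```r
# Toy matrix with a skewed distribution (invented values, not real log-CPM)
set.seed(1)
data <- matrix(rexp(200, rate = 1), nrow = 40)

# Breaks taken from the data's own quantiles instead of fixed numbers,
# so most of the color range is spent where most of the values are.
# The 5%/50%/95% cut-offs are an arbitrary choice for this sketch.
breaks <- as.numeric(quantile(data, probs = c(0.05, 0.5, 0.95), na.rm = TRUE))
print(breaks)

if (requireNamespace("circlize", quietly = TRUE)) {
  col_fun <- circlize::colorRamp2(breaks, c("blue", "white", "red"))
  # col_fun can then be passed as the `col` argument of Heatmap()
  print(col_fun(breaks))
}
```

Values beyond the outer breaks are simply drawn in the end colors, so a few extreme peaks no longer dictate the whole color scale.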
ADD REPLYlink written 29 days ago by Friederike4.2k

Yes, the values are log-transformed. I don't have a problem with the z-score scaling; I think it shows the differences very well. But I want to show the similarity as well, i.e. the genes/values that are similar or equal in all the samples.

ADD REPLYlink written 29 days ago by sophialovechan40

Sorry, but I'm not sure what you're asking for now. Maybe you can manually draw a version the way you envision it?

ADD REPLYlink written 28 days ago by Friederike4.2k

something like this:

Desired Heatmap Style

You can see a cluster (cluster2) which has the same or similar values across the samples.

That's what I want to make. Thank you very much.

ADD REPLYlink modified 27 days ago by RamRS22k • written 28 days ago by sophialovechan40

Your unscaled version has that, no?

ADD REPLYlink written 27 days ago by Friederike4.2k

I have a feeling OP wants column clustering (as opposed to the row clustering shown in their cluster2 example)

ADD REPLYlink written 27 days ago by RamRS22k

Clustering is happening at both levels in the example heatmaps shown in the original post. But maybe sophialovechan is asking for the clustering being based on the unscaled data while the colors should correspond to the z-score-transformed values?
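
If that is the goal, one way to sketch it is to compute the row clustering on the unscaled matrix and hand it to Heatmap() while plotting the z-scores. The toy data, the name row_hc, and the use of the default Euclidean distance with complete linkage are assumptions here; note that this sketch leaves out km, since the row order comes from the precomputed clustering.

```r
# Toy stand-in for the log(CPM) matrix (invented values)
set.seed(42)
data <- matrix(rnorm(60, mean = 3), nrow = 20,
               dimnames = list(paste0("peak", 1:20), paste0("sample", 1:3)))

# Colors will come from the row-wise z-scores ...
mat_scaled <- t(scale(t(data)))

# ... but the row clustering is computed on the unscaled values
row_hc <- hclust(dist(data))

if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  ht <- ComplexHeatmap::Heatmap(
    mat_scaled,
    name = "Z-score",
    col = circlize::colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
    cluster_rows = row_hc,   # precomputed clustering from the unscaled data
    show_row_names = FALSE
  )
  # ComplexHeatmap::draw(ht) would render the plot
}
```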

ADD REPLYlink written 27 days ago by Friederike4.2k

Yes, it looks weird when OP picks the limits for the colors themselves (c(-2,0,2)/c(-2,0,5) for low, mid, high). Ideally, they should be set using min(), mean() and max() (sort of like scaling without the actual scaling).

ADD REPLYlink written 27 days ago by RamRS22k

Yes. That's exactly what I want to do. I am not very good at coding, so I'm not sure how to set the colors using min(), mean() and max(). And I mislabeled the unscaled heatmap as "Z-score". Sorry about the confusion.

ADD REPLYlink written 27 days ago by sophialovechan40
27 days ago by
RamRS22k
Houston, TX
RamRS22k wrote:

Change

colorRamp2(c(-2,0,5), ...

in the unscaled version to

colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T)), ...

That replaces the hard-coded values with values computed on the fly.
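
As a small illustration of what those computed values look like, here is a sketch on a made-up matrix with one missing value (the matrix and the name breaks are assumptions for this example, not OP's data):

```r
# Made-up 2 x 3 matrix with one NA
data <- matrix(c(1, 2, NA, 4, 5, 9), nrow = 2)

# The three break points: lowest value, overall mean, highest value
breaks <- c(min(data, na.rm = TRUE),
            mean(data, na.rm = TRUE),
            max(data, na.rm = TRUE))
print(breaks)

if (requireNamespace("circlize", quietly = TRUE)) {
  col_fun <- circlize::colorRamp2(breaks, c("blue", "white", "red"))
  print(col_fun(breaks))  # the three breaks map to the three colors
}
```

One thing to keep in mind: the mean is usually not halfway between min and max, so white will not sit in the middle of the legend when the data are skewed.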

ADD COMMENTlink modified 27 days ago • written 27 days ago by RamRS22k

Thank you very much!

ADD REPLYlink written 27 days ago by sophialovechan40
> Heatmap(data, name="log(CPM)", km=5, col=colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T),c("blue", "white", "red")), bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE))

Error in colorRamp2(c(min(data, na.rm = T), mean(data, na.rm = T), max(data,  : 
unused arguments (bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE)

I got an error though.

ADD REPLYlink modified 27 days ago by RamRS22k • written 27 days ago by sophialovechan40

Check your parentheses.
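
In the call above, the c(...) holding min/mean/max is never closed before the color vector, and colorRamp2(...) is only closed at the very end, so the Heatmap() arguments end up inside colorRamp2(). One way to make this kind of mistake harder is to build the pieces separately. The sketch below uses a toy matrix and invented sample names, and it leaves out km and bottom_annotation_height to stay short; they would go inside Heatmap() as in the original call.

```r
# Toy stand-in for the log(CPM) matrix; sample names are invented
set.seed(7)
data <- matrix(rnorm(30, mean = 2), nrow = 10,
               dimnames = list(NULL, c("s1_ctrl", "s2_ctrl", "s3_treat")))

# Step 1: the breaks, closed with their own parenthesis
breaks <- c(min(data, na.rm = TRUE), mean(data, na.rm = TRUE), max(data, na.rm = TRUE))
colours <- c("blue", "white", "red")

# Same pattern as in the question, stored under a different name than `type`
sample_type <- gsub("s\\d+_", "", colnames(data))

if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  # Step 2: the color function is complete before Heatmap() is called
  col_fun <- circlize::colorRamp2(breaks, colours)

  # Step 3: everything else is an argument of Heatmap(), not of colorRamp2()
  ha <- ComplexHeatmap::HeatmapAnnotation(df = data.frame(type = sample_type))
  ht <- ComplexHeatmap::Heatmap(
    data,
    name = "log(CPM)",
    col = col_fun,
    bottom_annotation = ha,
    show_row_names = FALSE,
    show_column_names = FALSE
  )
  # ComplexHeatmap::draw(ht) would render the plot
}
```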

ADD REPLYlink written 27 days ago by RamRS22k

Problem solved. Thanks.

ADD REPLYlink written 27 days ago by sophialovechan40

Please remember to accept the answer that helped solve your problem.

ADD REPLYlink written 27 days ago by Friederike4.2k