Question: How to do scaling in ComplexHeatmap
10 months ago by
sophialovechan50 wrote:

Hello everyone,

I am using edgeR and ComplexHeatmap to plot a heatmap of a few ATAC-seq datasets, but I have run into an issue that I can't solve. I would really appreciate any suggestions.

When I used log-transformed CPM values to plot the heatmap, the clustering was not very clear: the heatmap is either all blue or all red.

When I used "mat_scaled = t(scale(t(data)))" to scale my data before plotting, some information was lost in the heatmap. I expected that regions with the same value in all samples would be shown in the same color across the samples. Unfortunately, after scaling, similar values are spread into larger differences and show up as different colors in the heatmap.

                            sample1       sample2      sample3
chr4_185974589_185974741    1.483681      1.472528     1.4296474

after scaling

                            sample1       sample2      sample3
chr4_185974589_185974741    0.761687321   0.37073755   -1.132424873

Thanks.

complexheatmap edger chip-seq R • 2.0k views
ADD COMMENTlink modified 10 months ago by zx87549.1k • written 10 months ago by sophialovechan50

It might be easier for us to assist you if you posted the example images and the corresponding code. For example, it is not clear if you want to scale columns or rows or both.

ADD REPLYlink written 10 months ago by Friederike5.4k

Here is the code for scaled heatmap

library(ComplexHeatmap)
library(circlize)
base_mean = rowMeans(data)
mat_scaled = t(scale(t(data)))
type = gsub("s\\d+_", "", colnames(data))
ha = HeatmapAnnotation(df = data.frame(type = type))
Heatmap(mat_scaled, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)

![enter image description here][1]

And unscaled

Heatmap(data, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 5), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)

![enter image description here][2]

Thanks.

ADD REPLYlink modified 4 months ago • written 10 months ago by sophialovechan50

And what exactly is the detail you don't like about the z-score? Which values do you think should be "the same"?

From what I can tell, your code does what you instructed it to do -- you're z-score-transforming the rows of your matrix, i.e. instead of displaying the actual values of data, you're coloring the heatmap based on the distance of each entry to its row's mean.

Btw, I would strongly recommend to not create an object named "type" because that's also the name of a base R function.

ADD REPLYlink written 10 months ago by Friederike5.4k

Some of my values changed after scaling, like this:

                            sample1       sample2      sample3
chr4_185974589_185974741    1.483681      1.472528     1.4296474

after scaling

                            sample1       sample2      sample3
chr4_185974589_185974741    0.761687321   0.37073755   -1.132424873

Peaks like this are not changed across the samples, but after scaling they show differences.

ADD REPLYlink modified 10 months ago by RamRS26k • written 10 months ago by sophialovechan50

I expected there should be some common regions (similar or same values) across my samples showing as the same color in a cluster.

ADD REPLYlink written 10 months ago by sophialovechan50

Check out the formula below. The z-score drastically reduces the influence of differences in dynamic range between individual rows. Small differences in a row with overall small values may therefore get z-scores similar to those of differences that seem larger to you, simply because the numbers being compared live on a different scale. But relatively speaking, the differences from the mean may not be as dramatic (or similarly dramatic in the low-value ranges).
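To make that concrete, here is a minimal base R sketch. The matrix `toy` and its values are made up for illustration (they are not from the poster's data): one row varies by about 0.05, the other by 5, yet both end up with the same spread after row-wise z-scoring.

```r
# Toy matrix (made-up values): row "flat" varies by 0.05, row "wide" by 5.
toy <- rbind(
  flat = c(1.48, 1.47, 1.43),
  wide = c(6.00, 5.00, 1.00)
)

# Row-wise z-score, the same transformation as t(scale(t(data))).
toy_scaled <- t(scale(t(toy)))

# Spread (max - min) per row before and after scaling.
raw_spread    <- apply(toy, 1, function(x) diff(range(x)))
scaled_spread <- apply(toy_scaled, 1, function(x) diff(range(x)))

raw_spread                 # differs by a factor of 100 between the rows
scaled_spread              # practically identical for both rows
apply(toy_scaled, 1, sd)   # every row has sd = 1 after scaling
```

Because each row is divided by its own standard deviation, the color scale no longer carries any information about how large the raw differences were.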

ADD REPLYlink written 10 months ago by Friederike5.4k

Hello sophialovechan,

You have added multiple images improperly, hence they show up as links and not as embedded images. Please see How to add images to a Biostars post to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you have used here).

I will make the necessary changes for now.

ADD REPLYlink written 10 months ago by RamRS26k
10 months ago by
Friederike5.4k
United States
Friederike5.4k wrote:

Ah, I see it now. I had ignored the numbers before; I have now formatted them in your original post so that it's a bit more obvious what you're actually asking. Your code does what it should; you may not like the consequences, but that's a different issue.

Using the pheatmap:::scale_rows function may illuminate what's going on:

## this is what the function does
> pheatmap:::scale_rows
function (x) 
{
    m = apply(x, 1, mean, na.rm = T)
    s = apply(x, 1, sd, na.rm = T)
    return((x - m)/s)
}

## and this is the result
> matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3)  %>% pheatmap:::scale_rows()
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

## which is the same as your code
>  matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3)  %>% t %>% scale %>%  t
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

As explained e.g. on Wikipedia, the z-score is calculated by subtracting the (column or row) mean from the given value and dividing the result by the standard deviation.

The mean of your 3 example values is 1.46, the sd is 0.029, so do the math yourself and you can see that the code is doing what it's supposed to be doing.
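Spelled out in base R, the arithmetic looks roughly like this. The variable names `x`, `m`, `s` and `z` are only for illustration; the three values are the ones from the question.

```r
# The three values from the example row in the question.
x <- c(sample1 = 1.483681, sample2 = 1.472528, sample3 = 1.4296474)

m <- mean(x)  # about 1.46
s <- sd(x)    # about 0.029 (sample standard deviation, n - 1 denominator)

# z-score: distance from the row mean in units of the row's standard deviation.
z <- (x - m) / s
round(z, 4)

# Same numbers as scale() gives for this vector.
all.equal(unname(z), as.vector(scale(x)))
```

A deviation of about 0.02 from the mean looks tiny, but it is most of one standard deviation for this row, which is why the first sample ends up near 0.76.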

ADD COMMENTlink modified 10 months ago • written 10 months ago by Friederike5.4k

Thanks for your detailed explanation. I understand the code is doing what it is supposed to do. So I guess my real question is whether there is a way to present the data closer to the original values. I guess I should plot the original data in the heatmap, but I can't get the colors to show the differences, as you can see in the unscaled heatmap in my previous reply.

ADD REPLYlink written 10 months ago by sophialovechan50

I cannot follow. Which differences do you find worthy of being "shown"? There's a clear difference in the second cluster, for example (one blue, two red).

Just a couple of thoughts:

  • are your values log-transformed? If not, that might help.
  • note that your current legend label "z-score" is wrong for the unscaled heatmap
  • maybe you're looking to adjust the color scheme? Kamil's pheatmap tutorial has a nice section about coloring according to quantiles; the principles should work with ComplexHeatmap, too, I would think
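
As a rough sketch of the quantile idea (this is not taken from the tutorial; `vals`, `breaks` and `cols` are made-up names and the data is simulated), the breaks can be computed in base R and then handed to `circlize::colorRamp2()`, which takes a vector of breaks and an equally long vector of colors:

```r
set.seed(1)
# Simulated skewed values standing in for a log(CPM) matrix.
vals <- matrix(rexp(300, rate = 1), nrow = 100)

# Eleven breaks at evenly spaced quantiles, so each interval between
# breaks holds roughly the same number of cells.
breaks <- unique(quantile(vals, probs = seq(0, 1, length.out = 11), na.rm = TRUE))

# One color per break (colorRampPalette() is part of base R's grDevices).
cols <- colorRampPalette(c("blue", "white", "red"))(length(breaks))

# With circlize installed, this gives a color function usable as Heatmap(col = ...).
if (requireNamespace("circlize", quietly = TRUE)) {
  col_fun <- circlize::colorRamp2(breaks, cols)
}
```

The `unique()` call is there because repeated quantiles (for example many identical values) would otherwise produce duplicated breaks.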
ADD REPLYlink written 10 months ago by Friederike5.4k

Yes, the values are log-transformed. I don't have a problem with the z-score scale; I think it shows the differences very well. But I want to show the similarities as well, i.e. genes/values that are similar or equal in all the samples.

ADD REPLYlink written 10 months ago by sophialovechan50

Sorry, but I'm not sure what you're asking for now. Maybe you can manually draw a version the way you envision it?

ADD REPLYlink written 10 months ago by Friederike5.4k

something like this:

![Desired Heatmap Style][1]

You can see a cluster (cluster2) that has the same or similar values across the samples.

That's what I want to make. Thank you very much.

ADD REPLYlink modified 4 months ago • written 10 months ago by sophialovechan50

Your unscaled version has that, no?

ADD REPLYlink written 10 months ago by Friederike5.4k

I have a feeling OP wants column clustering (as opposed to the row clustering shown in their cluster2 example)

ADD REPLYlink written 10 months ago by RamRS26k

Clustering is happening at both levels in the example heatmaps shown in the original post. But maybe sophialovechan is asking for the clustering being based on the unscaled data while the colors should correspond to the z-score-transformed values?
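
If that is the goal, one possible sketch is to cluster the raw matrix yourself and use the result for the z-score matrix. Everything below is an illustration on simulated data (`raw`, `row_hc` and `z` are made-up names); it relies on the `cluster_rows` argument of `Heatmap()` accepting a precomputed `hclust` object.

```r
set.seed(42)
# Simulated matrix: 6 regions x 3 samples, standing in for the unscaled data.
raw <- matrix(rnorm(18, mean = 3), nrow = 6,
              dimnames = list(paste0("region", 1:6), paste0("sample", 1:3)))

# Cluster the rows on the unscaled values ...
row_hc <- hclust(dist(raw))

# ... but compute row z-scores for the colors.
z <- t(scale(t(raw)))

# Rows of the z-score matrix in the order given by the raw-data clustering.
z[row_hc$order, ]

if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  # The dendrogram comes from the raw data, the colors from the z-scores.
  ht <- ComplexHeatmap::Heatmap(z, name = "Z-score", cluster_rows = row_hc)
}
```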

ADD REPLYlink written 10 months ago by Friederike5.4k

Yes, it looks weird when OP picks the limits for colors themselves (c(-2,0,2)/c(-2,0,5) for low, mid, high). Ideally, they should be set using min(), mean() and max() (sort of like scaling without the actual scaling).

ADD REPLYlink written 10 months ago by RamRS26k

Yes. That's exactly what I want to do. I am not very good at coding, so I am not sure how to set the colors using min(), mean() and max(). Also, I mislabeled the unscaled heatmap as "Z-score". Sorry about the confusion.

ADD REPLYlink written 10 months ago by sophialovechan50
10 months ago by
RamRS26k
Houston, TX
RamRS26k wrote:

Change

colorRamp2(c(-2,0,5), ...

in the unscaled version to

colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T)), ...

That replaces the hard-coded values with values computed on the fly.

ADD COMMENTlink modified 10 months ago • written 10 months ago by RamRS26k

Thank you very much!

ADD REPLYlink written 10 months ago by sophialovechan50
> Heatmap(data, name="log(CPM)", km=5, col=colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T),c("blue", "white", "red")), bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE))

Error in colorRamp2(c(min(data, na.rm = T), mean(data, na.rm = T), max(data,  : 
unused arguments (bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE)

I got an error though.

ADD REPLYlink modified 10 months ago by RamRS26k • written 10 months ago by sophialovechan50

Check your parentheses.
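
In the failing call, the break vector is never closed before the colors, so `colorRamp2()` ends up swallowing the arguments meant for `Heatmap()`. A sketch of a balanced version on simulated data follows; `dat` and `breaks` are made-up names, and the annotation arguments from the thread are left out to keep it short.

```r
set.seed(3)
# Simulated stand-in for the `data` matrix from the thread.
dat <- matrix(rnorm(60, mean = 3), nrow = 20)

# Breaks first, closed with their own parenthesis.
breaks <- c(min(dat, na.rm = TRUE), mean(dat, na.rm = TRUE), max(dat, na.rm = TRUE))

if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  ht <- ComplexHeatmap::Heatmap(
    dat,
    name = "log(CPM)",
    # colorRamp2() takes exactly the breaks and the colors, then closes.
    col = circlize::colorRamp2(breaks, c("blue", "white", "red")),
    # Everything from here on is an argument of Heatmap(), not colorRamp2().
    show_row_names = FALSE,
    show_column_names = FALSE
  )
}
```

Computing the breaks on a separate line makes it much harder to misplace a closing parenthesis.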

ADD REPLYlink written 10 months ago by RamRS26k

Problem solved. Thanks.

ADD REPLYlink written 10 months ago by sophialovechan50

Please remember to accept the answer that helped solve your problem.

ADD REPLYlink written 10 months ago by Friederike5.4k

Hi, I have the same question. My unscaled data is below.

Symbol  wt_1         wt_2         wt_3         sample2_1    sample2_2    sample2_3
gene1   5.335232251  5.740785039  5.902108135  3.992369652  3.891350026  4.270974093
gene2   15.98835431  16.02484119  16.02575545  16.00483162  16.02241575  15.99473185

It is obvious that gene1 differs between the groups and gene2 does not. But after scaling, the data looks like this:

            wt_1      wt_2     wt_3  sample2_1  sample2_2  sample2_3
gene1  0.5281683 0.9746396 1.152239 -0.9501832 -1.0613953 -0.6434688
gene2 -1.3263240 0.8934847 0.949107 -0.3238684  0.7459245 -0.9383238

Now gene2 shows differences. Why? And how can I plot the data so that, as I expect, gene1 shows a difference and gene2 does not?

Thanks .

ADD REPLYlink modified 4 months ago • written 4 months ago by yiren0
  1. How exactly did you scale your data?
  2. Do you expect gene1 to be different or do you know it for a fact?
ADD REPLYlink written 4 months ago by RamRS26k

Thank you very much for your reply. 1. I scaled my data with t(scale(t(data))). 2. I know gene1 is different because its fold change is bigger than 2. What I want is for gene2 to be shown in the same color across the samples and gene1 to be shown in different colors across the samples.

ADD REPLYlink written 4 months ago by yiren0

That is not the right approach - you should go where the data leads you and visualize that, not try to make the data show you what you think is the right answer.
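
For anyone wondering why gene2 "shows differences" after scaling, a small base R check on the two rows posted above may help. The object names `expr`, `row_sd` and `z` are made up; the numbers are the ones from the post.

```r
# The two rows from the post above.
expr <- rbind(
  gene1 = c(5.335232251, 5.740785039, 5.902108135, 3.992369652, 3.891350026, 4.270974093),
  gene2 = c(15.98835431, 16.02484119, 16.02575545, 16.00483162, 16.02241575, 15.99473185)
)
colnames(expr) <- c("wt_1", "wt_2", "wt_3", "sample2_1", "sample2_2", "sample2_3")

# Standard deviation per row on the original scale:
# gene1 varies far more than gene2.
row_sd <- apply(expr, 1, sd)
row_sd

# Row z-scores divide each row by its own standard deviation, so both rows
# end up with sd = 1 and gene2's tiny fluctuations fill the whole color range.
z <- t(scale(t(expr)))
apply(z, 1, sd)
```

The scaled heatmap is therefore showing the pattern within each row, not the size of the change, which is the same effect discussed in the answers above.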

ADD REPLYlink written 4 months ago by RamRS26k