Question

How to do scaling in ComplexHeatmap

5.2 years ago

sophialovechan ▴ 80

Hello everyone,

I am using edgeR and ComplexHeatmap to plot a heatmap of a few ATAC-seq datasets, but I have run into an issue that I can't solve. I would really appreciate any suggestions.

When I plot the log-transformed CPM values, the clustering is not very clear: the heatmap is either all blue or all red.

When I scale the data before plotting with "mat_scaled = t(scale(t(data)))", some information is lost from the heatmap. I expected that values which are the same in all samples would be shown in the same color across the samples. Unfortunately, after scaling, similar values are stretched into larger differences, which show up as different colors in the heatmap.

                            sample1      sample2      sample3
chr4_185974589_185974741    1.483681     1.472528     1.4296474

after scaling

                            sample1      sample2      sample3
chr4_185974589_185974741    0.761687321  0.37073755   -1.132424873

Thanks.

ChIP-Seq ComplexHeatmap R edgeR • 26k views

ADD COMMENT • link updated 5.2 years ago by zx8754 11k • written 5.2 years ago by sophialovechan ▴ 80

It might be easier for us to assist you if you posted the example images and the corresponding code. For example, it is not clear if you want to scale columns or rows or both.

ADD REPLY • link 5.2 years ago by Friederike 9.0k

Here is the code for scaled heatmap

library(ComplexHeatmap)
library(circlize)
base_mean = rowMeans(data)
mat_scaled = t(scale(t(data)))
type = gsub("s\\d+_", "", colnames(data))
ha = HeatmapAnnotation(df = data.frame(type = type))
Heatmap(mat_scaled, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)

![enter image description here][1]

And unscaled

Heatmap(data, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 5), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)

![enter image description here][2]

Thanks.

ADD REPLY • link 4.6 years ago by sophialovechan ▴ 80

And what exactly is the detail you don't like about the z-score? Which values do you think should be "the same"?

From what I can tell, your code does what you instructed it to do -- you're z-score-transforming the rows of your matrix, i.e. instead of displaying the actual values of data, you're coloring the heatmap based on the distance of each entry to its row's mean, measured in units of that row's standard deviation.

Btw, I would strongly recommend to not create an object named "type" because that's also the name of a base R function.

ADD REPLY • link 5.2 years ago by Friederike 9.0k

I got some values changed after scaling like this.

                               sample1    sample2.    sample3
chr4_185974589_185974741       1.483681   1.472528    1.4296474

after scaling

                               sample1        sample2.        sample3
chr4_185974589_185974741       0.761687321    0.37073755      -1.132424873

Peaks like this are not changed across the samples, but after scaling they show differences.

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 5.2 years ago by sophialovechan ▴ 80

I expected there should be some common regions (similar or same values) across my samples showing as the same color in a cluster.

ADD REPLY • link 5.2 years ago by sophialovechan ▴ 80

Check out the formula below. The z-score is going to drastically reduce the influence of the dynamic range differences between individual rows, therefore small differences in a row with overall small values may get similar z-scores as differences that seem larger to you just because the numbers that are compared to each other live on a different scale. But relatively speaking, the differences from the mean may not be as dramatic (or similarly dramatic in the low-value-ranges).
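
To make that concrete, here is a minimal sketch with two made-up rows (not the poster's data): one row varies by hundredths, the other by whole units, yet both end up with identical row z-scores. The row names and values are assumptions chosen purely for illustration.

```r
# Two made-up rows: same pattern, very different spread.
m <- rbind(
  small_range = c(1.00, 1.01, 1.02),
  large_range = c(1, 5, 9)
)

# Row-wise z-score: (value - row mean) / row standard deviation.
z <- t(scale(t(m)))

print(apply(m, 1, sd))  # the spreads differ by a factor of 400
print(round(z, 3))      # yet both rows become -1, 0, 1
```

A heatmap colored by z would therefore draw both rows identically, which is the effect described in the question.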

ADD REPLY • link 5.2 years ago by Friederike 9.0k

Hello sophialovechan,

You have added multiple images improperly, hence they show up as links and not as embedded images. Please see How to add images to a Biostars post to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you have used here).

I will make the necessary changes for now.

ADD REPLY • link 5.2 years ago by Ram 44k

score 2 · Answer 1 · 2019-05-20

5.2 years ago

Friederike 9.0k

Ah, I see it now. I had ignored the numbers before; I have now formatted them in your original post so that it's a bit more obvious what you're actually asking. Your code does what it should; you may not like the consequences, but that's a different issue.

Using the pheatmap:::scale_rows function may illuminate what's going on:

## this is what the function does
> pheatmap:::scale_rows
function (x) 
{
    m = apply(x, 1, mean, na.rm = T)
    s = apply(x, 1, sd, na.rm = T)
    return((x - m)/s)
}

## and this is the result
> matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3)  %>% pheatmap:::scale_rows()
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

## which is the same as your code
>  matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3)  %>% t %>% scale %>%  t
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

As explained e.g. on Wikipedia, the z-score is calculated by subtracting the (column or row) mean from the given value and dividing the result by the standard deviation.

The mean of your 3 example values is 1.46, the sd is 0.029, so do the math yourself and you can see that the code is doing what it's supposed to be doing.
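
The arithmetic can be spelled out in base R without any heatmap package. This is just a sketch using the three values from the question; the variable names x, m, s and z are mine.

```r
x <- c(1.483681, 1.472528, 1.4296474)  # the three values from the question

m <- mean(x)      # about 1.46
s <- sd(x)        # about 0.029 (sample standard deviation, n - 1 denominator)
z <- (x - m) / s  # the z-scores

print(round(c(mean = m, sd = s), 4))
print(round(z, 4))
```

Because the standard deviation is so small, a deviation of only about 0.02 from the mean already corresponds to a z-score of roughly 0.76.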

ADD COMMENT • link 5.2 years ago by Friederike 9.0k

Thanks for your detailed explanation. I understand the code is doing what it is supposed to do. So I guess the question I want to ask is whether there is a way to present the data closer to the original values. I guess I should plot the original data in the heatmap, but I can't get the colors to show the differences, as shown in the unscaled heatmap in my previous reply.

ADD REPLY • link 5.2 years ago by sophialovechan ▴ 80

I cannot follow. Which differences do you find worthy of being "shown"? There's a clear difference in the second cluster, for example (one blue, two red).

Just a couple of thoughts:

- Are your values log-transformed? If not, that might help.
- Note that your current legend label "Z-score" is wrong for the unscaled heatmap.
- Maybe you're looking to adjust the color scheme? Kamil's pheatmap tutorial has a nice section about coloring according to quantiles; the principles should work with ComplexHeatmap, too, I would think.
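
As a rough sketch of the quantile idea carried over to colorRamp2: this is a simplified three-anchor version, not the many-break scheme from the tutorial. The matrix is simulated, and the 5%/50%/95% cut-offs are arbitrary assumptions.

```r
set.seed(1)
# Made-up stand-in for a log-CPM matrix (100 regions x 6 samples)
# with a few extreme rows.
data <- matrix(rnorm(600, mean = 2, sd = 0.5), nrow = 100)
data[1:3, ] <- data[1:3, ] + 6

# Anchor the colors at quantiles rather than at fixed or extreme values,
# so a handful of outliers does not compress the rest of the color scale.
breaks <- quantile(data, probs = c(0.05, 0.50, 0.95), na.rm = TRUE)
print(breaks)

if (requireNamespace("circlize", quietly = TRUE)) {
  col_fun <- circlize::colorRamp2(unname(breaks), c("blue", "white", "red"))
  # col_fun can then be passed on as Heatmap(data, col = col_fun, ...)
}
```

Values outside the outer anchors simply take the end colors, so the extreme rows stay visible without dictating the whole scale.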

ADD REPLY • link 5.2 years ago by Friederike 9.0k

Yes, the values are log-transformed. I don't have a problem with the z-score scale; I think it shows the differences very well. But I want to show the similarity as well, i.e. genes whose values are similar or equal in all the samples.

ADD REPLY • link 5.2 years ago by sophialovechan ▴ 80

Sorry, but I'm not sure what you're asking for now. Maybe you can manually draw a version the way you envision it?

ADD REPLY • link 5.2 years ago by Friederike 9.0k

something like this:

![Desired Heatmap Style][1]

You can see a cluster (cluster2) which has the same or similar values across the samples.

That's what I want to make. Thank you very much.

ADD REPLY • link 4.6 years ago by sophialovechan ▴ 80

Your unscaled version has that, no?

ADD REPLY • link 5.2 years ago by Friederike 9.0k

I have a feeling OP wants column clustering (as opposed to the row clustering shown in their cluster2 example)

ADD REPLY • link 5.2 years ago by Ram 44k

Clustering is happening at both levels in the example heatmaps shown in the original post. But maybe sophialovechan is asking for the clustering being based on the unscaled data while the colors should correspond to the z-score-transformed values?
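
If that is the goal, one possible sketch is to compute the dendrograms from the unscaled matrix and hand them to Heatmap() while plotting the z-scores. The data here is simulated and the object names row_hc and col_hc are mine. I have left out km, since I am not sure how k-means splitting interacts with a supplied dendrogram.

```r
set.seed(1)
# Made-up stand-in for the log-CPM matrix.
data <- matrix(rnorm(60, mean = 2), nrow = 10,
               dimnames = list(paste0("region", 1:10), paste0("s", 1:6)))

mat_scaled <- t(scale(t(data)))  # row z-scores, used for the colors only

# Cluster on the unscaled values instead of on the z-scores.
row_hc <- hclust(dist(data))
col_hc <- hclust(dist(t(data)))

if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  ht <- ComplexHeatmap::Heatmap(
    mat_scaled, name = "Z-score",
    col = circlize::colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
    cluster_rows = row_hc,     # dendrogram from the unscaled data
    cluster_columns = col_hc
  )
  # ComplexHeatmap::draw(ht) renders the plot
}
```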

ADD REPLY • link 5.2 years ago by Friederike 9.0k

Yes, it looks weird when OP picks the limits for colors themselves (c(-2,0,2)/c(-2,0,5) for low, mid, high). Ideally, they should be set using min(), mean() and max() (sort of like scaling without the actual scaling).

ADD REPLY • link 5.2 years ago by Ram 44k

Yes. That's exactly what I want to do. I am not very good at coding, so I'm not sure how to set the colors using min(), mean() and max(). Also, I mislabeled the unscaled heatmap as "Z-score". Sorry about the confusion.

ADD REPLY • link 5.2 years ago by sophialovechan ▴ 80

Ram · Answer 2 · 2019-05-22

5.2 years ago

Ram 44k

Change

colorRamp2(c(-2,0,5), ...

in the unscaled version to

colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T)), ...

That replaces the hard-coded values with values computed on the fly.

ADD COMMENT • link 5.2 years ago by Ram 44k

Thank you very much!

ADD REPLY • link 5.2 years ago by sophialovechan ▴ 80

> Heatmap(data, name="log(CPM)", km=5, col=colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T),c("blue", "white", "red")), bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE))

Error in colorRamp2(c(min(data, na.rm = T), mean(data, na.rm = T), max(data,  : 
unused arguments (bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE)

I got an error though.

ADD REPLY • link updated 5.2 years ago by Ram 44k • written 5.2 years ago by sophialovechan ▴ 80

Check your parentheses.
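
The "unused arguments" error says that the annotation and row/column name arguments ended up inside colorRamp2() instead of Heatmap(). A sketch of the call with the parentheses balanced is below. The matrix is simulated, and I have left out the annotation arguments so the example stands alone; they would go next to show_row_names in the Heatmap() call.

```r
set.seed(1)
data <- matrix(rnorm(60, mean = 2), nrow = 10)  # made-up stand-in for log(CPM)

if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  # Close colorRamp2() before the remaining Heatmap() arguments start.
  col_fun <- circlize::colorRamp2(
    c(min(data, na.rm = TRUE), mean(data, na.rm = TRUE), max(data, na.rm = TRUE)),
    c("blue", "white", "red")
  )
  ht <- ComplexHeatmap::Heatmap(
    data, name = "log(CPM)", col = col_fun,
    show_row_names = FALSE, show_column_names = FALSE
  )
}
```

Building the color function in its own statement first makes this kind of misplaced parenthesis much harder to introduce.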

ADD REPLY • link 5.2 years ago by Ram 44k

Problem solved. Thanks.

ADD REPLY • link 5.2 years ago by sophialovechan ▴ 80

Please remember to accept the answer that helped solve your problem.

ADD REPLY • link 5.2 years ago by Friederike 9.0k

Hi, I have the same question. My unscaled data is below.

Symbol  wt_1         wt_2         wt_3         sample2_1    sample2_2    sample2_3
gene1   5.335232251  5.740785039  5.902108135  3.992369652  3.891350026  4.270974093
gene2   15.98835431  16.02484119  16.02575545  16.00483162  16.02241575  15.99473185

It is obvious that gene1 differs between the groups and gene2 does not. But after scaling, the data looks like this:

            wt_1      wt_2     wt_3  sample2_1  sample2_2  sample2_3
gene1  0.5281683 0.9746396 1.152239 -0.9501832 -1.0613953 -0.6434688
gene2 -1.3263240 0.8934847 0.949107 -0.3238684  0.7459245 -0.9383238

Now gene2 shows differences. Why? And how can I plot the data so that, as I expect, gene1 shows a difference and gene2 does not?

Thanks .

ADD REPLY • link 4.7 years ago by yiren ▴ 10

How exactly did you scale your data?
Do you expect gene1 to be different or do you know it for a fact?

ADD REPLY • link 4.7 years ago by Ram 44k

Thank you very much for your reply. 1. I scaled my data with t(scale(t(data))). 2. I know gene1 is different because its fold change is bigger than 2. What I want is for gene2 to be shown in the same color across the samples, and gene1 to be shown in different colors across the samples.
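
For reference, the numbers above can be reproduced in base R. The sketch below uses the two rows as posted. It also prints a row-centered version (row mean subtracted, no division by the standard deviation), which is only one option sometimes used to keep the original units and is not something proposed in this thread.

```r
data <- rbind(
  gene1 = c(5.335232251, 5.740785039, 5.902108135, 3.992369652, 3.891350026, 4.270974093),
  gene2 = c(15.98835431, 16.02484119, 16.02575545, 16.00483162, 16.02241575, 15.99473185)
)
colnames(data) <- c("wt_1", "wt_2", "wt_3", "sample2_1", "sample2_2", "sample2_3")

# The row standard deviations differ by a factor of roughly 55.
row_sd <- apply(data, 1, sd)
print(round(row_sd, 4))

# Dividing by the row sd puts both genes on the same scale, so gene2's
# tiny fluctuations look as large as gene1's much larger shift.
scaled <- t(scale(t(data)))
print(round(scaled, 3))

# Centering only (no division by sd) keeps the original log units.
centered <- t(scale(t(data), scale = FALSE))
print(round(centered, 3))
```

In the centered matrix gene2 stays within a few hundredths of zero while gene1 moves by about one unit in either direction.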

ADD REPLY • link 4.7 years ago by yiren ▴ 10

That is not the right approach - you should go where the data leads you and visualize that, not try to make the data show you what you think is the right answer.

ADD REPLY • link 4.7 years ago by Ram 44k