How to do scaling in ComplexHeatmap
2
1
Entering edit mode
3.8 years ago

Hello everyone,

I am using edgeR and ComplexHeatmap to plot a heatmap of a few ATAC-seq datasets, but I have come across an issue that I can't solve. I would really appreciate any suggestions.

When I use log-transformed CPM values to plot the heatmap, the clustering is not very clear: the heatmap is either all blue or all red.

When I use "mat_scaled = t(scale(t(data)))" to scale my data before plotting, some information is lost in the heatmap. I expected that values that are the same in all samples would be shown in the same color across the samples. Unfortunately, after scaling, similar values are stretched into larger differences, which show up as different colors in the heatmap.

                           sample1       sample2.      sample3
chr4_185974589_185974741   1.483681      1.472528      1.4296474

after scaling

                           sample1       sample2.      sample3
chr4_185974589_185974741   0.761687321   0.37073755    -1.132424873


Thanks.

ChIP-Seq ComplexHeatmap R edgeR • 18k views
0
Entering edit mode

It might be easier for us to assist you if you posted the example images and the corresponding code. For example, it is not clear if you want to scale columns or rows or both.

0
Entering edit mode

Here is the code for scaled heatmap

library(ComplexHeatmap)
library(circlize)
base_mean = rowMeans(data)
mat_scaled = t(scale(t(data)))
type = gsub("s\\d+_", "", colnames(data))
ha = HeatmapAnnotation(df = data.frame(type = type))
Heatmap(mat_scaled, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)


![enter image description here][1]

And unscaled

Heatmap(data, name = "Z-score", km = 5,
        col = colorRamp2(c(-2, 0, 5), c("blue", "white", "red")),
        bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"),
        show_row_names = FALSE, show_column_names = FALSE)


![enter image description here][2]

Thanks.

0
Entering edit mode

And what exactly is the detail you don't like about the z-score? Which values do you think should be "the same"?

From what I can tell, your code does what you instructed it to do -- you're z-score-transforming the rows of your matrix, i.e. instead of displaying the actual values of data, you're coloring the heatmap based on how far each entry is from its row's mean, measured in units of that row's standard deviation.

Btw, I would strongly recommend to not create an object named "type" because that's also the name of a base R function.

0
Entering edit mode

I got some values changed after scaling like this.

                               sample1    sample2.    sample3
chr4_185974589_185974741       1.483681   1.472528    1.4296474


after scaling

                               sample1        sample2.        sample3
chr4_185974589_185974741       0.761687321    0.37073755      -1.132424873


Peaks like this are not changed across the samples, but after scaling they show differences.

0
Entering edit mode

I expected there should be some common regions (similar or same values) across my samples showing as the same color in a cluster.

0
Entering edit mode

Check out the formula below. The z-score is going to drastically reduce the influence of the dynamic range differences between individual rows, therefore small differences in a row with overall small values may get similar z-scores as differences that seem larger to you just because the numbers that are compared to each other live on a different scale. But relatively speaking, the differences from the mean may not be as dramatic (or similarly dramatic in the low-value-ranges).
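For reference, the formula in question is presumably the standard row-wise z-score. The notation here is an assumption, not taken from the original post: x_ij is the value of row (peak) i in sample j, n is the number of samples, and s_i is the sample standard deviation as computed by R's sd().

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i},
\qquad
\bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij},
\qquad
s_i = \sqrt{\frac{1}{n-1} \sum_{j=1}^{n} \left( x_{ij} - \bar{x}_i \right)^2}
```

Because each row is divided by its own s_i, a row with a tiny spread and a row with a large spread can end up with z-scores of similar magnitude.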

0
Entering edit mode

Hello sophialovechan,

You have added multiple images improperly, hence they show up as links and not as embedded images. Please see How to add images to a Biostars post to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you have used here).

I will make the necessary changes for now.

2
Entering edit mode
3.8 years ago

Ah, I see it now. I had ignored the numbers before, I now formatted them in your original post so that it's a bit more obvious what you're actually asking. Your code does what it should; you may not like the consequences, but that's a different issue.

Using the pheatmap:::scale_rows function may illuminate what's going on:

## this is what the function does
> pheatmap:::scale_rows
function (x)
{
    m = apply(x, 1, mean, na.rm = T)
    s = apply(x, 1, sd, na.rm = T)
    return((x - m)/s)
}

## and this is the result (%>% comes from the magrittr package)
> library(magrittr)
> matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3) %>% pheatmap:::scale_rows()
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423

## which is the same as your code
> matrix(c(1.483681, 1.472528, 1.4296474), ncol = 3) %>% t %>% scale %>% t
          [,1]      [,2]      [,3]
[1,] 0.7616927 0.3707308 -1.132423


As explained e.g. in wikipedia, the z-score is calculated by subtracting the (column or row) mean from the given value and dividing that by the standard deviation.

The mean of your 3 example values is 1.46, the sd is 0.029, so do the math yourself and you can see that the code is doing what it's supposed to be doing.
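The arithmetic can be checked step by step in base R. This is only a sketch using the three values quoted in the question; the variable names are made up for illustration.

```r
# The three log-CPM values quoted above for peak chr4_185974589_185974741
x <- c(sample1 = 1.483681, sample2 = 1.472528, sample3 = 1.4296474)

row_mean <- mean(x)  # about 1.462
row_sd   <- sd(x)    # about 0.0285 (sample sd, n - 1 in the denominator)

# z-score: distance from the row mean in units of the row's standard deviation
z <- (x - row_mean) / row_sd
print(round(z, 4))

# The same result via the t(scale(t(...))) idiom used in the thread
z_scale <- t(scale(t(matrix(x, nrow = 1))))
print(round(z_scale, 4))
```

The spread of the three values is tiny (sd of roughly 0.03), so dividing by it turns differences in the second decimal place into z-scores of about plus or minus 1.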

0
Entering edit mode

Thanks for your detailed explanation. I understand the code is doing what it is supposed to do. So I guess the question I want to ask is whether there is a way to present the data in a form closer to the original values. I guess I should plot the original data in the heatmap, but I can't get the colors to show the differences, as seen in the unscaled heatmap in my previous reply.

0
Entering edit mode

I cannot follow. Which differences do you find worthy of being "shown"? There's a clear difference in the second cluster, for example (one blue, two red).

Just a couple of thoughts:

• are your values log-transformed? If not, that might help.
• note that your current legend label "z-score" is wrong for the unscaled heatmap
• maybe you're looking to adjust the color scheme? Kamil's pheatmap tutorial has a nice section about coloring according to quantiles; the principles should work with ComplexHeatmap, too, I would think
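As a rough illustration of the quantile idea in the last bullet, the color breaks can be derived from quantiles of the data rather than typed in by hand. This is a simplified three-break sketch, not the approach from the tutorial itself; the toy matrix and the 5%/50%/95% cut-offs are assumptions chosen for illustration.

```r
# Toy log-CPM-like matrix standing in for `data` (hypothetical values)
set.seed(1)
data <- matrix(rnorm(60, mean = 2, sd = 1), nrow = 20,
               dimnames = list(paste0("peak", 1:20), paste0("s", 1:3)))

# Breaks at the 5%, 50% and 95% quantiles instead of hand-picked numbers,
# so a few extreme values do not stretch the color scale
breaks <- as.numeric(quantile(data, probs = c(0.05, 0.5, 0.95), na.rm = TRUE))

# Build the color mapping only if circlize is installed
if (requireNamespace("circlize", quietly = TRUE)) {
  col_fun <- circlize::colorRamp2(breaks, c("blue", "white", "red"))
  # col_fun can then be passed as `col = col_fun` to ComplexHeatmap::Heatmap()
}
```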
0
Entering edit mode

Yes, the values are log-transformed. I don't have a problem with the z-score scaling; I think it shows the differences very well. But I want to show the similarity as well, i.e. genes/values that are similar or equal in all the samples.

0
Entering edit mode

Sorry, but I'm not sure what you're asking for now. Maybe you can manually draw a version the way you envision it?

0
Entering edit mode

something like this:

![Desired Heatmap Style][1]

You can see a cluster (cluster2) that has the same or similar values across the samples.

That's what I want to make. Thank you very much.

0
Entering edit mode

Your unscaled version has that, no?

0
Entering edit mode

I have a feeling OP wants column clustering (as opposed to the row clustering shown in their cluster2 example)

0
Entering edit mode

Clustering is happening at both levels in the example heatmaps shown in the original post. But maybe sophialovechan is asking for the clustering being based on the unscaled data while the colors should correspond to the z-score-transformed values?
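If that is indeed the goal, one possible sketch is to compute the row clustering on the unscaled matrix and hand it to Heatmap() through cluster_rows, which accepts an hclust or dendrogram object. The toy data are hypothetical, and hierarchical clustering is used here in place of the km = 5 k-means split from the original code.

```r
# Toy log-CPM-like matrix standing in for `data` (hypothetical values)
set.seed(1)
data <- matrix(rnorm(60, mean = 2, sd = 1), nrow = 20,
               dimnames = list(paste0("peak", 1:20), paste0("s", 1:3)))

# Row-wise z-scores, used only for the colors
mat_scaled <- t(scale(t(data)))

# Row clustering computed on the unscaled values
row_hc <- hclust(dist(data))

# Draw-ready heatmap object, built only if both packages are installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  ht <- ComplexHeatmap::Heatmap(
    mat_scaled, name = "Z-score",
    cluster_rows = row_hc,  # reuse the clustering from the unscaled data
    col = circlize::colorRamp2(c(-2, 0, 2), c("blue", "white", "red")),
    show_row_names = FALSE, show_column_names = FALSE
  )
}
```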

0
Entering edit mode

Yes. It looks weird when OP picks the limits for colors themselves (c(-2, 0, 2) / c(-2, 0, 5) for low, mid, high). Ideally, they should be set using min(), mean() and max() (sort of like scaling without the actual scaling).

0
Entering edit mode

Yes. That's exactly what I want to do. I am not very good at coding, so I am not sure how to set the colors using min(), mean() and max(). And I mislabeled the unscaled heatmap as z-score. Sorry about the confusion.

2
Entering edit mode
3.8 years ago
Ram 38k

Change

colorRamp2(c(-2,0,5), ...


in the unscaled version to

colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T)), ...


That replaces the hard-coded values with values computed on the fly.

0
Entering edit mode

Thank you very much!

0
Entering edit mode
> Heatmap(data, name="log(CPM)", km=5, col=colorRamp2(c(min(data, na.rm=T), mean(data, na.rm=T), max(data, na.rm=T),c("blue", "white", "red")), bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE))

Error in colorRamp2(c(min(data, na.rm = T), mean(data, na.rm = T), max(data,  :
unused arguments (bottom_annotation = ha, bottom_annotation_height = unit(4, "mm"), show_row_names = FALSE, show_column_names = FALSE)


I got an error though.
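Judging from the error message, the parenthesis that should close the inner c(...) right after max(...) is missing. As a result the color vector ends up inside the breaks vector, and the remaining arguments are passed to colorRamp2() instead of Heatmap(). Below is a sketch with the parentheses rebalanced. The thread does not show the poster's actual fix, the toy data and the name sample_type are hypothetical, and km and bottom_annotation_height are left out for brevity.

```r
# Toy log-CPM-like matrix standing in for `data` (hypothetical values)
set.seed(1)
data <- matrix(rnorm(60, mean = 2, sd = 1), nrow = 20,
               dimnames = list(paste0("peak", 1:20),
                               c("s1_ctrl", "s2_ctrl", "s3_treat")))

# Breaks: note the ")" closing c(...) directly after max(...)
breaks <- c(min(data, na.rm = TRUE), mean(data, na.rm = TRUE), max(data, na.rm = TRUE))

# Build the heatmap object only if both packages are installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE) &&
    requireNamespace("circlize", quietly = TRUE)) {
  sample_type <- gsub("s\\d+_", "", colnames(data))
  ha <- ComplexHeatmap::HeatmapAnnotation(df = data.frame(type = sample_type))
  ht <- ComplexHeatmap::Heatmap(
    data, name = "log(CPM)",
    col = circlize::colorRamp2(breaks, c("blue", "white", "red")),  # colorRamp2() ends here
    bottom_annotation = ha,
    show_row_names = FALSE, show_column_names = FALSE
  )
}
```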

0
Entering edit mode

Problem solved. Thanks.

0
Entering edit mode

Hi, I have the same question. My unscaled data is below.

Symbol  wt_1         wt_2         wt_3         sample2_1    sample2_2    sample2_3
gene1   5.335232251  5.740785039  5.902108135  3.992369652  3.891350026  4.270974093
gene2   15.98835431  16.02484119  16.02575545  16.00483162  16.02241575  15.99473185


It is obvious that gene1 differs between the groups and gene2 does not. But after scaling, the data looks like this:

            wt_1      wt_2     wt_3  sample2_1  sample2_2  sample2_3
gene1  0.5281683 0.9746396 1.152239 -0.9501832 -1.0613953 -0.6434688
gene2 -1.3263240 0.8934847 0.949107 -0.3238684  0.7459245 -0.9383238


Now gene2 appears to differ. Why? And how can I plot the data the way I expect, with gene1 showing a difference and gene2 showing none?

Thanks .

0
Entering edit mode
1. How exactly did you scale your data?
2. Do you expect gene1 to be different or do you know it for a fact?
0
Entering edit mode

Thank you very much for your reply. 1. I scaled my data with t(scale(t(data))). 2. I know gene1 is different because its fold change is bigger than 2. What I want is for gene2 to be shown in the same color across the samples, and gene1 to be shown in different colors across the samples.
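The reason gene2 looks "different" after t(scale(t(data))) can be seen by inspecting the per-row standard deviations. The sketch below uses the two rows quoted above. The centered-only variant at the end is an assumption about one possible alternative, not something proposed in the thread, and whether it is appropriate depends on the question being asked.

```r
data <- rbind(
  gene1 = c(5.335232251, 5.740785039, 5.902108135, 3.992369652, 3.891350026, 4.270974093),
  gene2 = c(15.98835431, 16.02484119, 16.02575545, 16.00483162, 16.02241575, 15.99473185)
)
colnames(data) <- c("wt_1", "wt_2", "wt_3", "sample2_1", "sample2_2", "sample2_3")

# Per-row standard deviation: gene1 varies far more than gene2
row_sd <- apply(data, 1, sd)
print(round(row_sd, 3))

# Row-wise z-scores: each row is divided by its OWN sd, so the tiny
# fluctuations of gene2 are blown up to the same scale as gene1
z <- t(scale(t(data)))
print(round(z, 3))

# Assumption: centering without dividing by the sd keeps the differences
# in the original units, so gene2 stays close to zero everywhere
centered <- t(scale(t(data), scale = FALSE))
print(round(centered, 3))
```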

0
Entering edit mode

That is not the right approach - you should go where the data leads you and visualize that, not try to make the data show you what you think is the right answer.