Question

Complex Heatmap - Reordering clusters using cluster information from second matrix

3.9 years ago

janina_lueders • 0

Hi everyone, I have a question regarding Complex Heatmap. I am aware of the post about reordering clusters, "Complex Heatmap: Changing order of clusters", but it does not solve my problem.

I have a data frame with one column for each of genes 1-4 and many rows, one per protein. The cells hold the numbers 0-3, which encode the 2 different methods used in the experiments: 0 means it was not found, 1 and 2 correspond to the 2 different methods, and 3 indicates it was found using both methods.

I am looking for a heatmap that can

a) be split into 4 slices according to the number of genes in which the protein is found (value >0)

b) be sorted so that the proteins found in all genes (columns) are on top, then those found in 3, then 2, then 1. No protein is found in 0 genes.

c) show by color which method found it (0-3)

d) contain a dendrogram with clustering inside each of the slices mentioned, to show the clusters within those slices.

Now, since it's a complex problem, Complex Heatmap should be appropriate.

What I am doing so far:

My initial data frame looks like this, just with many more rows:

a   b   c   d  
3   2   2   2   
3   2   2   2   
3   0   0   0    
1   0   3   3

To split the heatmap into 4 sections, I transform the data into a matrix that contains 1 where the value is >0 and 0 otherwise, then use rowSums to get a value indicating in how many genes each protein was found. I save a copy of the data as plot_matrix_frame, since I will need the original values later. (The data had an additional first column with names, not shown in the example, which I drop with [,-1].)

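# Keep the original 0-3 codes for plotting (the first column holds names)
plot_matrix_frame <- matrix_frame[,-1]

# Collapse the codes 2 and 3 to 1, so that every value >0 becomes 1
bol_matrix <- matrix_frame[,-1] > 1
red_matrix_frame <- matrix_frame[,-1]
red_matrix_frame[bol_matrix] <- 1

# Convert to numeric and append the number of genes found per row
class(red_matrix_frame) <- "numeric"
red_matrix_frame <- cbind(red_matrix_frame, rowSums(red_matrix_frame))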

The new matrix, now numeric, looks like this:

a   b   c   d  V5
1   1   1   1   4
1   1   1   1   4
1   0   0   0   1 
1   0   1   1   3
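The same binarize-and-count step can be written more compactly. This is only a minimal sketch, assuming the data are already a numeric matrix; the values are the four example rows from above, while the row names and the names `found` and `n_found` are illustrative and not part of the original code.

```r
# Example rows from the post: method codes 0-3 for genes a-d
# (the row names are made up for illustration)
plot_matrix_frame <- matrix(
  c(3, 2, 2, 2,
    3, 2, 2, 2,
    3, 0, 0, 0,
    1, 0, 3, 3),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("protein", 1:4), c("a", "b", "c", "d"))
)

# 1 where the protein was found by any method, 0 otherwise
found <- (plot_matrix_frame > 0) * 1

# Number of genes in which each protein was found
n_found <- rowSums(found)
```

Keeping `n_found` as a separate vector, instead of binding it to the matrix as a fifth column, keeps the count out of any distance calculation done on `found`.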

I then use the dendsort package to get better cluster sorting, with method ward.D2, since kmeans gave very poor clustering. I cluster on red_matrix_frame (the matrix of 1 and 0) and apply that clustering to the Heatmap function called on the original data (plot_matrix_frame). I use the split argument to get 4 sections.

dend = dendsort(hclust(dist(red_matrix_frame), method="ward.D2"))

default.hmap <- Heatmap(plot_matrix_frame[,c(1,2,3,4)],split=4,
                        column_names_side="top", column_order= c("a","b","c","d"),
                        cluster_row_slices=FALSE,
                        #heatmap_legend_param= list(
                          #labels=c("none","Bi","C","Both")),
                        cluster_rows=dend,
                        col=c("white","blue","green","red"))

This leaves me with a heatmap that is clustered by the number of columns in which the value is >0, and that shows the colors according to the experiment. But I cannot get the clusters sorted so that all rows with values >0 in 3 columns are in the second slice, all with >0 in 2 columns are in the third slice, and all with >0 in only one column are in the fourth slice.

My alternative approach leaves me with a heatmap that is nicely sliced according to my needs, but has no clustering within the slices and no dendrogram. I believe the issue is that I am using clustering from a different matrix, but I am not aware of a way that allows both and keeps my colors according to the experiment, without including those in the clustering.

split <- paste0("Cl\n", kclus$cluster)
default.hmap <- Heatmap(plot_matrix_frame[,c(1,2,3,4)],split=split,
                        column_names_side="top", column_order= c("a","b","c","d"),
                        cluster_row_slices=FALSE,
                        cluster_rows=TRUE,
                        #heatmap_legend_param= list(
                          #labels=c("none","Bi","C","Both")),
                        col=c("white","blue","green","red"))
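One possible way to fix the slice order is to split by a factor instead of by a number of clusters: with cluster_row_slices=FALSE, the slices should follow the order of the factor levels. The following is a minimal sketch under that assumption. The example rows are made up, `n_found` and `slice` are illustrative names, and the Heatmap call only runs if ComplexHeatmap is installed.

```r
# Made-up example: two proteins for each possible number of genes found
plot_matrix_frame <- matrix(
  c(3, 2, 2, 2,
    1, 3, 2, 1,
    1, 0, 3, 3,
    2, 2, 0, 1,
    0, 2, 0, 2,
    3, 0, 1, 0,
    3, 0, 0, 0,
    0, 0, 2, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("p", 1:8), c("a", "b", "c", "d"))
)

# Number of genes (columns) with a value >0 for each protein
n_found <- rowSums(plot_matrix_frame > 0)

# Levels in the order 4, 3, 2, 1 so that the "found in all genes" slice is first
slice <- factor(n_found, levels = 4:1)

if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  hmap <- ComplexHeatmap::Heatmap(
    plot_matrix_frame,
    row_split = slice,           # one slice per number of genes found
    cluster_row_slices = FALSE,  # keep the slices in factor-level order
    cluster_rows = TRUE,         # cluster rows within each slice
    cluster_columns = FALSE,
    column_names_side = "top"
  )
}
```

Because the split is a factor derived from the plotted matrix itself, no clustering from a second matrix is needed to form the slices.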

Any help is greatly appreciated! I have also thought about altering the colors afterwards, but I am a bit at a loss as to whether that is possible at all.
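On the colors: ComplexHeatmap accepts a named color vector for a discrete numeric matrix, so each code can be given a fixed color at plotting time rather than being altered afterwards. A minimal sketch, assuming the codes 0-3 and the four colors used above; `method_colors` is an illustrative name, the example rows are made up, and the Heatmap call only runs if the package is installed.

```r
# One fixed color per method code; the names must match the matrix values
method_colors <- c("0" = "white", "1" = "blue", "2" = "green", "3" = "red")

# Made-up example rows with method codes 0-3 for genes a-d
plot_matrix_frame <- matrix(
  c(3, 2, 2, 2,
    1, 0, 3, 3,
    0, 2, 0, 2,
    3, 0, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("p", 1:4), c("a", "b", "c", "d"))
)

# Check that every code in the matrix has a color assigned
codes_present <- unique(as.character(plot_matrix_frame))
all_mapped <- all(codes_present %in% names(method_colors))

if (all_mapped && requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  hmap <- ComplexHeatmap::Heatmap(
    plot_matrix_frame,
    name = "method",
    col = method_colors,      # discrete mapping instead of a color gradient
    cluster_columns = FALSE,
    column_names_side = "top"
  )
}
```

With an unnamed color vector the four colors are interpolated over the value range, which happens to give similar results for the values 0-3 but does not guarantee one color per code.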

Thank you!!

First and second heatmap provided as picture

R Complex Heatmap Heatmap • 2.8k views

Have you tried supplying a function to the clustering_distance_rows parameter? It takes a function that accepts a matrix and returns the distances based on custom operations. You might be able to tweak how ComplexHeatmap calculates the distance between rows that way. Similarly, you can also supply a custom function to tweak how it clusters rows. Either way, you'll need to operate on the matrix being plotted, or at least return an object with the same size and names as expected.

For example, try to get your function to output an object exactly like the output of dist(plot_matrix_frame[,1:4]) (going by your example). Any function that returns output in that format will work, no matter what matrix is used to calculate the actual distance. That's how you can side-step using different matrices to plot vs cluster vs calculate distance between rows.
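A minimal sketch of that idea, assuming the function binarizes the rows it receives, so that the colors still come from the 0-3 codes while the distances only use found vs not found. `binary_dist` is a hypothetical helper name, the example rows are made up, and the Heatmap call only runs if ComplexHeatmap is installed.

```r
# Hypothetical helper: distance on the found / not found pattern only
binary_dist <- function(m) dist((m > 0) * 1)

# Made-up example rows with method codes 0-3 for genes a-d
plot_matrix_frame <- matrix(
  c(3, 2, 2, 2,
    1, 3, 2, 1,
    1, 0, 3, 3,
    2, 2, 0, 1,
    0, 2, 0, 2,
    3, 0, 1, 0,
    3, 0, 0, 0,
    0, 0, 2, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("p", 1:8), c("a", "b", "c", "d"))
)

# Same format as dist(plot_matrix_frame): a "dist" object over all rows
d <- binary_dist(plot_matrix_frame)

if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  hmap <- ComplexHeatmap::Heatmap(
    plot_matrix_frame,
    clustering_distance_rows = binary_dist,  # custom distance on plotted rows
    clustering_method_rows = "ward.D2",
    cluster_columns = FALSE
  )
}
```

Here p1 and p2 differ in their method codes but are both found in all four genes, so their distance is 0 and they end up next to each other in the clustering.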

3.9 years ago by Ram • 43k