Retrieve Clusters From ComplexHeatmap
3.6 years ago
pthom010 ▴ 40

I made a heatmap with the following data (the actual table has 240 rows):

tf.log


                              0-dpi      1-dpi     7-dpi    14-dpi      22-dpi
TRINITY_DN0_c0_g1_i2      1.27584408  0.5872404  1.780178  1.414362  1.53341851
TRINITY_DN10214_c0_g1_i2 -2.34774378 -2.9194079 -3.211677 -2.885869 -2.04617227
TRINITY_DN10214_c0_g1_i6 -2.14867876 -1.5794957 -1.577811 -2.485095 -1.44172768
TRINITY_DN1038_c0_g1_i4   0.03163921  0.7375222  2.037936  2.462830  0.04559793
TRINITY_DN10462_c0_g1_i2 -2.63973533 -2.8039350 -2.481144 -2.698932 -1.76284020
TRINITY_DN1052_c0_g1_i3  -3.32767605 -2.2006082 -1.887869 -1.211592 -1.43292669

I then made the following heatmap:

heat.gen.k = Heatmap(tf.log, width = unit(10, "cm"),
                     km = 12, 
                     cluster_columns = FALSE,
                     show_row_names = FALSE,
                     row_title_rot = 0, 
                     row_gap = unit(3, "mm"), 
                     name = "Log2FC", 
                     column_title = "Resistant - Susceptible",
                     column_title_gp = gpar(fontfamily = "sans", fontsize = 28), 
                     column_names_gp = gpar(fontfamily = "sans", fontsize = 20),
                     col = heat.col,
                     right_annotation = ha)
print(heat.gen.k)

I would like to pull the clusters out for downstream analysis but I cannot do so with the following code:

clusterlist = row_order(heat.gen.k)
clu_df <- lapply(names(clusterlist), function(i){
  out <- data.frame(GeneID = rownames(tf.log[clusterlist[[i]],]),
                    Cluster = paste0("cluster", i),
                    stringsAsFactors = FALSE)
  return(out)
}) %>%
  do.call(rbind, .)

I keep getting the following error:

Error in data.frame(GeneID = rownames(tf.log[clusterlist[[i]], ]), Cluster = paste0("cluster",  :
  arguments imply differing number of rows: 0, 1

Not sure how to resolve this issue.

R complexheatmap • 6.4k views
3.6 years ago
ATpoint 82k

It happens when a cluster consists of only one gene. In that case tf.log[clusterlist[[i]],] extracts a single row, which R simplifies to a plain vector (assuming tf.log is a matrix), and a vector has no rownames, so rownames() returns NULL. You can easily avoid that by changing:

rownames(tf.log[clusterlist[[i]],]) into rownames(tf.log)[clusterlist[[i]]], so that you take the rownames first and then subset them to the genes in each cluster, instead of subsetting the data and then asking for its rownames.
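
A minimal sketch of the behaviour described above, in base R only. The matrix `m` and the list `clusterlist` are made-up stand-ins for `tf.log` and the `row_order()` output, not the poster's data:

```r
# Toy matrix standing in for tf.log (made-up values, 3 genes x 2 samples)
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Hypothetical stand-in for row_order() output: cluster "2" holds a single gene
clusterlist <- list(`1` = c(1L, 3L), `2` = 2L)

# Subsetting one row of a matrix drops to a vector, so rownames() gives NULL
single_row <- rownames(m[clusterlist[["2"]], ])

# Indexing the rownames directly keeps the gene name
by_names <- rownames(m)[clusterlist[["2"]]]

# Alternative: drop = FALSE keeps the one-row matrix and its rownames
no_drop <- rownames(m[clusterlist[["2"]], , drop = FALSE])

# The extraction loop with the suggested fix applied
clu_df <- do.call(rbind, lapply(names(clusterlist), function(i) {
  data.frame(GeneID = rownames(m)[clusterlist[[i]]],
             Cluster = paste0("cluster", i),
             stringsAsFactors = FALSE)
}))
```

Either `rownames(m)[idx]` or `drop = FALSE` should avoid the problem; the first is the change suggested in this answer.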


This worked. Thanks!


Hi! I have exactly the same problem, but your solution did not solve it. This is my original dataset (genes in rows and samples in columns, 516 genes in total):

 tibble: 10 x 15
   ID      Description      Associated.Gene~ `2-3dpg` `8dpg` `14dpg`
   <chr>   <chr>            <chr>               <dbl>  <dbl>   <dbl>
 1 ENSG00~ glutamate-cyste~ GCLC                26.7    44.6    29.7
 2 ENSG00~ LIM and SH3 pro~ LASP1               59.9    32.4    54.2
 3 ENSG00~ RNA binding mot~ RBM6                44.1    64.6    53.5
 4 ENSG00~ tumor necrosis ~ TNFRSF12A          634.    346.    362. 
 5 ENSG00~ vacuolar protei~ VPS41               53.5    64.8    43.3

So I tidied up the dataset as follows:

HuSE <- andyf[,-(11:15)]
HuSE <- HuSE[,-(1:2)] # remove categorical variables
HuSE2 <- na.omit(HuSE) # remove NA
rnames <- HuSE2$Associated.Gene.Name
data <- HuSE2[,-(1)] # remove categorical variable
data2 <- as.matrix(data) 
rownames(data2) <- rnames
data2 <- t(data) # transpose
mydata <- scale(data2)#scale
mydata2 <- t(mydata)

and then made the heatmap (this worked fine, including the split into clusters):

HM <- Heatmap(mydata2, km=6, border_gp = gpar(col = "black"))  # Make a heatmap with 6 k-means row clusters
HM <- draw(HM)  # Show the heatmap
r.dend <- row_dend(HM)  # Extract row dendrogram
rcl.list <- row_order(HM)  # Extract clusters (output is a list)
lapply(rcl.list, function(x) length(x))  # Check/confirm cluster sizes

None of the clusters consists of a single gene, so I should be able to retrieve the genes present in each cluster:

$`2`
[1] 99

$`1`
[1] 40

$`4`
[1] 108

$`3`
[1] 44

$`5`
[1] 82

$`6`
[1] 143

But when I run the same code as suggested above:

clusterlist = row_order(HM)
clu_df <- lapply(names(clusterlist), function(i){
  out <- data.frame(GeneID = rownames(mydata2)[clusterlist[[i]]],
                    Cluster = paste0("cluster", i),
                    stringsAsFactors = FALSE)
  return(out)
}) %>%  
  do.call(rbind, .)

I still get the same error:

Error in data.frame(GeneID = rownames(mydata2)[clusterlist[[i]]], Cluster = paste0("cluster",  : 
  arguments imply differing number of rows: 0, 1
Called from: data.frame(GeneID = rownames(mydata2)[clusterlist[[i]]], Cluster = paste0("cluster", 
    i), stringsAsFactors = FALSE)

What am I doing wrong??

Thank you!

Camilla


Isn't clusterlist just an ordinary list without names? I would probably do 1:length(clusterlist) rather than names(clusterlist).
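
For what it's worth, the printout above (`$`2``, `$`1``, ...) suggests the list is named, so either form indexes valid elements; what changes is the label each cluster gets. A small base R sketch, with a made-up `clusterlist` standing in for the real `row_order()` output:

```r
# Hypothetical stand-in for row_order() output, named like the printout above
clusterlist <- list(`2` = c(4L, 5L), `1` = 1L, `3` = c(2L, 3L))

# Position in the list and cluster name are different things
positions <- seq_along(clusterlist)   # 1, 2, 3
labels <- names(clusterlist)          # "2", "1", "3"

# Fall back to positions only if the list really has no names
ids <- if (is.null(labels)) as.character(positions) else labels

# One row per row index, labelled with the cluster it came from
clu_idx <- data.frame(RowIndex = unlist(clusterlist, use.names = FALSE),
                      Cluster = rep(paste0("cluster", ids), lengths(clusterlist)),
                      stringsAsFactors = FALSE)
```

`seq_along()` is used here instead of `1:length()` because it behaves sensibly for an empty list. Neither choice changes the `0, 1` error, which depends on what `rownames()` returns.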


Thanks for your reply! I changed it but got the same error:

clusterlist = row_order(HM)
clu_df <- lapply(1:length(clusterlist), function(i){
  out <- data.frame(GeneID = rownames(mydata2)[clusterlist[[i]]],
                    Cluster = paste0("cluster", i),
                    stringsAsFactors = FALSE)
  return(out)
}) %>%
  do.call(rbind, .)

Can you provide some reproducible data? (dput)


What do you mean? I have a data frame with 7 samples in columns and 516 genes in rows. When I prepare the data for the heatmap, I remove the categorical variables and convert it into a matrix:

HuSE <- andyf[,-(11:15)]
HuSE <- HuSE[,-(1:2)] # remove categorical variables
HuSE2 <- na.omit(HuSE) # remove NA
rnames <- HuSE2$Associated.Gene.Name
data <- HuSE2[,-(1)] # remove categorical variable
data2 <- as.matrix(data) 
rownames(data2) <- rnames
mydata <- scale(t(data2)) # scale and transpose

My dataset is like this:

mat = matrix(rnorm(80, 2), 8, 10)
mat = rbind(mat, matrix(rnorm(40, -2), 4, 10))
rownames(mat) = letters[1:12]
colnames(mat) = letters[1:10]

And indeed the class of mydata and mat is the same: [1] "matrix" "array"

And if I try to re-run the same code posted in the issue with the same matrix (basically copy and paste), I get multiple errors, which seem to be related to:

out <- data.frame(GeneID = rownames(mat[rcl.list[[i]],]),
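
One possible cause of the `0, 1` error, judging only from the first preprocessing snippet in this thread: `data2 <- t(data)` transposes `data` (which carries no gene names) rather than `data2`, so the row names assigned on the previous line are discarded and `rownames(mydata2)` would be `NULL`. A minimal base R sketch, using a small stand-in data frame `toy` (a hypothetical name; the gene symbols and values are copied from the table printed earlier):

```r
# Stand-in for HuSE2: a gene-name column plus numeric sample columns
toy <- data.frame(Associated.Gene.Name = c("GCLC", "LASP1", "RBM6"),
                  s1 = c(26.7, 59.9, 44.1),
                  s2 = c(44.6, 32.4, 64.6),
                  s3 = c(29.7, 54.2, 53.5))

rnames <- toy$Associated.Gene.Name
data <- toy[, -1]
data2 <- as.matrix(data)
rownames(data2) <- rnames

# As in the first snippet: transposing `data` ignores the names set on data2
lost <- t(scale(t(data)))
# Transposing `data2` instead keeps the gene names
kept <- t(scale(t(data2)))

# Cheap check to run before building the cluster table
has_names_lost <- !is.null(rownames(lost))
has_names_kept <- !is.null(rownames(kept))

# Missing rownames lead to the same kind of error as reported
err <- tryCatch(
  data.frame(GeneID = rownames(lost)[1:2], Cluster = "cluster1"),
  error = function(e) conditionMessage(e))
```

If `is.null(rownames(mydata2))` is `TRUE` on the real data, rebuilding the matrix from `data2` should restore the names. The same loss should occur when `data` is a tibble, since tibbles do not carry row names, but that has not been checked against the poster's data.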