Question: How to cluster genes in heatmap
1
2.4 years ago by
Mehmet510
Japan
Mehmet510 wrote:

Dear All,

I have a data matrix with 17 samples and over 800 genes that belong to ten different gene families. I want to mark these gene families in a heatmap.

I have drawn a heatmap, but I do not know how to show the gene families in the plot.

Does anyone know how to do that?

rna-seq R gene • 6.3k views
ADD COMMENTlink modified 2.4 years ago by RamRS26k • written 2.4 years ago by Mehmet510
2

Try annotation_row in pheatmap.

ADD REPLYlink written 2.4 years ago by cpad011212k
1

a very good example of a clustered heatmap

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Buffo1.8k
5
2.4 years ago by
Kevin Blighe56k
Kevin Blighe56k wrote:

Yes, as per Sean, in ComplexHeatmap, you can segregate your heatmap into 'blocks' of different genes using the 'split' parameter. You can end up with nice heatmaps like this: A: How to plot a heatmap with two different distance matrices for X and Y

Edit April 16, 2019: skip to the working example, here: C: how to cluster genes in heatmap

ADD COMMENTlink modified 11 months ago • written 2.4 years ago by Kevin Blighe56k

Hi Kevin,

Here is my data:

EffectorName    GeneID  Sample1      Sample2     Sample3    Sample4  ....  Sample17
GH45            Gene1   25.7847      19.710      22.6148    ....     ....
Expansin        Gene2   29.2436      29.2168     963.745    ....     ....
....

What I want to do is to show EffectorNames in the heatmap.

ADD REPLYlink modified 2.4 years ago by Kevin Blighe56k • written 2.4 years ago by Mehmet510

Sorry, it's not clear what you want to do...

If I have this data:

Family      Gene  Sam1 Sam2 Sam3 Sam4
ncRNA       Gene1 10   11   6    1
ncRNA       Gene2 7    6    7    33
pseudogene  Gene3 6    65   3    3
ncRNA       Gene4 10   11   6    1
ncRNA       Gene5 7    6    7    33
pseudogene  Gene6 6    65   3    3

For ComplexHeatmap, if I want to break the heatmap by gene family, I would supply the 'Family' column to the split parameter of the Heatmap function in ComplexHeatmap. This would then break up the heatmap and perform clustering independently for genes under ncRNA and pseudogenes.
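
To make the effect of split more concrete, here is a minimal base-R sketch of the idea: the rows are grouped by family and each group is clustered on its own. It only illustrates the concept and is not how ComplexHeatmap implements it internally; the object names expr and row_order_by_family are hypothetical.

```r
# Toy data from the example above
df <- data.frame(
  Family = c("ncRNA", "ncRNA", "pseudogene", "ncRNA", "ncRNA", "pseudogene"),
  Gene   = paste0("Gene", 1:6),
  Sam1   = c(10, 7, 6, 10, 7, 6),
  Sam2   = c(11, 6, 65, 11, 6, 65),
  Sam3   = c(6, 7, 3, 6, 7, 3),
  Sam4   = c(1, 33, 3, 1, 33, 3),
  stringsAsFactors = FALSE
)

expr <- as.matrix(df[, 3:ncol(df)])
rownames(expr) <- df$Gene

# Cluster the genes of each family separately and keep the per-family order
row_order_by_family <- lapply(split(rownames(expr), df$Family), function(genes) {
  if (length(genes) < 2) return(genes)  # hclust needs at least two rows
  hc <- hclust(dist(expr[genes, , drop = FALSE]), method = "ward.D2")
  genes[hc$order]
})

row_order_by_family
```

Each list element holds the genes of one family in the order given by that family's own dendrogram, which is roughly what each block of a split heatmap shows.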

ADD REPLYlink written 2.4 years ago by Kevin Blighe56k
3

R code:

library(pheatmap)
test=read.csv("file.txt", sep="\t", header=TRUE)
rownames(test)=test[,2]   # gene names as row names
chars=test[,c(1,2)]       # Family and Gene columns; the row names link them to the matrix rows
test1=test[,c(3:6)]       # expression columns only
pheatmap(as.matrix(test1), scale = "row", clustering_distance_rows = "correlation", clustering_method = "complete",color =rainbow(2), main="Significant genes", fontsize_col=24, fontsize_row = 24,annotation_row = chars[1])

input:

$ cat file.txt 
Family  Gene    Sam1    Sam2    Sam3    Sam4
ncRNA   Gene1   10  11  6   1
ncRNA   Gene2   7   6   7   33
pseudogene  Gene3   6   65  3   3
ncRNA   Gene4   10  11  6   1
ncRNA   Gene5   7   6   7   33
pseudogene  Gene6   6   65  3   3

[Image: Rplot]

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by cpad011212k

Hi,

I am trying to run it, but I got errors:

Error in seq.default(-m, m, length.out = n + 1) : 
  'from' must be a finite number
In addition: Warning messages:
1: In min(x, na.rm = T) : no non-missing arguments to min; returning Inf
2: In max(x, na.rm = T) : no non-missing arguments to max; returning -Inf
ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Mehmet510
3

Hi! For ComplexHeatmap, try this code (note the split parameter):

require(ComplexHeatmap)
require(circlize)
require(cluster)

df <- read.table("test", header=TRUE)
df
      Family  Gene Sam1 Sam2 Sam3 Sam4
1      ncRNA Gene1   10   11    6    1
2      ncRNA Gene2    7    6    7   33
3 pseudogene Gene3    6   65    3    3
4      ncRNA Gene4   10   11    6    1
5      ncRNA Gene5    7    6    7   33
6 pseudogene Gene6    6   65    3    3
7 pseudogene Gene7    5   45    2    1

heat <- t(scale(t(df[,3:ncol(df)])))

hmap <- Heatmap(heat,
        name="Transcript Z-score",
        #col=colorRamp2(myBreaks, myCol),
        heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),
        split=df$Family,
        row_title="Transcript class",
        row_title_side="left",
        row_title_gp=gpar(fontsize=15, fontface="bold"),
        show_row_names=TRUE,
        column_title="",
        column_title_side="top",
        column_title_gp=gpar(fontsize=15, fontface="bold"),
        column_title_rot=0,
        show_column_names=TRUE,
        clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
        clustering_method_columns="ward.D2",
        clustering_distance_rows="euclidean",
        clustering_method_rows="ward.D2",
        row_dend_width=unit(30,"mm"),
        column_dend_height=unit(30,"mm"))

draw(hmap, heatmap_legend_side="left")

[Image: screenshot of the resulting heatmap]

ADD REPLYlink written 2.4 years ago by Kevin Blighe56k

Hi Kevin,

Thank you. I followed it, but I received this error:

Error in colorRamp2(myBreaks, myCol) : 
  Length of `breaks` should be equal to `colors`.
ADD REPLYlink written 2.4 years ago by Mehmet510
1

Yes, I commented out that part of the code. If you would like to use it, then execute the following prior to generating the heatmap:

#myCol <- colorRampPalette(c("violet", "black", "springgreen"))(100)
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-3, 3, length.out=100)

You can choose any colours that you want here.
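
The error message above says that colorRamp2 expects one colour per break point. Here is a small sketch of that check, written so that it runs even without circlize installed; n_steps and col_fun are illustrative names, not part of the original code.

```r
n_steps <- 100

# One colour per break point: both vectors are built from the same n_steps
myCol    <- colorRampPalette(c("dodgerblue", "black", "yellow"))(n_steps)
myBreaks <- seq(-3, 3, length.out = n_steps)

# This is the condition that the error message refers to
stopifnot(length(myBreaks) == length(myCol))

# If circlize is installed, build the colour mapping function for Heatmap()
if (requireNamespace("circlize", quietly = TRUE)) {
  col_fun <- circlize::colorRamp2(myBreaks, myCol)
  print(col_fun(c(-3, 0, 3)))  # colours at the two extremes and the midpoint
}
```

Deriving both vectors from a single n_steps value makes it harder for the two lengths to drift apart when one of them is edited later.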

Also note that t( scale( t( x ) ) ) scales each row (gene) to Z-scores: scale() standardises columns, so the matrix is transposed before scaling and transposed back afterwards.
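
As a quick check of what the double transpose does, here is a minimal sketch on a toy matrix (the values are taken from the example data; the object names x and z are arbitrary):

```r
# Toy matrix: genes in rows, samples in columns
x <- matrix(c(10, 11, 6, 1,
              7, 6, 7, 33,
              6, 65, 3, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Gene", 1:3), paste0("Sam", 1:4)))

# scale() works on columns, so transpose, scale, and transpose back
z <- t(scale(t(x)))

# Each gene (row) now has mean 0 and standard deviation 1
round(rowMeans(z), 10)
apply(z, 1, sd)
```

A gene with identical values in every sample has a standard deviation of 0 and therefore gives NaN after scaling, so such rows are usually removed before scaling.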

ADD REPLYlink written 2.4 years ago by Kevin Blighe56k

Hi Kevin,

Thank you very much for your help. I was able to generate a heatmap as I wanted.

ADD REPLYlink written 2.4 years ago by Mehmet510
2

Great. You should devote a full working day to looking over ComplexHeatmap. Once you learn it, you will never go back to heatmap.2 or pheatmap.

ADD REPLYlink written 2.4 years ago by Kevin Blighe56k
1

Yes, I will. Your help in this post serves as a tutorial for other people, so anyone can follow these steps to make a complex heatmap.

ADD REPLYlink written 2.4 years ago by Mehmet510

One thing I want to ask: how can I change the position of the Family names in the heatmap? They are positioned vertically, but I want to show them horizontally, because some of them overlap and cannot be read.

ADD REPLYlink written 2.4 years ago by Mehmet510
2

Take a look at this (below). I now add all sorts of annotations for you, just to give you an idea. Also note the following:

  • I set the orientation/rotation of the family names with row_title_rot=0 (note that when you use the split parameter, it overrides the row title)
  • I now set gene names as rownames for heat, with rownames(heat) <- df$Gene
  • I use different distance metrics for rows and columns, with clustering_distance_columns and clustering_distance_rows
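
The last point can be sketched in base R, outside of ComplexHeatmap, to show what the two settings compute: Euclidean distance between genes, and 1 minus the Pearson correlation between samples. The matrix below is random toy data. Note that cor(heat) is written without a transpose here because it already correlates columns; the function in the Heatmap call uses cor(t(x)), presumably because ComplexHeatmap passes it the transposed matrix.

```r
set.seed(1)
heat <- matrix(rnorm(40), nrow = 8,
               dimnames = list(paste0("Gene", 1:8), paste0("Sam", 1:5)))

# Rows (genes): Euclidean distance
row_dist <- dist(heat, method = "euclidean")

# Columns (samples): 1 - Pearson correlation between samples
col_dist <- as.dist(1 - cor(heat))

# Same linkage method for both trees, as in the Heatmap call
row_tree <- hclust(row_dist, method = "ward.D2")
col_tree <- hclust(col_dist, method = "ward.D2")

rownames(heat)[row_tree$order]  # gene order along the row dendrogram
colnames(heat)[col_tree$order]  # sample order along the column dendrogram
```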

This is a 'simple' ComplexHeatmap though, if that makes sense. There is much more to ComplexHeatmap, and you don't even want to see the complexity of one of the recent ones that I made. The code for it runs into the hundreds of lines. I commend the author of the package, who did a really great job.


require(ComplexHeatmap)
require(circlize)
require(cluster)

df <- read.table("test", header=TRUE)
df
      Family  Gene Sam1 Sam2 Sam3 Sam4
1      ncRNA Gene1   10   11    6    1
2      ncRNA Gene2    7    6    7   33
3 pseudogene Gene3    6   65    3    3
4      ncRNA Gene4   10   11    6    1
5      ncRNA Gene5    7    6    7   33
6 pseudogene Gene6    6   65    3    3
7 pseudogene Gene7    5   45    2    1

heat <- t(scale(t(df[,3:ncol(df)])))

rownames(heat) <- df$Gene

#Set annotation
  ColAnn <- data.frame(colnames(heat))
  colnames(ColAnn) <- c("Sample")
  ColAnn <- HeatmapAnnotation(df=ColAnn, which="col")

  RowAnn <- data.frame(df$Family)
  colnames(RowAnn) <- c("Gene family")
  colours <- list("Gene family"=c("ncRNA"="royalblue","pseudogene"="red3"))
  RowAnn <- HeatmapAnnotation(df=RowAnn, col=colours, which="row")

  boxAnnCol <- HeatmapAnnotation(boxplot=anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), lim=NULL, pch=".", size=unit(2, "mm"), axis=FALSE, axis_side=NULL, axis_gp=gpar(fontsize=12)), annotation_width=unit(c(1, 7.5), "cm"))

  boxAnnRow <- rowAnnotation(boxplot=row_anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), lim=NULL, pch=".", size=unit(3, "cm"), axis=FALSE, axis_side="top", axis_gp=gpar(fontsize=12)), annotation_width=unit(c(3), "cm"))


hmap <- Heatmap(heat,
        name="Transcript Z-score",
        col=colorRamp2(myBreaks, myCol), #myBreaks and myCol must be defined beforehand (see the earlier reply)
        heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),

       #Split heatmap rows by gene family
        split=df$Family,

        #Row annotation configurations
        cluster_rows=TRUE,
        show_row_dend=TRUE,
        #row_title="Transcript", #overridden by 'split' it seems
        row_title_side="left",
        row_title_gp=gpar(fontsize=15, fontface="bold"),
        show_row_names=TRUE,
        row_names_side="left",
        row_title_rot=0,

        #Column annotation configurations
        cluster_columns=TRUE,
        show_column_dend=TRUE,
        column_title="Samples",
        column_title_side="top",
        column_title_gp=gpar(fontsize=15, fontface="bold"),
        column_title_rot=0,
        show_column_names=TRUE,

        #Dendrogram configurations: columns
        clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
        clustering_method_columns="ward.D2",
        column_dend_height=unit(30,"mm"),

        #Dendrogram configurations: rows
        clustering_distance_rows="euclidean",
        clustering_method_rows="ward.D2",
        row_dend_width=unit(30,"mm"),

        #Annotations (row annotation must be added with 'draw' function, below)
        top_annotation_height=unit(0.5,"cm"),
        top_annotation=ColAnn,

        bottom_annotation_height=unit(3, "cm"),
        bottom_annotation=boxAnnCol)

draw(hmap + RowAnn + boxAnnRow, heatmap_legend_side="left", annotation_legend_side="right")

[Image: screenshot of the resulting heatmap]

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Kevin Blighe56k

Hi Kevin, I am using your code. Please have a look; I am getting an error.

require(ComplexHeatmap)
require(circlize)
require(cluster)
df <- read.csv('PATHWAY_gene.txt', header=TRUE,sep = "\t")
df
dim(df)
names(df)
heat <- t(scale(t(df[,3:ncol(df)])))

#################################################



##############################################


rownames(heat) <- df$Gene

myCol <- colorRampPalette(c("navyblue", "white", "red"))(100)
myBreaks <- seq(-2,2, length.out=100)
#Set annotation
ColAnn <- data.frame(colnames(heat))
colnames(ColAnn) <- c("Sample")
ColAnn <- HeatmapAnnotation(df=ColAnn, which="col")

RowAnn <- data.frame(df$Family)
colnames(RowAnn) <- c("Gene family")
colours <- list("Gene family"=
                  c("Interferon Signaling"="red","Communication between Innate and Adaptive Immune Cells"="red1","Atherosclerosis Signaling "="red2",
                   "Activation of IRF by Cytosolic Pattern Recognition Receptors
"="azure","Neuroinflammation Signaling Pathway
"="royalblue","Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses
"="royalblue1","Role of Macrophages"="royalblue2","Death Receptor Signaling"="royalblue3","TREM1 Signaling
"="royalblue4","Toll-like Receptor Signaling
"="cyan1","NF-κB Signaling"="cyan2","HMGB1 Signaling
"="cyan3","PKCθ Signaling in T Lymphocytes"="cyan4","PPARα/RXRα Activation"="green4" ))
RowAnn <- HeatmapAnnotation(df=RowAnn, col=colours, which="row")

boxAnnCol <- HeatmapAnnotation(boxplot=anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"), pch=".", size=unit(2, "mm"), axis=FALSE, axis_side=NULL, axis_gp=gpar(fontsize=12)), annotation_width=unit(c(1, 7.5), "cm"))

boxAnnRow <- rowAnnotation(boxplot=row_anno_boxplot(heat, border=FALSE, gp=gpar(fill="#CCCCCC"),pch=".", size=unit(3, "cm"), axis=FALSE, axis_side="top", axis_gp=gpar(fontsize=12)), annotation_width=unit(c(3), "cm"))


hmap <- Heatmap(heat,
                name="Transcript Z-score",
                col=colorRamp2(myBreaks, myCol),
                heatmap_legend_param=list(color_bar="continuous", legend_direction="horizontal", legend_width=unit(5,"cm"), title_position="topcenter", title_gp=gpar(fontsize=15, fontface="bold")),

                #Split heatmap rows by gene family
                split=df$Family,

                #Row annotation configurations
                cluster_rows=FALSE,
                show_row_dend=FALSE,
                #row_title="Transcript", #overridden by 'split' it seems
                row_title_side="left",
                row_title_gp=gpar(fontsize=30, fontface="bold"),
                show_row_names=TRUE,
                row_names_side="left",
                row_title_rot=0,

                #Column annotation configurations
                cluster_columns=TRUE,
                show_column_dend=TRUE,
                column_title="Samples",
                column_title_side="top",
                column_title_gp=gpar(fontsize=15, fontface="bold"),
                column_title_rot=0,
                show_column_names=TRUE,

                #Dendrogram configurations: columns
                #clustering_distance_columns=function(x) as.dist(1-cor(t(x))),
                clustering_method_columns="complete",
                column_dend_height=unit(10,"mm"),

                #Dendrogram configurations: rows
                clustering_distance_rows="euclidean",
                clustering_method_rows="ward.D2",
                row_dend_width=unit(30,"mm"))

                #Annotations (row annotation must be added with 'draw' function, below)
                #top_annotation_height=unit(0.5,"cm"),
                #top_annotation=ColAnn)

                #bottom_annotation_height=unit(3, "cm"),
                #bottom_annotation=boxAnnCol)

draw(hmap + RowAnn , heatmap_legend_side="left", annotation_legend_side="right")






 Error in .local(object, ...) : 
      Gene family: cannot map colors to some of the levels:
    Activation of IRF by Cytosolic Pattern Recognition Receptor

Error when drawing annotation 'Gene family'
Error in .local(object, ...) : Error in .local(object, ...) : 
  Gene family: cannot map colors to some of the levels:
Activation of IRF by Cytosolic Pattern Recognition Receptors

I am not sure what is going wrong with the colours I defined.

ADD REPLYlink written 2.2 years ago by krushnach80690
1
colours <- list("Gene family"=
                  c("Interferon Signaling"="red","Communication between Innate and Adaptive Immune Cells"="red1","Atherosclerosis Signaling "="red2",
                   "Activation of IRF by Cytosolic Pattern Recognition Receptors
"="azure","Neuroinflammation Signaling Pathway
"="royalblue","Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses
"="royalblue1","Role of Macrophages"="royalblue2","Death Receptor Signaling"="royalblue3","TREM1 Signaling
"="royalblue4","Toll-like Receptor Signaling
"="cyan1","NF-κB Signaling"="cyan2","HMGB1 Signaling
"="cyan3","PKCθ Signaling in T Lymphocytes"="cyan4","PPARα/RXRα Activation"="green4" ))

Hello friend, the problem is most likely in the line above. Can you double-check that all gene family names are correct, including upper- and lower-case?
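
One way to do that check programmatically is sketched below. The vectors are made-up stand-ins for df$Family and the colour vector. As pasted in the post, several of the names appear to contain a trailing space or a line break (for example "Atherosclerosis Signaling "); whether that is also true of the actual script is an assumption.

```r
# Hypothetical family labels as they might appear in the data
family <- c("Interferon Signaling", "Death Receptor Signaling", "TREM1 Signaling")

# Hypothetical colour vector; two names carry a trailing space or newline
family_cols <- c("Interferon Signaling"      = "red",
                 "Death Receptor Signaling " = "royalblue3",
                 "TREM1 Signaling\n"         = "royalblue4")

# Levels that have no exactly matching name in the colour vector
missing_before <- setdiff(unique(family), names(family_cols))
missing_before

# Trimming whitespace on both sides of the names restores the match
names(family_cols) <- trimws(names(family_cols))
missing_after <- setdiff(unique(family), names(family_cols))
missing_after
```

If setdiff still returns names after trimming, those are the levels to compare character by character against the data.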

ADD REPLYlink written 2.2 years ago by Kevin Blighe56k
1

okay let me see that again

ADD REPLYlink written 2.2 years ago by krushnach80690

I am getting something like this.

I think the gene names were cluttering it, so I removed them, but it is still rather messed up. Any suggestions?

ADD REPLYlink written 2.2 years ago by krushnach80690

My next educated guess is that the problem is with your hyphens. For example, "NF-κB Signaling" will have to be changed to "NF κB Signaling".

ADD REPLYlink written 2.2 years ago by Kevin Blighe56k

And how do you decide the sequence of breaks? Maybe I am doing something wrong, because the number of families is around 15 and I also cannot see the Z-score bar in my figure.

ADD REPLYlink written 2.2 years ago by krushnach80690
1

It is trial and error. You can try myBreaks <- seq(-2,2, length.out=100) or myBreaks <- seq(-1,1, length.out=100), or something else. Your data does look strange (very flat).

Keep in mind that you do not necessarily have to scale the data and set break-points. Also remember that the colouring is purely for visualisation and does not change the actual clustering.
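
Instead of pure trial and error, one possible starting point is to take a symmetric limit from the data itself. The sketch below uses the 99th percentile of the absolute Z-scores. This heuristic is an assumption of this example, not something recommended in the thread, and the data are random.

```r
set.seed(42)
# Random toy matrix (20 genes x 10 samples), scaled to row Z-scores
heat <- t(scale(t(matrix(rnorm(200, mean = 50, sd = 10), nrow = 20))))

# Symmetric limit taken from the data: the 99th percentile of |Z|
lim <- as.numeric(quantile(abs(heat), 0.99, na.rm = TRUE))

myBreaks <- seq(-lim, lim, length.out = 100)
myCol    <- colorRampPalette(c("navyblue", "white", "red"))(100)

# The clustering is computed from the values, not from the colours,
# so changing myBreaks or myCol leaves this tree untouched
row_tree <- hclust(dist(heat), method = "ward.D2")

range(myBreaks)
```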

ADD REPLYlink written 2.2 years ago by Kevin Blighe56k

Yes, the colouring is to distinguish. Why can't the Z-score map be seen? Is it because the text is taking up all the space on the left side?

ADD REPLYlink written 2.2 years ago by krushnach80690

Hi Kevin,

I would like to ask you something. How can I add FPKM values of each gene in each sample into heatmap?

ADD REPLYlink written 2.1 years ago by Mehmet510

Hello again. Do you mean to literally add the numerical FPKM values to the heatmap?

ADD REPLYlink written 2.1 years ago by Kevin Blighe56k

Hi Kevin,

I figured out how to add FPKM values in cells of heatmap. But I also need to add a box plot of FPKM values in addition to z-score box plot. I tried but I could not see another legend option.

ADD REPLYlink written 2.1 years ago by Mehmet510
1

To do that, I think that you just create a HeatmapAnnotation and specify 2 boxplots in it, like this:

annotBoxplots <- HeatmapAnnotation(zscore=anno_boxplot(zscores, which="row"), fpkm=anno_boxplot(fpkm, which="row"), which="row", ...)
ADD REPLYlink written 2.1 years ago by Kevin Blighe56k

Hi Kevin,

I ran the command below:

annotBoxplots <- HeatmapAnnotation(as.matrix(shorteffec.fpkm.txt), which="row")

I was wondering how to show only FPKM box plot (without annotation), not z-score boxplot.

ADD REPLYlink written 2.1 years ago by Mehmet510
1

Oh, maybe try this:

annotBoxplots <- HeatmapAnnotation(anno_boxplot(as.matrix(shorteffec.fpkm.txt)), which="row")
ADD REPLYlink written 2.1 years ago by Kevin Blighe56k
1

Hi Kevin,

I was able to show FPKM box plot of rows (genes) and columns (samples/conditions) in the heatmap without annotation.

ADD REPLYlink written 2.1 years ago by Mehmet510

Hi Kevin,

I was able to produce annotated box plot based on FPKM values in the heatmap.

I mean this is the command that I added to Heatmap function and it put FPKM values in cell in the heatmap:

cell_fun = function(j, i, x, y, width, height, fill) {grid.text(sprintf("%.1f", shorteffec.fpkm.txt[i, j]), x, y, gp = gpar(fontsize = 15, col= "black"))}

What I need to do is to show a box plot of FPKM values without any annotation, and not to use the z-score legend in the heatmap.

ADD REPLYlink written 2.1 years ago by Mehmet510

Sorry Kevin, by combining your code with the ComplexHeatmap option that keeps genes in the same order across two heat maps, I produced this heat map. Now, how can I give the right heat map smoother colouring? I mean, the left heat map is darker and the right one is higher.

library(ComplexHeatmap)
library(circlize)
mycol <- colorRamp2(c(-2,0,2), c("dodgerblue", "black", "yellow"))
> heat <- t(scale(t(norm_h0_t_r)))
> heat <- heat[apply(heat, MARGIN = 1, FUN = function(x) sd(x) != 0),]
> View(heat)
> t=heat[,1:2]
> r=heat[,3:4]
> dim(t)
[1] 8587    2
> dim(r)
[1] 8587    2
> Heatmap(t, col=mycol, cluster_columns = FALSE) + Heatmap(r, col=mycol, cluster_columns = FALSE)

Maybe the same scale should be used on both heat maps.


ADD REPLYlink modified 21 months ago • written 21 months ago by Za120

Please do not post your question in multiple places.

ADD REPLYlink written 21 months ago by RamRS26k

I have replied back in the other thread in order to maintain consistency: C: Why can't I reproduce the same heat map

ADD REPLYlink written 21 months ago by Kevin Blighe56k

How to deal with these?

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Mehmet510

Hi Kevin,

I am trying to generate a heatmap, but I am having difficulties. As you may remember from my previous post about an FPKM-based heatmap, I need to use those data: load them into R, scale them, and draw a heatmap.

Could you please send me an R code to do those steps?
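
A minimal sketch of those three steps in base R follows. It assumes a tab-delimited file with the columns Family and Gene followed by one column per sample, which is the layout used elsewhere in this thread and not necessarily that of the actual FPKM file. An inline string stands in for the file so that the example is self-contained.

```r
# Stand-in for a tab-delimited file: Family, Gene, then one column per sample
txt <- "Family\tGene\tSam1\tSam2\tSam3\tSam4
ncRNA\tGene1\t10\t11\t6\t1
ncRNA\tGene2\t7\t6\t7\t33
pseudogene\tGene3\t6\t65\t3\t3
pseudogene\tGene7\t5\t45\t2\t1"

# 1. Load (for real data, replace text = txt with the file path)
df <- read.delim(text = txt, header = TRUE, stringsAsFactors = FALSE)

# 2. Keep the numeric columns and scale each gene to Z-scores
mat <- as.matrix(df[, 3:ncol(df)])
rownames(mat) <- df$Gene
heat <- t(scale(t(mat)))

# 3. Draw a clustered heatmap with base R; the data are already scaled
pdf(NULL)  # discards the plot when run non-interactively; remove to see it
heatmap(heat, scale = "none",
        col = colorRampPalette(c("dodgerblue", "black", "yellow"))(100))
invisible(dev.off())
```

The same heat matrix can then be passed to pheatmap or to the ComplexHeatmap Heatmap function instead of the base heatmap call.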

ADD REPLYlink written 2.4 years ago by Mehmet510
2
2.4 years ago by
Sean Davis26k
National Institutes of Health, Bethesda, MD
Sean Davis26k wrote:

Take a look at the heatmap.3 and ComplexHeatmap packages to mark your genes.

ADD COMMENTlink written 2.4 years ago by Sean Davis26k