heatmap in R
0
0
Entering edit mode
20 months ago
Rob ▴ 120

Hi I have HTseq data and want to plot heatmap for significant expressed genes. I followed Kevin Blighe tutorial :https://github.com/kevinblighe/E-MTAB-6141 I still do not have any pattern. These genes are significantly differentially expressed between two groups (tretment and control) and should have pattern in heatmap. Any body can help me with this? Also, default color range for this code is -4 to 4. How can I change it to for example -3 to 3? this is my code:

require(RColorBrewer)
require(ComplexHeatmap)
require(circlize)
require(digest)
require(cluster)
library(DESeq2)

##obtain the data
data <- read.table("mydata.txt", sep = '\t',
header = TRUE, stringsAsFactors = FALSE,check.names=FALSE, row.names = 1)
mat<- as.matrix (data)
####################normalize and log2transform
mat1 <- vst(mat) ##normalization and logtransformation
mat2<- t(scale(t(mat1))) ##zscore of normalized log2 transformed data

sig_genes <- read.table("SigGenes_FDR0.05_FC2.txt", sep = '\t',
header = FALSE, stringsAsFactors = FALSE,check.names=FALSE)[,1]

##Check the md5 checksums to ensure data integrity / security. The checksums should be:

digest::digest(mat3, algo = 'md5')
digest::digest(sig_genes, algo = 'md5')

# --check that both objects are aligned by name
# Subset the expression matrix for the statistically significant genes
mat <- mat2[sig_genes, ]

#set colour scheme and choose breaks
myCol <- colorRampPalette(c('dodgerblue', 'black', 'yellow'))(100)
myBreaks <- seq(-3, 4, length.out = 100)
##create annotation: treatment status and sex
#First, we will just generate some colour mappings for the metadata sex
# Sex
cd3 <- metadata$Sex cd3 <- cd3[!is.na(cd3)] # remove missing values - we don't want to include these in the mapping pick.col <- brewer.pal(9, 'Greens') col.cd3 <- colorRampPalette(pick.col)(length(unique(cd3))) unique(col.cd3) ann <- data.frame( treatmentStatus = metadata$treatmentStatus,
Sex = metadata$Sex, stringsAsFactors = FALSE) # create the colour mapping # create the colour mapping colours <- list(treatmentstatus = c('0' = 'blue', '1' = 'red'), Sex = c('M' = "#F7FCF5", 'F' = '#C7E9C0')) # now create the ComplexHeatmap annotation object # as most of these parameters are self-explanatory, comments will only appear where needed colAnn <- HeatmapAnnotation( df = ann, which = 'col', # 'col' (samples) or 'row' (gene) annotation? na_col = 'white', # default colour for any NA values in the annotation data-frame, 'ann' col = colours, annotation_height = 0.6, annotation_width = unit(1, 'cm'), gap = unit(1, 'mm'), annotation_legend_param = list( treatmentStatus = list( nrow = 4, # number of rows across which the legend will be arranged title = 'treatmentStatus', title_position = 'topcenter', legend_direction = 'vertical', title_gp = gpar(fontsize = 12, fontface = 'bold'), labels_gp = gpar(fontsize = 12, fontface = 'bold')), Sex = list( nrow = 5, title = 'Sex', title_position = 'topcenter', legend_direction = 'vertical', title_gp = gpar(fontsize = 12, fontface = 'bold'), labels_gp = gpar(fontsize = 12, fontface = 'bold'))) ) ##create annotation: box-and-whisker plots boxplotCol <- HeatmapAnnotation( boxplot = anno_boxplot( mat, border = FALSE, gp = gpar(fill = "#CCCCCC"), pch = '.', size = unit(2, "mm"), axis = TRUE, axis_param = list( gp = gpar(fontsize = 12), side = 'left')), annotation_width = unit(c(2.0),"cm"), which = "col") boxplotRow <- HeatmapAnnotation( boxplot = row_anno_boxplot( mat, border = FALSE, gp = gpar(fill = '#CCCCCC'), pch = '.', size = unit(2, 'mm'), axis = TRUE, axis_param = list( gp = gpar(fontsize = 12), side = 'top')), annotation_width = unit(c(2.0), 'cm'), which = 'row') ##create annotation: gene labels #retain every 4th successive label. genelabels <- rowAnnotation( Genes = anno_mark( at = seq(1, nrow(mat), 4), labels = rownames(mat)[seq(1, nrow(mat), 4)], labels_gp = gpar(fontsize = 10, fontface = "bold"), padding = 0.75), width = unit(2.0, "cm") + max_text_width( rownames(mat)[seq(1, nrow(mat), 10)], gp = gpar(fontsize = 10, fontface = "bold"))) ##################### ##perform partitioning around medoids (PAM) to identify clusters in the data pamClusters <- cluster::pam(x, k = 2) # pre-select k = 2 centers pamClusters$clustering <- paste0('Cluster ', pamClusters$clustering) # fix order of the clusters to have 1 to 4, top to bottom pamClusters$clustering <- factor(pamClusters$clustering, levels = c('Cluster 1', 'Cluster 2')) ##create the actual heatmap object hmap <- Heatmap(mat, # split the genes / rows according to the PAM clusters #split = pamClusters$clustering,
cluster_row_slices = TRUE,
name = 'Gene\nZ-\nscore',
col = colorRamp2(myBreaks, myCol),
# parameters for the colour-bar that represents gradient of expression
heatmap_legend_param = list(
color_bar = 'continuous',
legend_direction = 'vertical',
legend_width = unit(8, 'cm'),
legend_height = unit(5.0, 'cm'),
title_position = 'topcenter',
title_gp=gpar(fontsize = 12, fontface = 'bold'),
labels_gp=gpar(fontsize = 12, fontface = 'bold')),

# row (gene) parameters
cluster_rows = TRUE,
show_row_dend = TRUE,
#row_title = 'Statistically significant genes',
row_title_side = 'left',
row_title_gp = gpar(fontsize = 12,  fontface = 'bold'),
row_title_rot = 90,
show_row_names = FALSE,
row_names_gp = gpar(fontsize = 10, fontface = 'bold'),
row_names_side = 'left',
row_dend_width = unit(25,'mm'),

# column (sample) parameters
cluster_columns = FALSE,
show_column_dend = FALSE,
column_title = '',
column_title_side = 'bottom',
column_title_gp = gpar(fontsize = 12, fontface = 'bold'),
column_title_rot = 0,
show_column_names = FALSE,
column_names_gp = gpar(fontsize = 10, fontface = 'bold'),
column_names_max_height = unit(10, 'cm'),
column_dend_height = unit(25,'mm'),

# cluster methods for rows and columns
clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
clustering_method_columns = 'ward.D2',
clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
clustering_method_rows = 'ward.D2',

# specify top and bottom annotations
top_annotation = colAnn)
#bottom_annotation = boxplotCol)

# draw the heatmap
draw(hmap + genelabels,
heatmap_legend_side = 'left',
annotation_legend_side = 'right',
row_sub_title_side = "left")

# extra: change colour scheme, breaks, and do extra clustering on columns
myCol <- colorRampPalette(c('royalblue', 'white', 'red3'))(100)
##myBreaks <- seq(-1.5, 1.5, length.out = 100)

hmap1 <- Heatmap(mat,

name = 'Gene Z-score',

col = colorRamp2(myBreaks,myCol),

heatmap_legend_param = list(
color_bar = 'continuous',
legend_direction = 'horizontal',
legend_width = unit(8, 'cm'),
legend_height = unit(5.0, 'cm'),
title_position = 'topcenter',
title_gp=gpar(fontsize = 30, fontface = 'bold'),
labels_gp=gpar(fontsize = 24, fontface = 'bold')),

cluster_rows = TRUE,
show_row_dend = TRUE,
row_title = 'Statistically significant genes',
row_title_side = 'left',
row_title_gp = gpar(fontsize = 30,  fontface = 'bold'),
row_title_rot = 90,
show_row_names = FALSE,
row_names_gp = gpar(fontsize = 11, fontface = 'bold'),
row_names_side = 'left',
row_dend_width = unit(25,'mm'),

cluster_columns = TRUE,
show_column_dend = FALSE,
column_title = 'Samples',
column_title_side = 'bottom',
column_title_gp = gpar(fontsize = 30, fontface = 'bold'),
column_title_rot = 0,
show_column_names = FALSE,
column_names_gp = gpar(fontsize = 8, fontface = 'bold'),
column_names_max_height = unit(10, 'cm'),
column_dend_height = unit(25,'mm'),

clustering_distance_columns = function(x) as.dist(1 - cor(t(x))),
clustering_method_columns = 'ward.D2',
clustering_distance_rows = function(x) as.dist(1 - cor(t(x))),
clustering_method_rows = 'ward.D2')

pushViewport(viewport(layout = grid.layout(nr = 1, nc = 2)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
draw(hmap1,
heatmap_legend_side = 'top',
row_sub_title_side = 'left',
newpage = FALSE)
popViewport()

pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
popViewport()

rna-seq • 1.3k views
0
Entering edit mode

Hi @rhasanvandj, Is Ok for you to share the heatmap you have created?

0
Entering edit mode

yes, sure Here is the heatmap

0
Entering edit mode

Yes, there is no pattern in your heatmap please reassure about your upstream analysis steps especially DE analysis. Also when your groups in comparison are significantly imbalanced in terms of sample size, this also would cause to have no distinctive pattern in the heatmap.

0
Entering edit mode

Thanks Hamid I repeated DE a few times and the results are fine. What other analysis I need to do before plotting the heatmap? (you mentioned as downstream analysis steps)

0
Entering edit mode

oops, I meant upstream steps like DE and data transformation. Edited the comments. Did you repeat DE with same pipeline or tried with different pipelines? What about your sample size in each group? It would be also helpful if you label the heatmap rows and columns. Also add dendrogram to the heatmap column.

0
Entering edit mode

I tried with different pipelines: RSEM data, count data, and this one is with HT-seq raw count data. I used vst() from DEeq2 to normalize and log2 transform HT-Seq raw count data, then I changed them to z-score (scale by row). I plotted heatmaps with raw data, normalized log2transformed data, z-scores. I did not get pattern. then, I winscaled the z-scores (change the values smaller than -3 to -3 and changed the values larger than 3 to 3) to avoid effects of very highly over/under expressed data overcoming the pattern. The image I shared is from winscaled data and the best one. I have different analysis: one is with 44 samples (22 control: 22 treatment), 132 samples(66 vs 66), 88 samples (44 vs 44). I dont get good pattern for any of the analyses.

I did not add dendrogram to the column because I do not want to cluster my column. I want them based on the order I sorted in my file.

1
Entering edit mode

I stopped to comment here because your workflow is not clear to me. Indeed I failed to understand what you did.

I plotted heatmaps with raw data

At least for this part of the workflow, you can find several recommendations against doing so.

Maybe other can provide some hints for you.

0
Entering edit mode

yes you are right. It was just a trial. One thing that I want to do is clustering inside each of two groups. How can I do clustering for each group separately?

1
Entering edit mode

Came across this: https://www.nature.com/articles/nmeth.1902. You and others may find this helpful.

Hundreds of rows and columns can be displayed on a screen. Heat maps rely fundamentally on color encoding and on meaningful reordering of the rows and columns. When either of these components is compromised, the utility of the visualization suffers.

0
Entering edit mode

Thanks Hamid for sharing the paper