Question: Improve Large Heatmap Generation in R
2.9 years ago by
Thorerges0 wrote:

I have made an unimpressively large heatmap in R. The rows are genes with RPKM expression values from GTEx, and the columns are tissues.

> heatmap(matrix, Rowv=NA, Colv=NA, col = heat.colors(256), cexRow = 0.5, scale="column")
> dim(matrix) #dimension of my matrix
[1] 18101    53

Does anyone have any ideas as to how to make this more interpretable/cleaner? Or is this kind of data too large for a heatmap? The matrix is

enter image description here

rna-seq coding R • 1.3k views
modified 2.9 years ago by Whoknows770 • written 2.9 years ago by Thorerges0
  • If possible, use log-transformed values of the matrix to generate the heatmap. Also try different clustering methods, if appropriate for your data.
  • Define explicit breaks and assign custom colors to them. Example: check this post
  • If there are extreme outliers in your data, use the breaks to give them their own distinct colors
modified 2.9 years ago • written 2.9 years ago by EagleEye6.6k
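
The breaks suggestion can be sketched in base R. This is a minimal illustration on simulated data, not the poster's GTEx matrix: the names `expr`, `log_expr`, `brks` and `cols`, the 1st/99th percentile cut-offs, and the outlier colors are all assumptions chosen for the example. It relies on `heatmap()` forwarding `col` and `breaks` to `image()`, which needs exactly one more break than colors.

```r
# Simulated stand-in for the RPKM matrix (genes x tissues); not real GTEx data.
set.seed(1)
expr <- matrix(rlnorm(200 * 10, meanlog = 1, sdlog = 2), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("tissue", 1:10)))

# Log-transform; the pseudocount of 1 keeps zero values finite.
log_expr <- log2(expr + 1)

# Spread 50 color bins evenly between the 1st and 99th percentiles
# (arbitrary cut-offs), then add one outer bin on each side so that
# extreme values get their own color instead of stretching the scale.
q <- unname(quantile(log_expr, c(0.01, 0.99)))
inner <- seq(q[1], q[2], length.out = 51)
brks <- c(min(log_expr) - 1e-6, inner, max(log_expr) + 1e-6)
cols <- c("grey40", rev(heat.colors(50)), "purple")

# scale = "none" so that the breaks apply to the log2 values themselves.
heatmap(log_expr, Rowv = NA, Colv = NA, scale = "none",
        col = cols, breaks = brks, labRow = NA)
```

Values below the 1st percentile are drawn in grey and values above the 99th in purple, so a handful of outliers no longer flattens the rest of the color range.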

Think about what you want to show. What's the message of the figure? There's no point in having a heatmap (or any other type of figure, for that matter) that can't be legibly displayed on standard media (e.g. a computer screen or paper). First, if the problem is that there are no patterns to be seen where you expect some, you could consider rescaling the values to increase the range that is mapped to colors. However, one usually doesn't want to show individual data points but rather make more general statements about groups, which immediately suggests clustering the data. There may also be a better representation than a heatmap for what you want to show.

written 2.9 years ago by Jean-Karim Heriche22k
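
One way to act on the rescaling and clustering advice is to z-score each gene across tissues, group the genes with k-means, and plot one row per cluster instead of one row per gene. The sketch below uses simulated data with three planted groups; the names `log_expr`, `z`, `km` and `profiles`, and the choice of k = 3, are assumptions for illustration only, and k-means is just one of several clustering methods that could be used here.

```r
set.seed(2)
# Simulated log2 expression: 300 genes x 8 tissues with three planted groups.
n_tissue <- 8
pattern <- rbind(c(rep(4, 3), rep(0, 5)),
                 c(rep(0, 3), rep(4, 3), rep(0, 2)),
                 c(rep(0, 6), rep(4, 2)))
log_expr <- pattern[rep(1:3, each = 100), ] +
  matrix(rnorm(300 * n_tissue, sd = 0.5), nrow = 300)
colnames(log_expr) <- paste0("tissue", seq_len(n_tissue))

# Z-score each gene across tissues so color reflects relative expression.
z <- t(scale(t(log_expr)))

# Group genes into k clusters (k = 3 is an arbitrary choice for this toy data).
km <- kmeans(z, centers = 3, nstart = 10)

# One row per cluster: the mean z-score profile of its member genes.
profiles <- km$centers
rownames(profiles) <- paste0("cluster", 1:3, " (n=", km$size, ")")

heatmap(profiles, Rowv = NA, Colv = NA, scale = "none",
        col = colorRampPalette(c("blue", "white", "red"))(64),
        margins = c(6, 10))
```

The resulting figure has as many rows as clusters, which stays legible however many genes went in. Note that the original call used `scale="column"`, which standardizes each tissue; standardizing each gene, as here, is the more common choice when comparing expression across tissues.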
2.9 years ago by
Whoknows770 wrote:

Dear sherif,

How you select the genes that go into the heatmap is really important. We always draw heatmaps only for differentially expressed genes, i.e. those with at least a 2-fold change up or down (+1/-1 after log2 transformation).

So you should limit the genes to those that changed significantly. In your heatmap image the values for all genes look the same; this can be solved by taking log2 of all values, with code like:

matrix <- log2(matrix +1)

The code above compresses the highest values so that you can see differences on the log2 scale. I added 1 to avoid computing log2 of zero.
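
When there is no two-group comparison to test (as with 53 tissues), one common stand-in for a differential expression filter is to keep only the most variable genes after the log2 transform. The sketch below does that on simulated data; it is a substitute technique rather than a fold-change test, and the names `expr`, `gene_var`, `n_keep` and the thresholds are hypothetical choices.

```r
set.seed(3)
# Simulated RPKM-like matrix: 1000 genes x 12 tissues (made-up values).
expr <- matrix(rlnorm(1000 * 12, meanlog = 1, sdlog = 1), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("tissue", 1:12)))
# Make the first 50 genes strongly tissue-specific for illustration.
expr[1:50, 1:4] <- expr[1:50, 1:4] * 50

log_expr <- log2(expr + 1)

# Drop genes that are barely expressed anywhere
# (RPKM >= 1 in at least 2 tissues; an arbitrary threshold).
expressed <- rowSums(expr >= 1) >= 2
log_expr <- log_expr[expressed, ]

# Rank genes by variance across tissues and keep the top n_keep.
gene_var <- apply(log_expr, 1, var)
n_keep <- 50
top <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_keep)]
filtered <- log_expr[top, ]

# 50 rows can be clustered and drawn legibly; 18101 cannot.
heatmap(filtered, scale = "row", labRow = NA, col = heat.colors(256))
```

On the real matrix, `n_keep` would typically be a few hundred to a couple of thousand genes, depending on how much detail the figure has to carry.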

You can also try the pheatmap package; it offers several clustering methods that might be useful for your data.
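
A minimal pheatmap sketch follows, assuming the package is installed (the call is skipped otherwise). The data are simulated, and the particular distance, linkage and `cutree_rows` settings are illustrative choices rather than recommendations for GTEx data.

```r
if (requireNamespace("pheatmap", quietly = TRUE)) {
  set.seed(4)
  # Simulated log2 expression: 100 genes x 6 tissues (made-up values).
  log_expr <- matrix(rnorm(100 * 6, mean = 5, sd = 2), nrow = 100,
                     dimnames = list(paste0("gene", 1:100),
                                     paste0("tissue", 1:6)))

  res <- pheatmap::pheatmap(
    log_expr,
    scale = "row",                             # z-score each gene
    clustering_distance_rows = "correlation",  # group genes by profile shape
    clustering_distance_cols = "euclidean",
    clustering_method = "average",             # linkage passed to hclust
    cutree_rows = 4,                           # split the gene tree into 4 blocks
    show_rownames = FALSE                      # gene labels are unreadable at scale
  )
}
```

Changing `clustering_distance_rows` and `clustering_method` is the quickest way to compare clustering methods on the same matrix; the returned object keeps the row and column trees in `res$tree_row` and `res$tree_col`.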
