Question

Should I use data scaling for heatmaps?

8.6 years ago

NHEJ ▴ 360

I read this post but am still confused about how to use the scale argument of the heatmap function: Scale Data Before Drawing Heatmap Or Using Heatmap(..., Scale="Columan") In R?

For example, the following source states that scale = "none" is the default in the heatmap.2 function (see the comments section of this post): https://biomickwatson.wordpress.com/2015/04/05/you-probably-dont-understand-heatmaps/

However, I am still confused about when and why to use the scale option. Is the default scale = "none" the safer (more general) choice when creating heatmaps? For example, would you only really want to scale by row if you have a time-course RNA-seq experiment?

I would be very grateful if someone could shed some light on using the scale argument of the heatmap and heatmap.2 functions.

RNA-Seq heatmap • 10k views

link updated 19 months ago by Ram 43k • written 8.6 years ago by NHEJ ▴ 360

Ram · Answer 1 · 2015-09-26

Dear NHEJ,

whether to scale depends heavily on your data and on your analysis and experimental design. Usually, if you run heatmap.2 with its default settings (no scaling) and you already see a clear separation of your "expression object" into detectable colour "clusters", there is no great need to scale.

However, if you want to bring out the differences between possible sub-clusters in your rows, then row scaling is the natural choice. After row scaling, each value becomes a relative score (a z-score): the row's mean is subtracted and the result is divided by the row's standard deviation. Positive values therefore mean expression above that gene's average across the samples (loosely, up-regulation) and negative values mean expression below it (loosely, down-regulation). I'm not an expert in RNA-seq analysis, as I have mainly analysed microarrays, but because you have tagged and mentioned RNA-seq, your data probably include genes with very high expression levels alongside others with relatively low expression (for RNA-seq, "read counts" is probably the more appropriate term).
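To make the effect of row scaling concrete, here is a minimal Python sketch (the thread is about R; Python is used here only for illustration). It assumes that heatmap.2's scale = "row" subtracts each row's mean and divides by its standard deviation; the counts are made up, and the helper names row_zscores and range_share are hypothetical. It compares how much of a shared colour range each gene spans before and after scaling.

```python
from statistics import mean, stdev

def row_zscores(matrix):
    """Sketch of row scaling: each row becomes (x - row mean) / row sample SD."""
    scaled = []
    for row in matrix:
        m, s = mean(row), stdev(row)  # stdev uses n - 1, like R's sd()
        scaled.append([(x - m) / s for x in row])
    return scaled

def range_share(matrix):
    """Fraction of the matrix-wide value range that each row spans.

    With a single shared colour key, this is roughly how much of the
    colour gradient each gene gets to use."""
    flat = [x for row in matrix for x in row]
    total = max(flat) - min(flat)
    return [(max(row) - min(row)) / total for row in matrix]

# Hypothetical counts: rows are genes, columns are four samples.
counts = [
    [5000.0, 5200.0, 7000.0, 6800.0],  # highly expressed gene
    [5.0, 15.0, 6.0, 14.0],            # lowly expressed gene, its own pattern
]

print("unscaled share:", [round(s, 4) for s in range_share(counts)])
print("row-scaled share:", [round(s, 4) for s in range_share(row_zscores(counts))])
```

With these numbers the low-count gene spans only about 0.1% of the unscaled range, so on a shared colour key it would look flat; after row scaling both genes use the full range and their patterns become visible.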

In this case you should consider row scaling, but keep in mind that in many cases you lose information about absolute expression levels. In any case, you should read the Bioconductor RNA-seq workflow (http://www.bioconductor.org/help/workflows/rnaseqGene/#eda).
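A second sketch illustrates that loss of absolute level, under the same assumption about how row scaling works (Python, made-up numbers, hypothetical helper row_zscore). Two genes that follow the same pattern but differ 100-fold in counts end up with identical z-scores, so the scaled heatmap alone can no longer tell them apart.

```python
from statistics import mean, stdev

def row_zscore(row):
    """Sketch of row scaling for one gene: (x - mean) / sample SD."""
    m, s = mean(row), stdev(row)
    return [(x - m) / s for x in row]

# Hypothetical genes whose counts differ 100-fold but follow the same pattern.
gene_a = [2000.0, 4000.0, 6000.0]
gene_b = [20.0, 40.0, 60.0]

za, zb = row_zscore(gene_a), row_zscore(gene_b)
print("mean counts:", mean(gene_a), mean(gene_b))
print("z-scores:", [round(z, 3) for z in za], [round(z, 3) for z in zb])
# After scaling the two rows are indistinguishable; only the raw means,
# which the scaled heatmap does not show, still record the 100-fold difference.
```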

Best,
Efstathios