From what I can find in papers, heatmaps of RNA-seq data are created in several ways: using log-fold changes, z-scores, etc.
The edgeR vignette states:
Inputing RNA-seq counts to clustering or heatmap routines designed for microarray data is not straight-forward, and the best way to do this is still a matter of research. To draw a heatmap of individual RNA-seq samples, we suggest using moderated log-counts-per-million. This can be calculated by cpm with positive values for prior.count, for example :
> logcpm <- cpm(y, log=TRUE)
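For concreteness, here is a minimal sketch of that suggestion, followed by the row-wise z-scoring that many heatmaps in papers apply on top of it. The counts are simulated and the object names (counts, group, z) and the choice prior.count = 2 are assumptions for illustration only, not part of the vignette's example.

```r
library(edgeR)

set.seed(1)
# Simulated counts (illustrative only): 200 genes x 6 samples, two groups of three
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Moderated log2 counts-per-million, one column per sample
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# Row-wise z-scores: centre and scale each gene across the samples
z <- t(scale(t(logcpm)))
```

The matrix z can then be passed to a heatmap function, for example heatmap(z, scale = "none"), where scale = "none" avoids standardising the rows a second time.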
Just out of curiosity, how would this differ from calculating z-scores from the fitted.values (produced by the glmQLFit step) in the RNA-seq analysis pipeline? Would heatmaps created from z-scores of the fitted.values turn out all that different?
Much appreciated, thank you.
As a continuation, would it be erroneous to average the log2 CPM values of replicates (after I have ascertained that the differences between groups are indeed greater than the differences between replicates)?
If you want to plot group values rather than individual sample values, then use cpmByGroup. There is never a need to average logCPM values yourself.
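A minimal sketch of what that might look like, again on simulated counts. The object names and the settings log = TRUE and prior.count = 2 are assumptions chosen to mirror the logCPM example from the vignette, so adjust them to your own data.

```r
library(edgeR)

set.seed(1)
# Simulated counts (illustrative only): 200 genes x 6 samples, two groups of three
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# One log2-CPM value per gene per group, taken from the group factor stored in y
logcpm_group <- cpmByGroup(y, log = TRUE, prior.count = 2)
```

The result has one row per gene and one column per group, so it can be used for a heatmap in the same way as the per-sample logCPM matrix.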