Question

Calculating z-scores using edgeR's fitted values (output of glmQLFit)

6.0 years ago

n.anuragsharma ▴ 40

From what I can find in papers, heatmaps of RNA-seq data are made in several ways, for example from log-fold changes or from z-scores.

The edgeR vignette states:

Inputting RNA-seq counts to clustering or heatmap routines designed for microarray data is not straightforward, and the best way to do this is still a matter of research. To draw a heatmap of individual RNA-seq samples, we suggest using moderated log-counts-per-million. This can be calculated by cpm with positive values for prior.count, for example:

> logcpm <- cpm(y, log=TRUE)

Out of curiosity, how would this differ from calculating z-scores from the fitted.values (produced by the glmQLFit step) of the RNA-seq analysis pipeline? Would heatmaps built from z-scores of the fitted.values turn out very different?
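For reference, here is a minimal sketch of the logCPM side of that comparison: moderated logCPM from `cpm()`, then per-gene (row-wise) z-scores, which is what a "z-score heatmap" usually displays. The simulated count matrix, the two-group layout and all object names are assumptions for illustration; `prior.count = 2` is just edgeR's default written out explicitly.

```r
library(edgeR)

# Hypothetical example data: 200 genes x 6 samples, two groups of three replicates
set.seed(42)
counts <- matrix(rnbinom(1200, mu = 100, size = 10), nrow = 200, ncol = 6,
                 dimnames = list(paste0("gene", 1:200), paste0("S", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)                        # TMM normalization factors

# Moderated log2 counts per million; prior.count > 0 avoids taking log of zero
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# Row-wise z-scores: centre each gene at 0 and scale it to unit SD across samples
z <- t(scale(t(logcpm)))

# heatmap(z, scale = "none")                   # stats::heatmap; z is already row-scaled
```

Because the z-scores are computed gene by gene, each row of the heatmap shows how the individual samples, replicates included, deviate from that gene's own mean.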

R RNA-Seq edgeR • 2.8k views

updated 6.0 years ago by Gordon Smyth ★ 8.6k • written 6.0 years ago by n.anuragsharma ▴ 40

score 5 · Accepted Answer · 2019-11-24

6.0 years ago

Gordon Smyth ★ 8.6k

The purpose of making a heatmap of logCPMs is to display sample-to-sample variability, which lets you see variability both between groups and between replicates.

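Plotting fitted values instead would be pointless, because fitted values do not show variability between replicates (replicates that share the same row of the design matrix get the same fitted expression level, scaled only by their library sizes). It would also be incorrect, because fitted values are not normalized by library size.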
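Both points can be checked with a small sketch. The simulated data (with deliberately unequal sequencing depths) and the object names are assumptions for illustration. It fits `glmQLFit`, takes `fit$fitted.values`, and divides them by each sample's effective library size (`lib.size * norm.factors`): after that division the replicates of a group become identical, whereas the raw fitted values differ within a group only because the library sizes differ.

```r
library(edgeR)

# Hypothetical data: two groups of three replicates with unequal sequencing depths
set.seed(1)
depth <- c(1, 2, 0.5, 1, 1.5, 3)                  # relative depth per sample
counts <- sapply(depth, function(d) rnbinom(200, mu = 100 * d, size = 10))
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

fv <- fit$fitted.values                           # genes x samples, count scale
eff.lib <- y$samples$lib.size * y$samples$norm.factors

# Raw fitted values: replicates differ, but only through library size
spread.raw <- apply(log2(fv[, group == "A"]), 1, function(x) diff(range(x)))

# Fitted values per million of effective library size
fv.cpm <- sweep(fv, 2, eff.lib, "/") * 1e6

# Within each group, every replicate now has the same value for every gene
spread.A <- apply(log2(fv.cpm[, group == "A"]), 1, function(x) diff(range(x)))
spread.B <- apply(log2(fv.cpm[, group == "B"]), 1, function(x) diff(range(x)))
summary(c(spread.A, spread.B))                    # numerically zero

# The moderated logCPM values, by contrast, keep the replicate scatter
logcpm <- cpm(y, log = TRUE)
summary(apply(logcpm[, group == "A"], 1, sd))
```

A heatmap of z-scores computed from `fv` would therefore mostly reflect sequencing depth, and one computed from `fv.cpm` would show every replicate of a group as the same colour.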

6.0 years ago by Gordon Smyth ★ 8.6k

Much appreciated, thank you.

6.0 years ago by n.anuragsharma ▴ 40

As a follow-up, would it be wrong to average the log2 CPM values of replicates, once I have confirmed that the differences between sample groups are greater than the differences between replicates?

6.0 years ago by n.anuragsharma ▴ 40

If you want to plot group values rather than individual sample values, then use cpmByGroup. There is never a need to average logCPM values yourself.
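A minimal sketch of that suggestion on simulated data (the data and object names are assumptions). `cpmByGroup()` returns one column per level of `group`; the `log = TRUE` and `prior.count` arguments are assumed here to be accepted as in recent edgeR releases, so check `?cpmByGroup` for the installed version.

```r
library(edgeR)

# Hypothetical data: 200 genes, two groups of three replicates
set.seed(7)
counts <- matrix(rnbinom(1200, mu = 100, size = 10), nrow = 200, ncol = 6)
group <- factor(rep(c("A", "B"), each = 3))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, model.matrix(~group))

# Group-level log2 CPM: one column per level of 'group'
logcpm.group <- cpmByGroup(y, group = group, log = TRUE, prior.count = 2)
head(logcpm.group)

# heatmap(logcpm.group)   # stats::heatmap, which scales rows by default
```

The group values come from edgeR's own model-based group estimate rather than from a mean of per-sample logCPMs, which is why no manual averaging step is needed.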

6.0 years ago by Gordon Smyth ★ 8.6k