TPM and TMM, which should I use for clustering?
2.6 years ago

Hello

I would like to generate a clustered heatmap for my datasets to get a general idea of the expression profile of my cohorts, but I have both TMM and TPM normalizations. Is there a preference as to which of these normalizations is better for this?

2.6 years ago
h.mon

How did you calculate the TMM and TPM values? Were the TPM values calculated over raw counts or TMM-normalized counts? Did you read previous discussions on the issue (e.g. "TMM or TPM normalized counts for visualization?" or "Data for drawing Heatmaps (RNA-seq)")?

Anyway, I don't use either of them. When using edgeR, I use CPM - which can be calculated with the function cpm() - to plot heatmaps. When using DESeq2, I use the "regularized log" transformation, which can be calculated with the function rlog(). These transformations are applied to counts already normalized by edgeR / DESeq2, so the values are robust to differences in RNA composition (e.g. a few very highly expressed genes that differ between samples / treatments).
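
To illustrate why the normalization factors matter for CPM, here is a minimal Python sketch of the arithmetic (not edgeR's actual code). It assumes CPM is computed against an "effective" library size, i.e. library size times a per-sample normalization factor. The toy counts, the hand-picked factors, and the helper name cpm_sketch are all hypothetical; in a real analysis the factors would come from edgeR, and its cpm() handles the prior count for log values differently from the plain pseudocount used here.

```python
import math

def cpm_sketch(counts, norm_factors=None, log=False, prior_count=2.0):
    """Counts per million using effective library sizes.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    norm_factors: optional per-sample scaling factors (1.0 if omitted).
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total raw counts in each sample.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    if norm_factors is None:
        norm_factors = [1.0] * n_samples
    # Effective library size = library size * normalization factor.
    eff = [size * f for size, f in zip(lib_sizes, norm_factors)]
    result = {}
    for gene, row in counts.items():
        if log:
            # Simplified: a plain pseudocount before taking log2.
            result[gene] = [math.log2((c + prior_count) / e * 1e6)
                            for c, e in zip(row, eff)]
        else:
            result[gene] = [c / e * 1e6 for c, e in zip(row, eff)]
    return result

# Toy data: both samples have 1000 reads, but in sample 2 geneC takes a
# larger share of them, which pushes the counts of geneA and geneB down.
counts = {"geneA": [100, 50], "geneB": [100, 50], "geneC": [800, 900]}

plain = cpm_sketch(counts)
# Hypothetical, hand-picked factors (their product is 1).
adjusted = cpm_sketch(counts, norm_factors=[2 ** 0.5, 2 ** -0.5])
```

With plain library sizes, geneA looks twice as high in sample 1 as in sample 2. With these (made-up) factors, its CPM comes out the same in both samples, which is the kind of composition effect the normalization is meant to absorb.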

2.6 years ago

If I am trying to adjust for another factor, I would typically visualize multiple heatmaps (with or without centering by cell line or batch, for example).
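
As a rough sketch of what I mean by centering: for each gene, subtract the mean of its (log-scale) expression within each batch or cell line before drawing the heatmap. The function name and the toy values below are only illustrative, and this is one simple way to do it, not a full batch-correction method.

```python
from collections import defaultdict

def center_by_group(values, groups):
    """Subtract each group's mean from one gene's values across samples.

    values: expression values for one gene, one per sample (log scale).
    groups: group label (e.g. batch or cell line) for each sample.
    """
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        sizes[g] += 1
    means = {g: totals[g] / sizes[g] for g in totals}
    return [v - means[g] for v, g in zip(values, groups)]

# Toy example: batch b2 sits about 4 units higher than batch b1.
log_expr = [5.0, 7.0, 9.0, 11.0]
batch = ["b1", "b1", "b2", "b2"]
centered = center_by_group(log_expr, batch)
```

After centering, the offset between the two batches is gone and only the within-batch differences remain, so comparing the heatmap with and without this step shows how much of the clustering is driven by batch.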

Your question is a little different: I might compare the effect of TMM normalization on QC plots (such as sample clustering) and on differential gene lists. However, after picking a strategy for a set of "initial" results, I would probably visualize the TMM-normalized expression (if that was applied). That said, I would expect several rounds of analysis and discussion to critically assess your results, where your downstream steps (like functional enrichment) should inform the upstream steps (like normalization). So, I think you should probably have visualized more than one strategy before feeling comfortable submitting a paper for publication.