TPM and TMM, which should I use for clustering?
2.6 years ago

Hello

I would like to generate a clustered heatmap for my datasets to get a general idea of the expression profiles of my cohorts, but I have both TMM and TPM normalizations. Is one of these normalizations preferable for this purpose?

tpm tmm clustering • 2.1k views
2.6 years ago
h.mon 33k

How did you calculate the TMM and TPM values? Were the TPM values calculated from raw counts or from TMM-normalized counts? Did you read previous discussions on the issue (e.g. "TMM or TPM normalized counts for visualization?" or "Data for drawing Heatmaps (RNA-seq)")?

Anyway, I don't use either of them. With edgeR, I plot heatmaps from CPM values, which can be calculated with the function cpm(). With DESeq2, I use the "regularized log" transformation, which can be calculated with the function rlog(). These transformations are applied to counts already normalized by edgeR / DESeq2, so the values are robust to differences in RNA composition (e.g. genes with very high expression differing between samples / treatments).
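To make the "CPM on normalized counts" idea concrete, here is a minimal Python sketch of the arithmetic, not edgeR's actual code. It assumes you already have per-sample normalization factors (for example, TMM factors estimated elsewhere) and scales each library size by its factor before computing counts per million. The function names, the toy counts, and the simple `pseudocount` added before the log are all illustrative assumptions; edgeR's own log-CPM handles the prior count differently.

```python
import math

def effective_library_sizes(counts, norm_factors):
    """Scale each sample's total count by its normalization factor.

    counts: list of gene rows, each a list with one raw count per sample.
    norm_factors: one factor per sample (assumed to be computed elsewhere).
    """
    n_samples = len(norm_factors)
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [lib * f for lib, f in zip(lib_sizes, norm_factors)]

def log2_cpm(counts, norm_factors, pseudocount=1.0):
    """log2(CPM + pseudocount), using effective library sizes.

    The flat pseudocount is a simplification chosen for this sketch.
    """
    eff = effective_library_sizes(counts, norm_factors)
    return [
        [math.log2(c / e * 1e6 + pseudocount) for c, e in zip(row, eff)]
        for row in counts
    ]

# Toy data (made up): 3 genes (rows) x 2 samples (columns).
# Sample 2 was simply sequenced twice as deeply as sample 1.
counts = [[10, 20], [30, 60], [60, 120]]
norm_factors = [1.0, 1.0]
logcpm = log2_cpm(counts, norm_factors)
```

In this toy case the two samples end up with identical log-CPM values, because the only difference between them is sequencing depth. The normalization factors matter when composition differs between samples, which is the situation described above.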

2.6 years ago

If I am trying to adjust for another factor, I would typically visualize multiple heatmaps (with or without centering by cell line or batch, for example).
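As a rough illustration of what "centering by cell line or batch" can mean, here is a minimal Python sketch that subtracts, for each gene, the mean of the samples belonging to the same group. It assumes the input is already log-scale expression; the function name `center_by_group` and the toy values are hypothetical. This is meant for visualization only, not as a substitute for modelling the factor in the differential expression analysis.

```python
from collections import defaultdict

def center_by_group(expr, groups):
    """Center each gene within each group of samples.

    expr: list of gene rows, each a list of log-expression values per sample.
    groups: one label per sample (e.g. batch or cell line).
    """
    members_by_group = defaultdict(list)
    for j, label in enumerate(groups):
        members_by_group[label].append(j)

    centered = []
    for row in expr:
        new_row = list(row)
        for members in members_by_group.values():
            group_mean = sum(row[j] for j in members) / len(members)
            for j in members:
                new_row[j] = row[j] - group_mean
        centered.append(new_row)
    return centered

# Toy data (made up): one gene, four samples, two batches.
expr = [[1.0, 3.0, 11.0, 13.0]]
batches = ["A", "A", "B", "B"]
centered = center_by_group(expr, batches)
# centered == [[-1.0, 1.0, -1.0, 1.0]]
```

Plotting one heatmap from `expr` and another from `centered` shows how much of the clustering is driven by the grouping factor rather than by the biology of interest.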

Your question is a little different: I might compare the effect of TMM normalization on QC plots (such as sample clustering) and on differential gene lists. However, after picking a strategy for a set of "initial" results, I would probably visualize the TMM-normalized expression (if that normalization was applied). That said, I would expect several rounds of analysis and discussion to critically assess your results, where your downstream steps (like functional enrichment) should inform the upstream steps (like normalization). So, I think you should probably have visualized more than one strategy before feeling comfortable submitting a paper for publication.