In what follows, I'm assuming that you want genes in rows and samples in columns. The answer might be different if this is not the case, hopefully for reasons that will make sense after reading the following.
Both the TMM and TPM procedures include a step that normalises for differences between samples, in an attempt to make the measurement for a given gene comparable between samples. However, TMM does a better job of this.
TPM also includes steps that attempt to normalize expression values such that they are comparable between two different genes WITHIN one sample (e.g. is Gene A or Gene B more highly expressed).
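To make the within-sample part concrete, here is a minimal sketch of the usual TPM calculation in plain Python. The toy counts and gene lengths are made up for illustration, and the function name `tpm` is my own, not from any package.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million.

    counts: list of rows (genes), each a list of per-sample read counts.
    lengths_kb: per-gene length in kilobases (hypothetical values here).
    """
    n_genes = len(counts)
    n_samples = len(counts[0])
    # Step 1: divide by gene length, giving reads per kilobase (RPK).
    rpk = [[counts[g][s] / lengths_kb[g] for s in range(n_samples)]
           for g in range(n_genes)]
    # Step 2: rescale each sample (column) so its RPK values sum to one million.
    col_sums = [sum(rpk[g][s] for g in range(n_genes)) for s in range(n_samples)]
    return [[rpk[g][s] / col_sums[s] * 1e6 for s in range(n_samples)]
            for g in range(n_genes)]

# Toy data: 3 genes (rows) x 2 samples (columns).
counts = [[100, 200], [300, 600], [50, 10]]
lengths_kb = [1.0, 3.0, 0.5]
result = tpm(counts, lengths_kb)
```

In sample 1, the second gene has three times the raw count of the first but is also three times as long, so the two end up with the same TPM. That is the between-gene comparability the length step is meant to provide.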
Normally, the recommendation if you have to choose between counts and TPM is to choose TPM (or TPM calculated from TMM-normalised counts). But if you plan to do row normalisation, this will largely undo the TPM transformation anyway, because row scaling removes any per-gene factor such as gene length.
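A small sketch of why row scaling cancels a per-gene factor, assuming that "row normalisation" means z-scoring each gene across samples. The values and the gene length below are hypothetical. It only covers the gene-length division, not the per-sample rescaling that TPM also applies.

```python
import math

def row_zscore(row):
    """Centre a gene's values on their mean and divide by the sample SD."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
    return [(x - mean) / sd for x in row]

# One gene measured in four samples (made-up values).
values = [120.0, 80.0, 200.0, 40.0]
gene_length_kb = 2.5  # hypothetical gene length

# Dividing by gene length multiplies the whole row by one constant.
per_kb = [v / gene_length_kb for v in values]

z_raw = row_zscore(values)
z_per_kb = row_zscore(per_kb)
```

Both rows give the same z-scores, because the constant scales the mean and the standard deviation by the same amount.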
However, as hinted at in your final question, there is another transformation that needs to be considered: variance stabilisation. Log2 is often used as a variance-stabilising transform in many fields, but because we deal with a lot of zeros, it is often not suitable on its own. One solution is to add a pseudo-count - this both further stabilises the variance and deals with the zeros problem, but the choice of + 1 is pretty arbitrary. Luckily, there are more sophisticated alternatives, the most common being the regularized log (rlog) and vst, both provided by DESeq2. These transforms will also deal with normalising raw counts in a manner similar to the TMM normalization of edgeR.
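To illustrate why the choice of pseudo-count is arbitrary, here is a minimal plain-Python sketch with made-up counts. It shows only the simple log2(count + pseudo-count) transform, not rlog or vst, and the helper names are my own.

```python
import math

def log2_pseudo(counts, pseudo=1.0):
    """log2(count + pseudo); pseudo must be > 0 so that zero counts are defined."""
    return [math.log2(c + pseudo) for c in counts]

def diff(counts, pseudo):
    """Log2 difference between the second and first count in a pair."""
    a, b = log2_pseudo(counts, pseudo)
    return b - a

low = [0, 2]         # a lowly expressed gene in two samples
high = [1000, 2000]  # a highly expressed gene in two samples

low_diff_1 = diff(low, 1.0)       # log2(3) - log2(1)
low_diff_half = diff(low, 0.5)    # log2(2.5) - log2(0.5)
high_diff_1 = diff(high, 1.0)
high_diff_half = diff(high, 0.5)
```

For the highly expressed gene the pseudo-count barely matters (the difference stays close to 1 either way), but for the lowly expressed gene the apparent difference moves from log2(3) to log2(5) just by switching the pseudo-count from 1 to 0.5.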
A final alternative, if you wish to stay in the edgeR universe, is voom from limma, which will take an edgeR object and apply transforms so that its variance is somewhat stabilised, but I know less about that.
Thank you very much for your comprehensive answer!
So as I understand it, thinking of the expression matrix graphically, the purpose of TPM is same-column comparison and the purpose of TMM is same-row comparison. I think scaling by row will only benefit those who are only interested in clusters of genes that are highly or lowly expressed in a relative sense (high or low compared to the same gene in other samples). Clustering on a matrix without row scaling may give more informative clusters, I suppose.
Depends on your distance metric. Euclidean distance on a row-scaled matrix is roughly equivalent to Pearson distance on a non-scaled matrix.
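The relationship can be checked numerically. This is a minimal sketch in plain Python with made-up expression values, assuming that row scaling means z-scoring with the population standard deviation and that Pearson distance means 1 - r. Under those assumptions the squared Euclidean distance between two scaled rows equals 2n(1 - r), where n is the number of samples.

```python
import math

def zscore(row):
    """Z-score a row using the population SD (divide by n)."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
    return [(x - mean) / sd for x in row]

def pearson(x, y):
    """Pearson correlation as the mean product of z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def sq_euclidean(x, y):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Two genes on very different scales (made-up values).
gene_a = [1.0, 4.0, 2.0, 8.0]
gene_b = [10.0, 35.0, 30.0, 90.0]
n = len(gene_a)

d2 = sq_euclidean(zscore(gene_a), zscore(gene_b))
pearson_dist = 1 - pearson(gene_a, gene_b)
```

Here `d2` matches `2 * n * pearson_dist`. With the sample standard deviation (dividing by n - 1) the constant becomes 2(n - 1) instead, so the ordering of distances is unchanged. Since Euclidean distance is the square root of this, it is a monotone function of the Pearson distance rather than identical to it, which is the "roughly equivalent" part.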
What about DESeq2-normalized counts combined with two different scaling methods? I obtained the counts with normalized_counts <- counts(dds, normalized=TRUE).
Below are the actual normalized counts; row scaling through pheatmap does not seem to work well. I compared CPM(normalized_counts) with normalized_counts, each with and without log2 and with and without pheatmap-based row scaling.
[Heatmaps, in order: log2 of normalized counts (no CPM); normalized counts with pheatmap row scaling; CPM of normalized counts without scaling]