10 months ago by
In what follows, I'm assuming that you want to have genes in rows, and samples in columns. The answer might be different if this is not the case, hopefully for reasons that will make sense after the following....
Both TMM and TPM proceedures include a step to normalise for the difference between samples in an attempt to make the measurement for a given gene comparable between samples. However, TMM does a better job of this.
TPM also includes steps that attempt to normalize expression values such that they are comparable between two different genes WITHIN one sample (e.g. is Gene A or Gene B more highly expressed).
Normally the recommendation, if you have to choose between counts and TPM is to choose TPM (or TPM caluculated from TMM-normalised counts). But if you plan to do row normalisation, then this will undo the TPM transformation anyway.
However, as hinted at in your final question, there is another transformation that needs to be considered: variance stabilisation. Log2 is often used as variance stabilising transform in many fields, but because we deal with a lot of zeros, it is often not suitable. One solution is to add a pseudo-count - this both further stabilises the variance, and deals with the zeros problem, but the choice of + 1 is pretty arbitrary. Luckily, there are more sophisticated alternatives, the most common being regularized log and
vst both provided by
DESeq2. These transforms will also deal with normalising raw counts in a manner similar to the TMM normalization of edgeR.
A final alternative, if you wish to stay in the edgeR universe, is limma.voom which will take an edgeR object and apply transforms so that its variance is somewhat stabilised, but I know less about that.