Published in the RNA Journal in 2020, this paper argues that TPM should not be used to find differentially expressed (DE) genes when the samples differ in their original RNA amount.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7373998/
It seems a lot of people prefer DESeq/edgeR, which, according to the paper, fundamentally assume that:
Most genes are not DE.
DE and non-DE genes behave similarly.
Balanced expression changes, that is, the number and magnitude of up- and down-regulated genes are comparable.
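To make these assumptions concrete, here is a minimal Python sketch of median-of-ratios scaling, the idea behind DESeq's size factors. It is a simplified illustration on made-up counts, not the package's actual implementation; the function name `median_of_ratios_size_factors` and the toy data are hypothetical.

```python
import math
from statistics import median

def median_of_ratios_size_factors(counts):
    """counts: list of genes, each a list of raw read counts per sample."""
    n_samples = len(counts[0])
    # Use only genes with non-zero counts in every sample,
    # so that the log geometric mean is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Per-gene log geometric mean across samples (a pseudo-reference sample).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene to the pseudo-reference in sample j.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        # The median is dominated by the non-DE majority of genes.
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy data (assumed): sample B is sequenced twice as deep as sample A,
# and only the last gene is truly DE (10-fold up after depth correction).
toy_counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [10, 20],
    [20, 400],  # the single DE gene
]

size_factors = median_of_ratios_size_factors(toy_counts)
normalized = [[c / s for c, s in zip(row, size_factors)] for row in toy_counts]

for row in normalized:
    print([round(v, 2) for v in row])
```

The median recovers the two-fold depth difference here only because four of the five genes are unchanged. If most genes were DE, or the changes were strongly one-sided, the median ratio would no longer reflect sequencing depth alone and the size factors would be biased.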
When comparing across samples with different RNA amounts or, worse, across different strains of the same species, is DESeq/edgeR the only way out? Is DESeq/edgeR sufficiently robust for use in such a case?
There is another subtlety here to do with correction for gene length in the RPKM and TPM measures. Log-counts-per-million performs quite reasonably for DE analyses if the library sizes are properly normalized as you indicate in your answer and if combined with a mean-variance trend (e.g., the limma-trend pipeline). But RPKM and TPM remain quite horrible for DE analyses even if the library sizes are normalized because the connection with read number (therefore with measurement error) is lost.
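A small Python sketch of the point about gene length, using the standard definitions of CPM, RPKM and TPM on a made-up single sample. The counts and gene lengths are hypothetical, and the 1/sqrt(count) figure is only a rough Poisson approximation of relative sampling error, not what limma or edgeR actually compute.

```python
import math

def cpm(counts):
    """Counts per million: scales by library size only."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads."""
    total = sum(counts)
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]

# Hypothetical sample: a short gene with 10 reads, a long gene with 1000 reads,
# and a third gene standing in for the rest of the library.
counts = [10, 1000, 98990]
lengths_bp = [100, 10000, 2000]

cpm_vals = cpm(counts)
rpkm_vals = rpkm(counts, lengths_bp)
tpm_vals = tpm(counts, lengths_bp)

# Rough Poisson approximation (assumption): relative error ~ 1 / sqrt(count).
rel_error = [1 / math.sqrt(c) for c in counts]

for name, vals in [("CPM", cpm_vals), ("RPKM", rpkm_vals),
                   ("TPM", tpm_vals), ("rel. error", rel_error)]:
    print(name, [round(v, 4) for v in vals[:2]])
```

The first two genes get identical RPKM and TPM values although one is backed by 10 reads and the other by 1000, so their measurement precision differs by roughly a factor of ten. CPM keeps that 100-fold difference in read support visible, which is what a mean-variance trend needs.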