Forum:Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols
24 months ago
ChocoParrot ▴ 20

Published in the journal RNA in 2020, this paper argues that if the original amount of RNA differs between samples, TPM should not be used to find differentially expressed genes.

A lot of people seem to prefer DESeq/edgeR, which the paper asserts fundamentally assume that:

  1. Most genes are not DE.

  2. DE and non-DE genes behave similarly.

  3. Balanced expression changes, that is, the number and magnitude of up- and down-regulated genes are comparable.

When comparing samples with different RNA amounts or, worse, different strains of the same species, is DESeq/edgeR the only way out? Is DESeq/edgeR sufficiently robust for use in such a case?

fpkm paper tpm • 1.6k views
24 months ago
Gordon Smyth ★ 6.4k


"Most genes are not DE."

That is an assumption used by most library size normalization methods including TMM. If the library sizes can be normalized by some other means then edgeR doesn't need that assumption.

"DE and non-DE genes behave similarly."

Not sure what that means, but any statistical test of DE must make some sort of consistency assumptions.

"Balanced expression changes, that is, the number and magnitude of up- and down-regulated genes are comparable."

edgeR makes no such assumption.

24 months ago
ATpoint 76k

The fact that the naive per-million metrics often fail to properly correct for library composition has been shown many times before. Here is an example using GTEx data that illustrates such a bias when using per-million normalization: TMM-Normalization
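To make the composition problem concrete, here is a minimal toy sketch in plain Python. The gene names and counts are invented for illustration, and it assumes raw counts are proportional to true abundance. Genes A-D are unchanged between the two samples, and only gene E is induced.

```python
def cpm(counts):
    """Scale raw counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}

# Hypothetical toy data: A-D are truly unchanged, E is strongly induced.
control = {"A": 100, "B": 200, "C": 300, "D": 400, "E": 1000}
treated = {"A": 100, "B": 200, "C": 300, "D": 400, "E": 9000}

cpm_control = cpm(control)
cpm_treated = cpm(treated)

for gene in control:
    ratio = cpm_treated[gene] / cpm_control[gene]
    print(f"{gene}: CPM ratio treated/control = {ratio:.2f}")
```

Because every per-million value is forced to share the same total, the induced gene E takes up most of the "million" in the treated sample, and the unchanged genes A-D come out with a ratio of 0.20, i.e. an apparent fivefold drop. TPM has the same sum-to-a-constant property, so it behaves the same way.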

Regarding "Most genes are not DE": the normalization techniques in DESeq2/edgeR give you a lot of freedom, because you are not forced to use all genes for normalization. If you have unbalanced changes (or many DE genes), you are free to subset to a set of control genes that you are confident are not DE and calculate the sample-specific size/normalization factors from them. I often do this in applications like ChIP-seq or ATAC-seq, especially the latter, where perturbation samples (e.g. knockouts of a transcription factor that notably regulates chromatin accessibility) can give tens of thousands of differential regions out of a total set of > 100k regions. Careful inspection of MA-plots followed by a smart choice of regions for normalization is key here (inspection always makes sense, even in RNA-seq, where the defaults usually run just fine).
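As a rough illustration of what computing size factors from control features means, here is a simplified median-of-ratios sketch in plain Python. It is not the DESeq2 or edgeR API. The function name `size_factors`, the sample names, and all counts are made up, and the toy data assume the control regions differ only by sequencing depth.

```python
import math
from statistics import median

def size_factors(counts, genes):
    """Median-of-ratios size factors computed from the given features only.

    counts: dict mapping sample name -> dict mapping feature -> raw count.
    genes:  features used for normalization (e.g. presumed non-DE controls).
    """
    samples = list(counts)
    # Keep features with non-zero counts in every sample so logs are defined.
    usable = [g for g in genes if all(counts[s][g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no usable features for normalization")
    # Log geometric mean across samples acts as a pseudo-reference sample.
    log_ref = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # Size factor = median ratio of a sample to the pseudo-reference.
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_ref[g] for g in usable))
        for s in samples
    }

# Hypothetical data: the knockout is sequenced twice as deep (controls 2x),
# and most regions gain accessibility on top of that (20x in raw counts).
counts = {
    "wt": {"ctrl1": 100, "ctrl2": 200, "ctrl3": 400,
           "peak1": 50, "peak2": 80, "peak3": 60, "peak4": 90},
    "ko": {"ctrl1": 200, "ctrl2": 400, "ctrl3": 800,
           "peak1": 1000, "peak2": 1600, "peak3": 1200, "peak4": 1800},
}
all_regions = list(counts["wt"])
controls = ["ctrl1", "ctrl2", "ctrl3"]

sf_all = size_factors(counts, all_regions)
sf_ctrl = size_factors(counts, controls)
print("ko/wt factor, all regions:  ", round(sf_all["ko"] / sf_all["wt"], 2))
print("ko/wt factor, controls only:", round(sf_ctrl["ko"] / sf_ctrl["wt"], 2))
```

With all regions, the differential majority drives the median and the real signal gets normalized away. With the control subset, the factors reflect only the depth difference. Which regions count as controls is a judgment call, which is why the MA-plot inspection matters.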

So yes, per-million is often not adequate. Even for visualization I find it suboptimal, e.g. when scaling browser tracks for IGV. See here for a solution on how to use the more elaborate metrics for something like bigwig normalization: ATAC-seq sample normalization


There is another subtlety here to do with correction for gene length in the RPKM and TPM measures. Log-counts-per-million performs quite reasonably for DE analyses if the library sizes are properly normalized as you indicate in your answer and if combined with a mean-variance trend (e.g., the limma-trend pipeline). But RPKM and TPM remain quite horrible for DE analyses even if the library sizes are normalized because the connection with read number (therefore with measurement error) is lost.
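A small toy sketch of that last point, in plain Python with invented genes and counts. Two genes with the same reads per kilobase get the same TPM even though one is backed by 10 reads and the other by 1000. The Poisson coefficient of variation used here is an assumption for illustration, not a full model of RNA-seq noise.

```python
import math

def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize, then scale to sum to 1e6."""
    rate = {g: counts[g] / lengths_kb[g] for g in counts}
    total = sum(rate.values())
    return {g: 1e6 * r / total for g, r in rate.items()}

# Hypothetical genes: same reads per kilobase, very different lengths.
counts = {"short_gene": 10, "long_gene": 1000, "other": 5000}
lengths_kb = {"short_gene": 0.5, "long_gene": 50.0, "other": 5.0}

values = tpm(counts, lengths_kb)
for gene in ("short_gene", "long_gene"):
    # Under a Poisson model (an assumption), the CV of a count n is 1/sqrt(n).
    cv = 1 / math.sqrt(counts[gene])
    print(f"{gene}: TPM = {values[gene]:.1f}, reads = {counts[gene]}, "
          f"approx. CV = {cv:.2f}")
```

Both genes end up with identical TPM, but the short gene's value is roughly ten times noisier in relative terms. Once only the TPM is kept, a DE method has no way to tell the two apart, whereas counts (or log-CPM with a mean-variance trend) retain that information.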

