Blog: Gene expression units explained: RPM, RPKM, FPKM and TPM
Written 22 months ago by Renesh (United States)

In RNA-seq gene expression data analysis, we come across various expression units such as RPM, RPKM, FPKM, TPM and raw read counts. It is often difficult to understand the underlying methodology used to calculate these units from mapped sequence data.

I have seen many posts asking about these normalization units, and a lot of confusion among readers. Hence, I attempt here to explain these units as simply as possible, avoiding complex mathematical expressions.

Why we need different normalized expression units:

Expression units provide a digital measure of transcript abundance. Normalized expression units are necessary to remove technical biases in sequencing data, mainly sequencing depth (deeper sequencing produces more reads for a gene expressed at the same level) and gene length (for genes expressed at the same level, the longer the gene, the more reads it collects).

Gene expression units and calculation:

1. RPM (Reads per million mapped reads)

RPM = (number of reads mapped to a gene × 10^6) / (total number of mapped reads)

For example, you have sequenced one library with 5 million (M) reads. Among them, a total of 4 M matched the genome sequence and 5000 reads matched a given gene.

RPM = (5000 × 10^6) / (4 × 10^6) = 1250

Notes:

  • RPM does not normalize for transcript length.
  • RPM is suitable for sequencing protocols where reads are generated irrespective of gene length.
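
The RPM calculation above can be sketched in a few lines of Python. This is only a minimal illustration; the function name `rpm` and its parameter names are my own choices, not part of any library.

```python
def rpm(gene_reads: int, total_mapped_reads: int) -> float:
    """Reads per million mapped reads for a single gene.

    gene_reads          reads mapped to the gene
    total_mapped_reads  all mapped reads in the library (not all sequenced reads)
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return gene_reads * 1e6 / total_mapped_reads


# Worked example from the text: 5000 reads on the gene, 4 M mapped reads.
print(rpm(5000, 4_000_000))  # 1250.0
```

Note that the denominator is the 4 M mapped reads, not the 5 M sequenced reads.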

2. RPKM (Reads per kilobase per million mapped reads)

RPKM = (number of reads mapped to a gene × 10^3 × 10^6) / (total number of mapped reads × gene length in bp)

Here, 10^3 normalizes for gene length (per kilobase) and 10^6 for sequencing depth (per million mapped reads).

FPKM (Fragments per kilobase per million mapped reads) is analogous to RPKM and is used in paired-end RNA-seq experiments. In paired-end experiments, two reads (left and right) are sequenced from the same DNA fragment. When paired-end data are mapped, either both reads of a fragment or only one high-quality read may map to the reference sequence. To avoid confusion and double counting, FPKM counts fragments rather than reads: a fragment is counted once, whether both of its reads or only one of them mapped.

For example, you have sequenced one library with 5 M reads. Among them, a total of 4 M matched the genome sequence and 5000 reads matched a given gene with a length of 2000 bp.

RPKM = (5000 × 10^3 × 10^6) / (4 × 10^6 × 2000) = 625

Notes:

  • RPKM normalizes for gene length.
  • RPKM is suitable for sequencing protocols where the number of reads generated depends on gene length.
  • RPKM is used in single-end RNA-seq experiments (FPKM for paired-end RNA-seq data).
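
As a rough sketch, RPKM and FPKM can be computed as follows in Python. The function names and the `(fragment_id, mate)` representation of mapped reads are assumptions made for illustration only; real aligner output looks different.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return gene_reads * 1e3 * 1e6 / (gene_length_bp * total_mapped_reads)


def count_fragments(mapped_reads):
    """Count fragments from reads given as (fragment_id, mate) tuples.

    A fragment is counted once, whether one or both of its mates mapped.
    """
    return len({fragment_id for fragment_id, _mate in mapped_reads})


def fpkm(gene_fragments, gene_length_bp, total_mapped_fragments):
    """Same formula as RPKM, applied to fragment counts instead of read counts."""
    return rpkm(gene_fragments, gene_length_bp, total_mapped_fragments)


# Worked example from the text: 5000 reads, 2000 bp gene, 4 M mapped reads.
print(rpkm(5000, 2000, 4_000_000))  # 625.0

# Hypothetical paired-end reads: fragment f1 has both mates mapped, f2 only one.
reads = [("f1", 1), ("f1", 2), ("f2", 1)]
print(count_fragments(reads))  # 2
```

The only difference between the two units is what gets counted; the normalization itself is identical.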

3. TPM (Transcripts per million)

Notes:

  • TPM normalizes for gene length.
  • TPM was proposed as an alternative to RPKM because RPKM is inconsistent among samples (Wagner et al., 2012).
  • TPM is suitable for sequencing protocols where the number of reads generated depends on gene length.
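
For illustration, here is a minimal Python sketch of TPM using the commonly cited definition: first divide each gene's read count by its length in kilobases, then rescale so that the values for all genes in the sample sum to one million. The function name `tpm` is my own, and apart from the first gene (5000 reads, 2000 bp, as in the RPKM example) the counts and lengths are made-up numbers.

```python
def tpm(read_counts, gene_lengths_bp):
    """Transcripts per million for every gene in one sample."""
    # Step 1: normalize for gene length (reads per kilobase).
    rpk = [count * 1e3 / length
           for count, length in zip(read_counts, gene_lengths_bp)]
    # Step 2: normalize for depth using the sum of the length-normalized
    # values, so that the result sums to one million within the sample.
    total_rpk = sum(rpk)
    return [value * 1e6 / total_rpk for value in rpk]


# Three genes; genes 2 and 3 are hypothetical.
counts = [5000, 1000, 3000]
lengths = [2000, 1000, 2000]
values = tpm(counts, lengths)
print(values)       # [500000.0, 200000.0, 300000.0]
print(sum(values))  # 1000000.0
```

Unlike RPKM, the depth normalization here uses the sum of length-normalized counts rather than the total number of mapped reads, which is why TPM values always add up to 10^6 within a sample.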

References:

  • Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008 Jul 1;5(7):621-8.
  • Wagner GP, Kin K, Lynch VJ. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in Biosciences. 2012 Dec 1;131(4):281-5.
Post modified 11 months ago.

TPM does not count for total number of mapped reads

The M (per million) is for normalizing for sequencing depth.

The difference between FPKM/RPKM and TPM is the order of operations.

Reply by igor, written 22 months ago

The order of operations is not the key point. In TPM we adjust 'transcripts', while in FPKM we adjust 'reads'.

Reply by Shicheng Guo, written 17 months ago

I am not sure I understand. You can't adjust transcripts. They are already defined.

Reply by igor, written 17 months ago

I read the paper carefully and I am definitely sure that the current post is wrong. You can read it again: Wagner GP, Kin K, Lynch VJ. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in Biosciences. 2012 Dec 1;131(4):281-5.

Reply by Shicheng Guo, written 17 months ago

Indeed.

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis.

Please read this: "A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis":

"The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis."

Also, by Harold Pimentel: "What the FPKM? A review of RNA-Seq expression units":

"The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one."

Reply by Kevin Blighe, written 11 months ago, modified 9 months ago

Hi Renesh, please remove your post. Your calculation method for TPM is not correct at all.

Reply by Shicheng Guo, written 17 months ago

Can you please explain where I am wrong in the TPM calculation? If you read the paper, it clearly says that T is the total number of transcripts sampled by the total sequenced reads. Please let me know where I am wrong.

Reply by Renesh, written 17 months ago

What about rpm/bp? That is, reads per million mapped reads per base pair (rpm/bp), with background subtraction.

Reply by Ming Lu, written 17 months ago

It should be stated up front that none of these methods is optimal for conducting differential expression analysis across samples.

Reply by Kevin Blighe, written 13 months ago
Answer by zzgw, written 11 months ago:

The TPM values for all transcripts in a sample should add up to 1 million. I don't see how Renesh's formula for TPM produces this key feature of TPM. The 'read length' in the formula is confusing. It also contradicts what is stated in other posts (found by a simple Google search for "TPM rnaseq") and in Lior's paper.


Thank you for your comments and concerns. I have updated the formulae for clarity.

Reply by Renesh, written 11 months ago