Question

How to calculate TPM?

3

7.5 years ago

moransharo ▴ 30

Hello, I'm new to RNA-seq and normalization. I would like to normalize raw RNA-seq data to TPM. I'm familiar with the logic behind it (thanks to this blog post: http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/). However, I can't find a way to calculate it with R packages. I read that RSEM may be helpful, but I'm not sure how to install and use it. I would greatly appreciate any help. Thank you!

PKM RNA-Seq R • 18k views

Updated 7.5 years ago by Farbod ★ 3.4k • written 7.5 years ago by moransharo ▴ 30

score 1 · Answer 1 · 2016-10-13

Warning: I don't do RNA-seq often; my comments below may be inaccurate.

I was looking at an RNA-seq data set where only FPKM is provided. Because I need raw read counts for edgeR-like analyses, I did some research on how FPKM and the related TPM are calculated. I also consulted Rob Patro for help. In the end, it seems to me that there are multiple, subtly different ways to compute FPKM and TPM, so FPKM/TPM values computed by different tools are often not comparable.

I think the most precise description of FPKM/etc is here. Importantly, to derive FPKM/etc from raw read counts, we need to compute the effective transcript length (the \tilde{l} in the link above). The exact approach to computing this value is tool-dependent. Rob commented that:

A different approach [to computing effective length] (which is used in Salmon and kallisto) is to define the effective length of a transcript as L - \mu_{L}, where \mu_{L} is the mean of the fragment length distribution for all fragments of length <= L.

and mentioned that "the effective length can also be modified to account for sampling biases". There is not a single way to compute effective length and thus not a single way to compute FPKM/TPM.

As a side note, GTEx provided both raw counts and FPKM. I was trying to convert from counts to FPKM. However, it seems that GTEx is using an effective length longer than the transcript length, which would be impossible with Rob's formula or the formula in the link above...

In short, your question is not only about how to compute FPKM/TPM, but also about which flavor of FPKM/TPM to compute. If I were given such a task, I would use Rob's formula to compute the effective length and the TPM formula from the linked webpage. Note that you need to know the insert size/fragment length distribution of your library in order to compute TPM accurately.
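A minimal Python sketch of the recipe described above, for illustration only. It assumes the fragment length distribution is available as a table of length-to-weight pairs, and it computes effective length as L - \mu_{L} exactly as in Rob's quote. The function names, the `floor` of 1.0 for very short transcripts, and all numbers are my own assumptions, not what Salmon, kallisto, or any other tool actually does internally.

```python
def effective_length(length, frag_len_weights, floor=1.0):
    """Effective length as L - mu_L, where mu_L is the mean fragment
    length over fragments no longer than the transcript (length <= L).

    frag_len_weights maps fragment length -> weight (e.g. probability).
    The floor is an assumption of this sketch to avoid zero or negative
    values for very short transcripts.
    """
    # Keep only fragment lengths that could fit inside this transcript.
    usable = {f: w for f, w in frag_len_weights.items() if f <= length}
    total = sum(usable.values())
    if total == 0:
        return floor
    mu = sum(f * w for f, w in usable.items()) / total
    return max(length - mu, floor)


def counts_to_tpm(counts, eff_lengths):
    """TPM: per-transcript rate (count / effective length), rescaled so
    that the rates sum to one million. Assumes at least one nonzero count."""
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Made-up example data (not from any real library).
frag_len_weights = {150: 0.25, 200: 0.5, 250: 0.25}
lengths = [1000, 2000, 180]
counts = [100, 400, 10]

eff = [effective_length(n, frag_len_weights) for n in lengths]
tpm = counts_to_tpm(counts, eff)
print(eff)
print([round(t, 1) for t in tpm])
```

Note how the 180 bp transcript only "sees" the 150 bp fragments, so its effective length is much shorter than its annotated length. This is one reason TPM values depend on the fragment length distribution.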

score 0 · Answer 2 · 2016-10-13

This post compares different normalization methods and includes a demo R script that converts read counts to TPMs and FPKMs without installing RSEM or edgeR: https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/ By the way, some programs, such as eXpress, calculate TPMs and FPKMs while counting the mapped reads, so you won't need any further conversion.

There are other answers in this old thread: Using transcripts per million (TPM)
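For readers who want to see the arithmetic behind such a conversion, here is a short Python sketch (not the R script from the linked post). It assumes you already have counts and effective lengths per transcript; the function names and numbers are hypothetical. It also shows that TPM can be obtained from FPKM by rescaling to a sum of one million, provided the FPKM values cover all transcripts in the sample.

```python
def counts_to_fpkm(counts, eff_lengths):
    """FPKM: fragments per kilobase of (effective) length per million
    fragments. The 1e9 factor combines the kilobase (1e3) and million
    (1e6) scalings."""
    total = sum(counts)
    return [c * 1e9 / (l * total) for c, l in zip(counts, eff_lengths)]


def fpkm_to_tpm(fpkm_values):
    """Rescale FPKM so the values sum to one million. This only gives
    TPM if fpkm_values covers every transcript in the sample."""
    total = sum(fpkm_values)
    return [f / total * 1e6 for f in fpkm_values]


# Made-up example data.
counts = [100, 400, 10]
eff_lengths = [800.0, 1800.0, 30.0]

fpkm = counts_to_fpkm(counts, eff_lengths)
tpm = fpkm_to_tpm(fpkm)
print([round(f, 1) for f in fpkm])
print([round(t, 1) for t in tpm])
```

The reverse direction (FPKM back to raw counts) additionally needs the total fragment count and the effective lengths the original tool used, which is why it is often not recoverable from a published FPKM table alone.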

score 0 · Answer 3 · 2016-10-13

7.5 years ago

WouterDeCoster 47k

Depending on why you need these TPM values, one option is to use a widely accepted tool such as DESeq2, which performs a more sophisticated normalization, and to work with its normalized counts instead of TPM. I see Israel Barrantes already suggested something similar.

score 0 · Answer 4 · 2016-10-13

I am not sure what you mean by "raw". The other two answers are excellent, but assume you already have the counts, which means the data is partially processed.

If by "raw" you mean you have FASTQs, the easiest way to get TPMs is probably to use kallisto, which needs only an index build followed by a single quantification command: https://pachterlab.github.io/kallisto/manual . However, it's not much easier than RSEM, so if you had trouble using RSEM, I am not sure what to recommend.

I don't think there is currently a method to get from FASTQs to TPMs entirely in R.
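Once kallisto has run, the TPMs are already in its plain-text output table (`abundance.tsv`, with columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`), so no further normalization is needed. As a rough Python sketch, the table can be read and the `tpm` column cross-checked against `est_counts / eff_length`. The helper names and the inline example table are made up for illustration; for real data you would pass an open file handle for the `abundance.tsv` in your kallisto output directory instead.

```python
import csv
import io


def read_kallisto_table(handle):
    """Parse a kallisto-style abundance table from an open text handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "target_id": row["target_id"],
            "eff_length": float(row["eff_length"]),
            "est_counts": float(row["est_counts"]),
            "tpm": float(row["tpm"]),
        }
        for row in reader
    ]


def recompute_tpm(rows):
    """Recompute TPM from estimated counts and effective lengths."""
    rates = [r["est_counts"] / r["eff_length"] for r in rows]
    total = sum(rates)
    return {r["target_id"]: rate / total * 1e6 for r, rate in zip(rows, rates)}


# Made-up table in the abundance.tsv layout (not real kallisto output).
example = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t800\t100\t183673.5\n"
    "tx2\t2000\t1800\t400\t326530.6\n"
    "tx3\t180\t30\t10\t489795.9\n"
)

rows = read_kallisto_table(io.StringIO(example))
recomputed = recompute_tpm(rows)
for r in rows:
    print(r["target_id"], r["tpm"], round(recomputed[r["target_id"]], 1))
```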

score 0 · Answer 5 · 2016-10-13

7.5 years ago

Farbod ★ 3.4k

Hi moransharo,

The Trinity explanation of transcript quantification shows where to find the TPM values in the RSEM output, along with the related script.

Hope that helps

~ Best
