Question: How to calculate TPM?
moransharo wrote (2.9 years ago):

Hello, I'm new to RNA-seq and normalization. I would like to normalize raw RNA-seq data to TPM. I'm familiar with the logic behind it (thanks to this blog post: http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/), but I can't find a way to calculate it with R packages. I read that RSEM may be helpful, but I'm not sure how to install and use it. I would really appreciate any help. Thank you!

Tags: rna-seq, pkm, R
lh3 wrote (2.9 years ago):

Warning: I don't do RNA-seq often; my comments below may be inaccurate.

I was looking at an RNA-seq data set where only FPKM is provided. Since I need raw read counts for edgeR-like analyses, I did a bit of research on how FPKM and the related TPM are calculated, and I also consulted Rob Patro for help. In the end, it seems to me that there are multiple subtly different ways to compute FPKM and TPM, so FPKM/TPM values computed by different tools are often not directly comparable.
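One relationship that does hold within a single tool's output: if FPKM and TPM were computed from the same counts and the same effective lengths, TPM is simply FPKM rescaled so that each sample sums to one million. The Python sketch below (the function name is hypothetical) shows only that rescaling. It cannot recover raw counts, which would additionally require the effective lengths and the total number of mapped fragments.

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so they sum to 1e6 (i.e. TPM).

    Only valid if every FPKM value was computed with the same effective
    lengths the tool would use for TPM; otherwise the result will differ
    from that tool's own TPM output.
    """
    total = sum(fpkm)
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return [value * 1e6 / total for value in fpkm]


if __name__ == "__main__":
    # Toy FPKM values for three transcripts in one sample.
    sample_fpkm = [10.0, 30.0, 60.0]
    print(fpkm_to_tpm(sample_fpkm))
```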

I think the most precise description of FPKM and related measures is here. Importantly, to derive FPKM/TPM from raw read counts, we need the effective transcript length (the \tilde{l} in the link above), and the exact way this value is computed is tool dependent. Rob commented that:

A different approach [to computing effective length] (which is used in Salmon and kallisto) is to define the effective length of a transcript as L - \mu_{L}, where \mu_{L} is the mean of the fragment length distribution for all fragments of length <= L.

and mentioned that "the effective length can also be modified to account for sampling biases". So there is no single way to compute effective length, and thus no single way to compute FPKM/TPM.
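Rob's definition can be written out directly. The sketch below assumes the fragment length distribution is available as a dict mapping fragment length to probability (or to an observed count, since the function renormalizes either way). It ignores the bias corrections mentioned above. The fallback for transcripts shorter than every fragment is a hypothetical choice and not necessarily what Salmon or kallisto do.

```python
def effective_length(tx_len, frag_dist):
    """Effective length L - mu_L, where mu_L is the mean fragment length
    restricted to fragments of length <= L.

    frag_dist: dict mapping fragment length -> probability or count.
    """
    mass = 0.0
    weighted_sum = 0.0
    for frag_len, weight in frag_dist.items():
        if frag_len <= tx_len:
            mass += weight
            weighted_sum += frag_len * weight
    if mass == 0:
        # Hypothetical fallback: no fragment fits inside the transcript,
        # so mu_L is undefined; here we just return the raw length.
        return float(tx_len)
    mu_l = weighted_sum / mass  # conditional mean over lengths <= L
    return tx_len - mu_l


if __name__ == "__main__":
    # Toy distribution: fragments of 200, 250 or 300 bp.
    dist = {200: 0.25, 250: 0.5, 300: 0.25}
    for length in (1000, 260, 150):
        print(length, effective_length(length, dist))
```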

As a side note, GTEx provided both raw counts and FPKM. I tried converting the counts to FPKM, but it seems that GTEx uses an effective length longer than the transcript length, which would be impossible with Rob's formula or the formula in the link above...
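Checking for this kind of discrepancy amounts to inverting the FPKM formula, FPKM = count * 10^9 / (effective length * total mapped fragments), to see which effective length a tool must have used. Below is a minimal Python sketch. It assumes you know which total fragment count the tool used as its denominator, a choice that also differs between pipelines, and it is not necessarily how the check above was done. The function names and toy numbers are hypothetical.

```python
def fpkm_from_counts(count, eff_len, total_fragments):
    """FPKM = count * 1e9 / (effective length * total mapped fragments)."""
    return count * 1e9 / (eff_len * total_fragments)


def implied_effective_length(count, fpkm, total_fragments):
    """Solve the FPKM formula for the effective length.

    Returns None when FPKM is zero, since the length is then undefined.
    """
    if fpkm == 0:
        return None
    return count * 1e9 / (fpkm * total_fragments)


if __name__ == "__main__":
    total = 20_000_000   # toy library size
    tx_len = 1200        # annotated transcript length (toy value)
    fpkm = fpkm_from_counts(500, 1000, total)
    implied = implied_effective_length(500, fpkm, total)
    # An implied effective length above tx_len would be the puzzling case.
    print(fpkm, implied, implied > tx_len)
```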

Overall, your question is not only about how to compute FPKM/TPM, but also about which flavor of FPKM/TPM to compute. If I were given this task, I would use Rob's formula for the effective length and the TPM formula from the linked webpage. Note that you need to know the insert size (fragment length) distribution of your library to compute TPM accurately.
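Put together, the TPM formula divides each transcript's count by its effective length and then rescales those rates to sum to one million. Below is a minimal Python sketch. It assumes you already have per-transcript counts and effective lengths (for example from the L - \mu_{L} definition quoted above). The function name and toy numbers are illustrative, and real tools estimate both counts and lengths in their own ways.

```python
def counts_to_tpm(counts, eff_lengths):
    """TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6 for one sample."""
    if len(counts) != len(eff_lengths):
        raise ValueError("counts and eff_lengths must have the same length")
    rates = []
    for count, eff_len in zip(counts, eff_lengths):
        if eff_len <= 0:
            raise ValueError("effective lengths must be positive")
        rates.append(count / eff_len)  # reads per base of effective length
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero")
    return [rate * 1e6 / total for rate in rates]


if __name__ == "__main__":
    counts = [100, 200, 300]
    eff_lengths = [1000.0, 500.0, 1500.0]
    print(counts_to_tpm(counts, eff_lengths))
```

Because TPM depends only on the ratios c_i / l_i within a sample, a constant factor applied to all effective lengths cancels out, but transcript-specific differences in how l_i is computed do not. That is one reason TPMs from different tools are not interchangeable.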

Israel Barrantes wrote (2.9 years ago):

This post compares different ways of normalizing expression, and includes a demo R script to convert read counts to TPM and FPKM without installing RSEM or edgeR: https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/ . Also, some programs, such as eXpress, calculate TPM and FPKM while counting the mapped reads, so you won't need any further conversion.

There are other answers in this old thread: Using transcripts per million (TPM)

WouterDeCoster wrote (2.9 years ago):

Depending on why you need these TPM values, one solution would be to use a commonly accepted tool like DESeq2, which does a more sophisticated normalization, and to use its normalized counts instead of TPM. I see Israel Barrantes already suggested something similar.
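For reference, DESeq2's default size factors follow the "median of ratios" idea. Each sample's factor is the median, over genes with nonzero counts in every sample, of the ratio between that sample's count and the gene's geometric mean across samples. The sketch below is a simplified Python re-implementation for illustration only, and its function names are hypothetical. In practice you would call estimateSizeFactors in DESeq2 itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per gene, each holding raw counts per sample.
    """
    n_samples = len(counts[0])
    usable_rows = []
    log_geo_means = []
    for row in counts:
        # Genes with a zero in any sample have a geometric mean of 0; skip them.
        if all(c > 0 for c in row):
            usable_rows.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    if not usable_rows:
        raise ValueError("no gene has nonzero counts in every sample")
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg
                      for row, lg in zip(usable_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalized_counts(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Toy data: sample 2 was sequenced twice as deeply as sample 1.
    toy = [[10, 20], [20, 40], [30, 60], [0, 5]]
    print(size_factors(toy))
    print(normalized_counts(toy))
```

Unlike TPM, these normalized counts are not divided by transcript length. They are meant for comparing the same gene across samples rather than different genes within one sample.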

igor wrote (2.9 years ago):

I am not sure what you mean by "raw". The other two answers are excellent, but they assume you already have the counts, which means the data is already partially processed.

If by "raw" you mean FASTQ files, the easiest way to get TPMs is probably kallisto, which requires only a single step: https://pachterlab.github.io/kallisto/manual . However, it's not much easier to use than RSEM, so if you had trouble with that, I'm not sure what to recommend.
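Once kallisto has run, the TPMs are already in its abundance.tsv output, so no extra conversion is needed. The Python sketch below loads them. It assumes the usual tab-separated layout with a header row (target_id, length, eff_length, est_counts, tpm). The toy rows are made up for illustration, and the function name is hypothetical.

```python
import csv
import io


def load_kallisto_tpm(handle):
    """Return {target_id: tpm} from a kallisto abundance.tsv file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row["tpm"]) for row in reader}


# Made-up rows in the abundance.tsv layout, used instead of a real file.
TOY_ABUNDANCE = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1200\t1000\t100\t142857.142857\n"
    "tx2\t700\t500\t200\t571428.571429\n"
    "tx3\t1700\t1500\t300\t285714.285714\n"
)

if __name__ == "__main__":
    tpm = load_kallisto_tpm(io.StringIO(TOY_ABUNDANCE))
    # With a real run, pass open(<output dir>/abundance.tsv, newline="") instead.
    print(tpm)
    print(sum(tpm.values()))  # TPMs within a sample should sum to about 1e6
```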

I don't think there is currently a method to get from FASTQs to TPMs entirely in R.

Farbod wrote (2.9 years ago):

Hi moransharo,

In this Trinity explanation of transcript quantification, you can find the TPM values in the RSEM output, along with the related script.

Hope that helps

~ Best
