Question: Calculate TPM from cuffnorm count data
2
gravatar for galozs
4 months ago by
galozs20
galozs20 wrote:

Hi,

I am working with RNA-seq data and normalizing with Cufflinks' Cuffnorm, whose output is count and FPKM matrices. since I'm comparing gene expression between samples, I wanted to use TPM rather than FPKM. I was wondering about few issues: 1- Can I calculate TPM values directly from the FPKM values of Cuffnorm? 2- Can I calculate TPM from Cuffnorm's count table? Is it raw enough? 3- What is unique in Cuffnorm's FPKM vs. regular FPKM?

Thanks!

ADD COMMENTlink modified 4 months ago by Macspider2.4k • written 4 months ago by galozs20
0
gravatar for Macspider
4 months ago by
Macspider2.4k
Vienna - BOKU
Macspider2.4k wrote:

To calculate TPM from FPKM use the formula proposed by Pachter et al, 2011, here:

enter image description here

To fully understand this formula: What the FPKM?

ADD COMMENTlink written 4 months ago by Macspider2.4k

Thanks! Do you know if it is OK to do it with the FPKM output values of Cuffnorm? I used the formula that you mentioned and got very wired results when calculated correlation matrix. So I thought that maybe Cuffnorm's FPKM are already manipulated in a way...

ADD REPLYlink written 4 months ago by galozs20

From every software output you compute the FPKM, you should retrieve the same number, if you start from the same counts, otherwise that's some great bug in data science. The sum of the FPKM on the denominator of the fraction is belonging to the sample, not the whole data set. Does that help? Did you multiply by 10 to the power of 6? Many times I forget to do that...

ADD REPLYlink modified 4 months ago • written 4 months ago by Macspider2.4k

That what I was thinking though the end values where very strange. (I checked that all the steps were done correctly according to the formula) these are pearson correlations of the log2 FPKM data: these are pearson correlations of the log2 FPKM data:

and these are the pearson correlations of the log2 TPM: and these are the pearson correlations of the log2 TPM: as you can see the TPM looks realy bad. Do you have an idea? where is my mistake?

ADD REPLYlink written 4 months ago by galozs20

TPM is not just a rearrangement of FPKM, it contains a sample-specific factor. Therefore, it might mean that your two samples are not that similar after all. However, you should still remove low values (scale the threshold from FPKM=1 to TPM=<val> for each of your samples).

ADD REPLYlink written 4 months ago by Macspider2.4k

What do you mean by "scale the threshold from FPKM=1 to TPM=<val> for each of your samples" ? both correlation matrix I attached were calculated separately on the FPKM / TPM log2 values. Do you mean that the noise threshold in is different for each one of the samples? For example if I used log2(FPKM) = 5 as a threshold than in TPM I will have a different one for each sample?

Thanks!

ADD REPLYlink written 4 months ago by galozs20

Exactly.

Please follow me through the formula that I posted "upstairs":

  • If you look carefully, you'll see that the fraction includes the FPKM value of gene i as numerator, and the sum of all the FPKM values of sample j as denominator.
  • This means that, for each sample, you'll have a different sum at the denominator

This being said, to obtain a sample-specific threshold, you will apply 5 as numerator and then calculate the sum of the FPKMs for each sample to divide 5 with. The fraction is then multiplied by 10 to the power of 6, giving you the TPM version of FPKM=5, relativized to the total sample expresssion, for each sample.

ADD REPLYlink modified 4 months ago • written 4 months ago by Macspider2.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 957 users visited in the last hour