Question: Converting TCGA expression data from FPKM to TPM
Asked by Alex Reynolds (Seattle, WA USA), 2.9 years ago:

For a given cancer type in the NIH Cancer Genome Atlas (TCGA), I visit the data portal and download the UNC RNASeqV2 level 3 expression data. Specifically, I grab files whose names end with the suffix *.rsem.genes.normalized_results.

Each file contains one line per gene, with the gene name and what I take to be its normalized FPKM expression value. I base this assumption on the filename and on the UNC RNASeqV2 protocol description hosted by TCGA.
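One quick way to see which quantity a file actually holds is to read its header. Below is a minimal Python sketch for loading such a file into a dictionary. It assumes a tab-separated layout with one header line whose second column names the quantity, followed by one gene identifier and one numeric value per line. That layout, the function name `read_rsem_genes`, and the example rows are all hypothetical, so check them against your own files.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def read_rsem_genes(handle: TextIO) -> Tuple[str, Dict[str, float]]:
    """Parse a two-column, tab-separated expression table (hypothetical helper).

    Assumes the first line is a header whose second column names the
    quantity, and every following line holds a gene identifier and a value.
    Returns the name of the value column and a gene -> value mapping.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    values: Dict[str, float] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values[row[0]] = float(row[1])
    return header[1], values


# Made-up example input; identifiers and numbers are placeholders.
example = io.StringIO(
    "gene_id\tnormalized_count\n"
    "GENE_A|1\t16.3413\n"
    "GENE_B|2\t0\n"
)
unit, expr = read_rsem_genes(example)
print(unit)
print(expr)
```

Printing the name of the value column settles the FPKM-or-not question for a given file before any conversion is attempted.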

My questions are:

1. Are these expression data really measured in FPKM?
2. If they are, how should I convert the FPKM values for a given gene to TPM?

Tags: expression, rna-seq, tpm, tcga, fpkm

You can't recover TPMs from gene-level FPKMs.  The data on transcripts has already been lost.

(Comment by gc, 2.9 years ago)

I don't understand your comment. I quickly compared the FPKMs for a given gene and its transcripts and noticed (as one would expect) that the gene-level FPKM is the sum of the FPKMs of its transcripts. So I conclude that it makes no real difference whether you calculate TPM from gene-level or transcript-level FPKMs. Here is one example, taken from Cufflinks' genes.fpkm_tracking and isoforms.fpkm_tracking files:

genes.fpkm_tracking:ENSG00000196092    ENSG00000196092    PAX5    29.5427

isoforms.fpkm_tracking:ENST00000358127    ENSG00000196092    PAX5    5.41329
isoforms.fpkm_tracking:ENST00000520154    ENSG00000196092    PAX5    2.55302e-10
isoforms.fpkm_tracking:ENST00000523241    ENSG00000196092    PAX5    2.06415e-12
isoforms.fpkm_tracking:ENST00000377840    ENSG00000196092    PAX5    8.02239e-16
isoforms.fpkm_tracking:ENST00000377852    ENSG00000196092    PAX5    0.561218
isoforms.fpkm_tracking:ENST00000377853    ENSG00000196092    PAX5    5.90173e-10
isoforms.fpkm_tracking:ENST00000523145    ENSG00000196092    PAX5    2.63708e-10
isoforms.fpkm_tracking:ENST00000446742    ENSG00000196092    PAX5    0.387949
isoforms.fpkm_tracking:ENST00000520281    ENSG00000196092    PAX5    0.00123482
isoforms.fpkm_tracking:ENST00000377847    ENSG00000196092    PAX5    20.8611
isoforms.fpkm_tracking:ENST00000522003    ENSG00000196092    PAX5    0.474374
isoforms.fpkm_tracking:ENST00000414447    ENSG00000196092    PAX5    0.995077
isoforms.fpkm_tracking:ENST00000523493    ENSG00000196092    PAX5    0.663389
isoforms.fpkm_tracking:ENST00000524340    ENSG00000196092    PAX5    1.70926e-82
isoforms.fpkm_tracking:ENST00000522932    ENSG00000196092    PAX5    0
isoforms.fpkm_tracking:ENST00000520083    ENSG00000196092    PAX5    0.185101


Sum of the transcript-level FPKMs: 29.5427328211
(Comment by Michi, 2.6 years ago)
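The arithmetic in the comment above can be checked in a few lines of Python. The sketch first re-adds the PAX5 isoform FPKMs copied from the listing. It then uses made-up toy numbers to illustrate the reasoning: TPM rescales every FPKM in a sample by the same factor, so summing transcript TPMs gives the same gene-level TPM as converting the gene-level FPKM directly. This assumes that each transcript belongs to exactly one gene and that the gene FPKM is the sum of its transcripts' FPKMs, as in this example. It does not recover per-transcript values from gene-level FPKMs. `fpkm_to_tpm` is a hypothetical helper name.

```python
import math
from typing import Dict

# PAX5 isoform FPKMs copied from the isoforms.fpkm_tracking lines above.
pax5_isoforms = [
    5.41329, 2.55302e-10, 2.06415e-12, 8.02239e-16, 0.561218,
    5.90173e-10, 2.63708e-10, 0.387949, 0.00123482, 20.8611,
    0.474374, 0.995077, 0.663389, 1.70926e-82, 0.0, 0.185101,
]
gene_fpkm = 29.5427  # value from the genes.fpkm_tracking line
isoform_sum = math.fsum(pax5_isoforms)  # accurate float summation
print(f"{isoform_sum:.10f}")  # agrees with gene_fpkm up to rounding


def fpkm_to_tpm(fpkm: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sample's FPKMs so they sum to 1e6 (hypothetical helper)."""
    total = sum(fpkm.values())
    return {k: v / total * 1e6 for k, v in fpkm.items()}


def sum_by_gene(values: Dict[str, float], tx_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Add up transcript-level values per gene."""
    out: Dict[str, float] = {}
    for tx, value in values.items():
        gene = tx_to_gene[tx]
        out[gene] = out.get(gene, 0.0) + value
    return out


# Toy sample (made-up numbers): transcript -> gene mapping and FPKMs.
tx_to_gene = {"t1": "g1", "t2": "g1", "t3": "g2"}
tx_fpkm = {"t1": 3.0, "t2": 1.0, "t3": 6.0}

# Route 1: convert transcripts to TPM, then sum per gene.
summed = sum_by_gene(fpkm_to_tpm(tx_fpkm), tx_to_gene)
# Route 2: sum FPKMs per gene, then convert gene-level FPKMs to TPM.
direct = fpkm_to_tpm(sum_by_gene(tx_fpkm, tx_to_gene))
print(summed, direct)
```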

Are you sure you need the TPM (transcripts per million) data? If you are fine with gene-level data, you should be OK with it as it is.

(Comment by roy.granit, 2.9 years ago)

I'd like the TPM data, if possible.

(Comment by Alex Reynolds, 2.9 years ago)

Quick question: if I have between-sample normalized FPKMs, do I sum the FPKMs of all the transcripts of a given gene within each sample, or do I also sum them across all the other samples? With three transcripts and two samples, for example, those two approaches give different results.

(Comment by james.lloyd, 2.4 years ago)
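On the within-sample versus across-sample question above: gene-level values are normally aggregated within each sample separately, because summing across samples would collapse them into a single number. Here is a minimal sketch for the three-transcript, two-sample case. The row layout `(sample, transcript, gene, FPKM)`, the helper name `gene_level_per_sample`, and the values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows: (sample, transcript, gene, normalized FPKM); values are made up.
rows: List[Tuple[str, str, str, float]] = [
    ("S1", "t1", "g1", 2.0), ("S1", "t2", "g1", 1.0), ("S1", "t3", "g1", 0.5),
    ("S2", "t1", "g1", 4.0), ("S2", "t2", "g1", 0.0), ("S2", "t3", "g1", 1.5),
]


def gene_level_per_sample(
    records: List[Tuple[str, str, str, float]],
) -> Dict[Tuple[str, str], float]:
    """Sum transcript FPKMs per (sample, gene) key; samples are never pooled."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for sample, _transcript, gene, fpkm in records:
        totals[(sample, gene)] += fpkm
    return dict(totals)


result = gene_level_per_sample(rows)
print(result)
```

Keying on `(sample, gene)` keeps one gene-level value per sample, which is the shape a later per-sample TPM conversion expects.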

You need to post this as a new question and refer back to this thread if necessary. Each thread starts with a question followed by answers; new questions should not be posted in the answer section. That's what makes this site better than others. (Moderation: your answer will be moved to a comment.)

(Comment by Istvan Albert, 2.4 years ago)
Answer by h.mon, 2.9 years ago:

At the end of this blog post, a simple formula is provided to compute TPM from FPKM:

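$$\mathrm{TPM}_i = \frac{\mathrm{FPKM}_i}{\sum_j \mathrm{FPKM}_j} \times 10^6$$

where the sum over j runs over all genes (or transcripts) quantified in the same sample.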
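The formula above translates almost directly into code. The following Python sketch converts a single sample. The function name and the example FPKMs are made up for illustration, and the dictionary must contain every feature in the sample, not just the transcripts of one gene.

```python
from typing import Dict


def fpkm_to_tpm(fpkm: Dict[str, float]) -> Dict[str, float]:
    """TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, applied to one sample.

    The sum must run over every feature quantified in that sample.
    """
    total = sum(fpkm.values())
    if total <= 0:
        raise ValueError("sample has no positive FPKM values")
    return {feature: value / total * 1e6 for feature, value in fpkm.items()}


# Made-up FPKMs for a single sample.
sample = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
tpm = fpkm_to_tpm(sample)
print(tpm)
```

The converted values sum to one million within the sample, and ratios between features are unchanged. TPM is therefore a per-sample rescaling rather than something computed across the expression values of one gene.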

Edit: judging from the protocol you linked, and also from this wiki, the UNC V2 RNA-Seq workflow uses MapSplice + RSEM, so I guess the values are already given as TPM; check here and here.


Thanks, the wiki link was a much better summary than what I had found previously.

(Comment by Alex Reynolds, 2.9 years ago)
Answer by fabio-verdao, 11 months ago:

This thread is a bit old, but for future reference:

@h.mon's answer addresses your second question.

As for your first question (are these expression data really measured in FPKM?):

According to the wiki cited by @h.mon, both *.rsem.genes.normalized_results and *.rsem.isoforms.normalized_results report values as normalized_count (upper-quartile normalized RSEM count estimates), not as RPKM, FPKM or TPM.

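In general terms, upper-quartile normalization rescales each sample so that its 75th-percentile count matches a common target. The sketch below is a generic illustration only, not the UNC pipeline's code. It assumes the quartile is taken over genes with non-zero counts and that the target is 1000; both are assumptions to check against the TCGA documentation. The result is still a scaled count and is not length-normalized like FPKM or TPM.

```python
import statistics
from typing import List

TARGET_UPPER_QUARTILE = 1000.0  # assumed target value, not confirmed here


def upper_quartile_normalize(counts: List[float],
                             target: float = TARGET_UPPER_QUARTILE) -> List[float]:
    """Scale one sample so the 75th percentile of its non-zero counts equals target.

    Assumptions: zero counts are excluded when computing the quartile, and
    statistics.quantiles' default ("exclusive") method is used.
    """
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-zero counts")
    upper_quartile = statistics.quantiles(nonzero, n=4)[2]
    return [c / upper_quartile * target for c in counts]


# Made-up RSEM-style expected counts for one sample.
raw = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
norm = upper_quartile_normalize(raw)
print(norm)
```

Because no gene length enters the calculation, values like these cannot be turned into TPM with the FPKM formula alone.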

