Convert RPKM values to TPM values?
6.8 years ago
tsangj15 ▴ 30

I found normal tissue data that has RPKM values from medicalgenomics.org and tumor data from cbioportal. I want to convert these RPKM values to TPM values so I can calculate fold change between the normal tissue data and tumor data.

How do I do that? Is there a formula?

RNA-Seq • 6.4k views
6.8 years ago
h.mon 35k

edit: All the information below is correct, but unimportant. If all your tumor samples are from one source (or several, as I did not look carefully into the sites you cited), and all your normal samples are from another, you cannot calculate meaningful fold-changes between them. Mathematically you can, but you won't know whether the differences are due to tumor versus normal tissue (what you want) or to differences in lab protocols, sample prep kits, sequencing technology, and so forth.

Besides, if you do not understand what a sample is in the RNA-seq experiments you want to analyse, you should read more papers (1) / tutorials (2, 3, 4, 5) / courses.

Sorry to be so blunt, hope it doesn't put you off.

See this answer for a formula, this link for a video (youngsters nowadays seem to prefer video to text), this repo for a Python script, and here for an R script.

A very useful link for this kind of stuff is https://www.google.com


I found this equation.

TPM = FPKM / (sum of FPKM over all genes/transcripts) * 10^6

Does "the sum of FPKM over all genes" mean the sum of all FPKM values in one sample, or across all samples?


TPM (like FPKM) is a within-sample normalization, so use the sum of FPKM from one sample.

edit: I think you got the formula wrong, shouldn't it be:

TPM = ( FPKM / sum of FPKM over all genes/transcripts) * 10^6
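
To make the formula concrete, here is a minimal Python sketch for a single sample. The function name `fpkm_to_tpm` and the three input values are made up for illustration; they do not come from any of the datasets mentioned in this thread.

```python
def fpkm_to_tpm(fpkm_values):
    """Rescale one sample's FPKM (or RPKM) values so they sum to one million."""
    total = sum(fpkm_values)
    if total == 0:
        # Assumption: a sample with no signal cannot be rescaled.
        raise ValueError("sum of FPKM is zero; cannot compute TPM")
    return [value / total * 1e6 for value in fpkm_values]


# Hypothetical FPKM values for three transcripts from ONE sample.
sample_fpkm = [1.0, 1.0, 2.0]
sample_tpm = fpkm_to_tpm(sample_fpkm)
print(sample_tpm)       # [250000.0, 250000.0, 500000.0]
print(sum(sample_tpm))  # 1000000.0
```

Whatever the input, the TPM values of one sample always add up to 10^6, which is a quick sanity check on the conversion.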

Sorry that was a poorly worded question.

What I meant was: does "the sum of RPKM over all genes" mean the sum of all RPKM values from one transcript, rather than from all transcripts?

The text file I have has a bunch of these lines that show RPKM values for each transcript:

for example:

entrez_gene_id: ENSG00000169136 ensembl_gene_id hgnc_symbol: ATF5 transcript: NM_012068 transcript_length: 2259 adipose: 2.016 colon: 1.685 heart: 2.113 hypothalmus: 1.057 kidney: 2.347 liver: 78.892 lung: 1.5 ovary: 1.948 skeletalmuscle: 1.093 spleen: 1.911 testes: 2.485

Do I add all these numbers up for the sum of all RPKM values?


Imagine you have only one sample, and you have the FPKM values for each transcript from this sample. So the TPM for transcript i is:

TPM(i) = ( FPKM(i) / sum( FPKM over all transcripts ) ) * 10^6

If you have several samples s, for each sample you sum the FPKM values from that particular sample, not from other samples or from all samples.

TPM(s,i) = ( FPKM(s,i) / sum( FPKM(s) over all transcripts ) ) * 10^6
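
Applied to a table like the one quoted above, where each row is a transcript and each tissue column is a separate sample, this means summing down each column and never across a row. A rough Python sketch follows; the function name `rpkm_table_to_tpm`, the transcript IDs, and the numbers are all invented for illustration, and it assumes every sample has a non-zero total.

```python
def rpkm_table_to_tpm(table):
    """Convert {transcript_id: {sample_name: rpkm}} to the same shape in TPM."""
    # Step 1: sum RPKM over all transcripts, separately for each sample.
    totals = {}
    for per_sample in table.values():
        for sample, value in per_sample.items():
            totals[sample] = totals.get(sample, 0.0) + value

    # Step 2: divide each value by its own sample's total and scale to 10^6.
    # Assumption: no sample total is zero.
    return {
        transcript: {
            sample: value / totals[sample] * 1e6
            for sample, value in per_sample.items()
        }
        for transcript, per_sample in table.items()
    }


# Toy table with two hypothetical transcripts and two samples (tissues).
rpkm = {
    "tx_A": {"liver": 6.0, "lung": 1.0},
    "tx_B": {"liver": 2.0, "lung": 3.0},
}
tpm = rpkm_table_to_tpm(rpkm)
print(tpm["tx_A"])  # {'liver': 750000.0, 'lung': 250000.0}
print(tpm["tx_B"])  # {'liver': 250000.0, 'lung': 750000.0}
```

Note that the liver total (8.0) and the lung total (4.0) are computed independently, so each sample's TPM column sums to 10^6 on its own.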

Sorry I am new to this.

What exactly do you mean by sample?
