Question: TCGA RNA-Seq Data: RSEM and RPKM
NetunoPoncã wrote (5.5 years ago):

Hi there,

I am currently using TCGA-Assembler to download TCGA data. I am interested in the RPKM and RSEM values for each gene. For RSEM, I can easily obtain the values per gene (around 20k values per sample). For RPKM, on the other hand, I can only obtain values per exon (around 230k values per sample), not per gene. My questions are:

  1. Does anyone know if TCGA provides RPKM values per gene?
  2. If not, given that I can map each exon to a gene (TCGA has a mapping for that), is it easy (or even possible?) to obtain the RPKM values per gene, as for the RSEM?


P.S.: I have already read the topic "Calculating Rpkm From Rsem Using Tcga Rnaseqv2 Level3 Data".


poisonAlien wrote (5.5 years ago):

Yes, TCGA does provide RPKM per gene per sample. Level-3 RNA-seq data has three types of quantification files per sample:

1. exon.quantification
2. spljxn.quantification
3. gene.quantification

I am not sure about TCGA-Assembler, but you can download them from the TCGA Data Portal.

gene.quantification has the raw counts, median length, and corresponding RPKM for each gene.
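
For reference, the standard RPKM definition relates those three columns to each other. The snippet below is only a minimal sketch of that definition; the function name, argument names, and the example numbers are made up for illustration, and the total mapped read count is assumed to be known for the sample (the thread does not say where TCGA records it).

```python
def rpkm(raw_count: float, gene_length_bp: float, total_mapped_reads: float) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    raw_count          : reads assigned to the gene
    gene_length_bp     : gene length in base pairs (e.g. a median transcript length)
    total_mapped_reads : total mapped reads in the sample (assumed to be known)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return raw_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers, not taken from any TCGA file.
example_value = rpkm(raw_count=500, gene_length_bp=2000, total_mapped_reads=20_000_000)
print(example_value)
```
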



NetunoPoncã replied (5.5 years ago):

I can find this for RNASeqV1 data, but not for RNASeqV2 data. Any ideas? Furthermore, only a few datasets have this available for RNASeqV1. Is it possible to obtain this for RNASeqV2 data?

poisonAlien replied (5.5 years ago):

I am not sure about the second part. RNASeqV2 data are the results of a different processing pipeline, so I don't think you will get gene-wise RPKM (instead you have rsem.genes.normalized_results). Maybe this link will be helpful to you.

NetunoPoncã replied (5.5 years ago):

I had already seen that page, but thanks for citing it. I realize that they do not provide RPKM values per gene, but I guessed it would be "easy" to derive these values from other information available in the files. I guess that is not the case.
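
On the exon-to-gene idea from the original question: if each exon's RPKM and length are known and the exons of a gene do not overlap, the exon read counts can be recovered up to a common per-sample factor, so a gene-level RPKM works out to the length-weighted mean of its exon RPKMs. The sketch below illustrates that arithmetic only. The function name, the row layout, and the toy values are hypothetical, and the result uses the summed exon length as the gene length, so it would not necessarily match a value computed from the median length reported in gene.quantification.

```python
from collections import defaultdict


def gene_rpkm_from_exons(exon_rows, exon_to_gene):
    """Aggregate exon-level RPKM to gene level by length-weighted averaging.

    exon_rows    : iterable of (exon_id, exon_length_bp, exon_rpkm) tuples
    exon_to_gene : dict mapping exon_id -> gene_id (hypothetical mapping)

    Assumes the exons of a gene do not overlap and all rows come from one sample.
    """
    weighted_sum = defaultdict(float)   # sum of exon_rpkm * exon_length per gene
    total_length = defaultdict(float)   # sum of exon_length per gene
    for exon_id, exon_length_bp, exon_rpkm in exon_rows:
        gene_id = exon_to_gene.get(exon_id)
        if gene_id is None:
            continue  # exon without a gene assignment is skipped
        weighted_sum[gene_id] += exon_rpkm * exon_length_bp
        total_length[gene_id] += exon_length_bp
    return {
        gene_id: weighted_sum[gene_id] / total_length[gene_id]
        for gene_id in weighted_sum
        if total_length[gene_id] > 0
    }


# Toy data, not taken from any TCGA file.
toy_exons = [("e1", 100, 10.0), ("e2", 300, 2.0), ("e3", 200, 5.0)]
toy_mapping = {"e1": "GENE_A", "e2": "GENE_A", "e3": "GENE_B"}
toy_result = gene_rpkm_from_exons(toy_exons, toy_mapping)
print(toy_result)
```

Overlapping exon annotations would double-count reads and length, so they would need to be merged first for this to be meaningful.
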
kangyueapril wrote (5.1 years ago):

Download the TCGA data from this website, then download the file mRNAseq_Preprocess.Level_3.2014xxxx00.0.0.tar.gz. Unpack it and you will find that the files contain RPKM data.

bruce.moran wrote (5.1 years ago):

I have been having the same issue, as I imagine many others have. I am working on colon cancer, and a recent Nature Genetics paper by Isella et al. on subtyping/classification gives their full method for converting and comparing the RSEM and RPKM values available in RNASeqV1/V2. It did require comparing samples found in both V1 and V2, so that might be a barrier if such samples are not available to you. If you work on CRC then you are in luck, as they have made a Bioconductor package containing all samples with "converted" RPKMs. Otherwise you would most likely be doing your community a huge favour by converting the data.


leshaker replied (4.6 years ago):

Thanks for the hint. But where did you see the conversion from RSEM to RPKM? I only found the part where they try to match the samples generated on the GA and HiSeq platforms, but both are already in RSEM (i.e. TCGA V2). Do you see what I mean?