TCGA RNA-Seq Data: RSEM and RPKM
7.1 years ago
NetunoPoncã ▴ 160

Hi there,

I am currently using TCGA-Assembler to download TCGA data. I am interested in the RPKM and RSEM values for each gene. For RSEM, I can easily obtain per-gene values (around 20k values per sample). For RPKM, on the other hand, I can only obtain values for each exon (around 230k values per sample), not for each gene. My questions are:

1. Does anyone know if TCGA provides RPKM values per gene?
2. If not, given that I can map each exon to a gene (TCGA has a mapping for that), is it easy (or even possible) to obtain per-gene RPKM values, as for RSEM?

Thanks!

P.S.: I have already read the thread "Calculating Rpkm From Rsem Using Tcga Rnaseqv2 Level3 Data".

RSEM RPKM TCGA-Assembler RNA-Seq TCGA • 22k views
7.1 years ago
poisonAlien ★ 3.1k

Yes, TCGA does provide RPKM per gene per sample. Level 3 RNA-seq data has three types of quantification files per sample:

1. exon.quantification
2. gene.quantification
3. spljxn.quantification

I am not sure about TCGA-Assembler, but you can download them from the TCGA data portal.

gene.quantification has raw counts, median length and corresponding RPKM for each gene.
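For reference, the three columns are related by the standard RPKM definition (reads per kilobase per million mapped reads). Below is a minimal sketch of that formula, not TCGA's own code. The function name `rpkm` and its parameter names are made up for illustration, and the choice of which read total to use as the denominator is an assumption that would need checking against the pipeline documentation.

```python
def rpkm(raw_count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads.

    raw_count          -- reads assigned to the gene (or exon)
    length_bp          -- feature length in base pairs
    total_mapped_reads -- library size used for normalization (assumed
                          here to be the total mapped reads of the sample)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return raw_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2,000 bp gene in a 20M-read library.
example = rpkm(500, 2000, 20_000_000)
```

With these made-up numbers the result is 12.5, i.e. 500 reads spread over 2 kb in a library of 20 million reads.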


Hi,

I can find this for RNASeqV1 data, but not for RNASeqV2 data. Any ideas? Furthermore, only a few datasets have this available for RNASeqV1... Is it possible to obtain this for RNASeqV2 data?


I am not sure about the second part. RNASeqV2 data are the output of a different processing pipeline, so I don't think you will get gene-wise RPKM (instead you have rsem.genes.normalized_results). Maybe this link will be helpful to you.


I had already seen that page, but thanks for citing it. I realize that they do not provide RPKM values per gene, but I guessed it would be "easy" to derive these values from other information available in the files... I guess that is not the case.
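On the exon-to-gene idea from the original question: because RPKM is a count divided by length and library size, the exon counts of a gene can be summed and re-normalized, which works out to a length-weighted average of the exon RPKMs. The sketch below illustrates only that arithmetic. It assumes that exons are non-overlapping, that each exon maps to exactly one gene, and that gene length is taken as the sum of its exon lengths; none of this is confirmed to match how TCGA computes gene-level values. The function name and the input tuple layout are hypothetical.

```python
from collections import defaultdict


def gene_rpkm_from_exons(exons):
    """Collapse exon-level RPKM to gene-level RPKM.

    exons -- iterable of (gene_id, exon_length_bp, exon_rpkm) tuples
             (hypothetical layout; adapt to the real file columns).

    Since exon_count is proportional to exon_rpkm * exon_length, the
    gene RPKM is sum(rpkm * length) / sum(length) over the gene's exons.
    """
    weighted_sum = defaultdict(float)
    total_length = defaultdict(float)
    for gene_id, length_bp, exon_rpkm in exons:
        weighted_sum[gene_id] += exon_rpkm * length_bp
        total_length[gene_id] += length_bp
    return {
        gene_id: weighted_sum[gene_id] / total_length[gene_id]
        for gene_id in weighted_sum
        if total_length[gene_id] > 0  # skip genes with no usable length
    }


# Made-up example: GENE_A has two exons, GENE_B has one.
example_exons = [
    ("GENE_A", 1000, 10.0),
    ("GENE_A", 3000, 2.0),
    ("GENE_B", 500, 4.0),
]
gene_values = gene_rpkm_from_exons(example_exons)
```

For the made-up input, GENE_A comes out as (10.0 × 1000 + 2.0 × 3000) / 4000 = 4.0. Reads spanning two exons may be counted in both, so values obtained this way should be treated as approximate and are not comparable to RSEM estimates without further checks.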

6.7 years ago
kangyueapril ▴ 80

Download the TCGA data from this website. Then download the file mRNAseq_Preprocess.Level_3.2014xxxx00.0.0.tar.gz. Unpack it and you will find files containing RPKM data.

6.8 years ago
bruce.moran ▴ 880

I have been having the same issue, as I imagine many others have. I am working on colon cancer, and a recent Nature Genetics paper from Isella et al. on subtyping/classification gives their full method for converting and comparing the RSEM and RPKM values available in RNASeqV1/V2. It required comparing samples found in both V1 and V2, so that might be a barrier if such samples are not available to you. If you work on CRC then you are in luck, as they have made a Bioconductor package containing all samples with "converted" RPKMs. Otherwise, you would most likely be doing your community a huge favour by converting the data.


Thanks for the hint. But where did you see the conversion from RSEM to RPKM? I only found the part where they try to match the samples generated on the GA and HiSeq platforms, but both are already in RSEM (i.e. TCGA V2). Do you see what I mean?