I am currently using TCGA-Assembler to download TCGA data. I am interested in the RPKM and RSEM values for each gene. For RSEM I can easily obtain the values per gene (around 20k values per sample). For RPKM, on the other hand, I can only obtain values for each exon (around 230k values per sample), not for each gene. My questions are:
Does anyone know if TCGA provides RPKM values per gene?
If not, given that I can map each exon to a gene (TCGA provides a mapping for that), is it possible, and ideally easy, to obtain per-gene RPKM values like the ones available for RSEM?
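On the second question, a rough per-gene value can be derived from exon-level RPKM alone. If RPKM_e = 10^9 · c_e / (N · L_e) for an exon with c_e reads and length L_e, and N (total mapped reads) is the same for every exon in a sample, then pooling reads and lengths over a gene's exons gives RPKM_gene = Σ(RPKM_e · L_e) / Σ L_e, i.e. a length-weighted mean of the exon RPKMs. The sketch below assumes that exon IDs encode coordinates like `chr1:11874-12227:+` (an assumption about the file format, with inclusive coordinates), and the helper names are hypothetical. It is an approximation only: reads spanning exon junctions, or exons shared by overlapping transcripts, may be counted more than once, so the numbers will not match RSEM gene estimates.

```python
from collections import defaultdict
from typing import Dict


def exon_length(exon_id: str) -> int:
    """Exon length from an ID assumed to look like 'chr1:11874-12227:+' (inclusive coordinates)."""
    _chrom, span, _strand = exon_id.split(":")
    start, end = (int(x) for x in span.split("-"))
    return end - start + 1


def gene_rpkm(exon_rpkm: Dict[str, float], exon_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Hypothetical helper: length-weighted mean of exon RPKMs per gene (same sample, same N)."""
    weighted = defaultdict(float)  # per gene: sum of RPKM_e * L_e (proportional to read count)
    total_len = defaultdict(int)   # per gene: sum of L_e
    for exon_id, rpkm in exon_rpkm.items():
        gene = exon_to_gene.get(exon_id)
        if gene is None:
            continue  # exon missing from the mapping: ignored
        length = exon_length(exon_id)
        weighted[gene] += rpkm * length
        total_len[gene] += length
    return {gene: weighted[gene] / total_len[gene] for gene in weighted}
```

For example, a gene with exons of length 100, 100 and 300 and RPKMs 2.0, 4.0 and 1.0 gets (200 + 400 + 300) / 500 = 1.8, not the unweighted mean of about 2.33.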
I have been having the same issue, as I imagine many others have. I am working on colon cancer, and a recent Nature Genetics paper by Isella et al. on subtyping/classification gives their full method for converting and comparing the RSEM and RPKM values available in RNAseqV1/V2. The method does require comparing samples found in both V1 and V2, so that might be a barrier if such samples are not available for you. If you work on CRC, you are in luck: they have made a Bioconductor package containing all samples with "converted" RPKMs. Otherwise, you would most likely be doing your community a huge favour by converting the data.