Question: TCGA data analysis, raw_count?!
3.9 years ago, juara wrote:


I would appreciate some help analyzing the TCGA data. What I have done so far:

- Downloaded the PRAD (prostate) RNAseqv2 data, consisting of 550 patients

- Downloaded the clinical data for PRAD

- Matched the two in Excel using the barcode

My question is whether I should use "raw_count" or "scaled_estimate" for my analysis. For example, I want to look at the differential expression of EGFR in the no-tumor group versus the tumor group. Can I average "raw_count" within each group and compare the two groups? Or should I apply some sort of transformation first? Or is scaled_estimate multiplied by 10E6 more accurate? The scaled_estimate values are very low, around 2-10*10E-5; does that mean the gene is not being transcribed much?

Sorry for being naive in this field. Any ideas and comments are appreciated.


rna-seq tcga R • 2.5k views
3.9 years ago, roy.granit wrote:

You can read more about the TCGA data types here. Basically, raw_count is the estimated number of reads assigned to that gene, while scaled_estimate is the estimated fraction of the sample's transcripts that come from that gene, which is why the values are so small. Note that you also have the normalized_count data, which is the raw count rescaled by the 75th percentile of that sample's column (upper-quartile normalization).
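To make the relationship between the three columns concrete, here is a minimal sketch in plain Python. The gene names other than EGFR, the numbers, and the helper functions are all made up for illustration. It assumes the upper-quartile step divides by the 75th percentile of the non-zero counts and multiplies by 1000; check the TCGA pipeline description before relying on those details. Multiplying scaled_estimate by 10^6 gives a transcripts-per-million (TPM) style value.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def upper_quartile_normalize(raw_counts, scale=1000.0):
    """Rescale one sample's raw counts by its 75th percentile.

    Assumption: zeros are excluded when computing the percentile and the
    result is multiplied by `scale`; both choices are illustrative.
    """
    nonzero = [c for c in raw_counts.values() if c > 0]
    uq = percentile(nonzero, 75)
    return {gene: c / uq * scale for gene, c in raw_counts.items()}


def scaled_estimate_to_tpm(scaled_estimates):
    """Convert per-gene transcript fractions to transcripts per million."""
    return {gene: s * 1e6 for gene, s in scaled_estimates.items()}


# Hypothetical single-sample data (not real TCGA values).
raw = {"EGFR": 300.0, "GENE_A": 100.0, "GENE_B": 200.0,
       "GENE_C": 400.0, "GENE_D": 500.0, "GENE_E": 0.0}
scaled = {"EGFR": 5e-5, "GENE_A": 2e-5}

normalized = upper_quartile_normalize(raw)
tpm = scaled_estimate_to_tpm(scaled)
print(round(normalized["EGFR"], 3), round(tpm["EGFR"], 3))
```

Seen this way, a scaled_estimate of a few times 10^-5 corresponds to a few tens of TPM, so a small fraction does not by itself mean the gene is silent.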

I believe that most people take the normalized counts, log2 transform them (usually after adding a pseudocount of 1 to avoid taking the log of zero), and then compare between samples. Because each sample has already been scaled by its own 75th percentile, you can compare different samples without further normalization.
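As a rough illustration of that workflow, the sketch below log2-transforms normalized counts for one gene and compares the group means. The values, variable names, and helper functions are hypothetical, and the pseudocount of 1 is an assumption rather than something TCGA prescribes.

```python
import math
from statistics import mean


def log2_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each value."""
    return [math.log2(v + pseudocount) for v in values]


def log2_group_difference(group_a, group_b, pseudocount=1.0):
    """Difference of mean log2 expression: group_a minus group_b."""
    return (mean(log2_transform(group_a, pseudocount))
            - mean(log2_transform(group_b, pseudocount)))


# Hypothetical normalized_count values for EGFR (not real TCGA data).
normal_egfr = [255.0, 511.0, 1023.0]
tumor_egfr = [1023.0, 2047.0, 4095.0]

diff = log2_group_difference(tumor_egfr, normal_egfr)
print(f"mean log2 difference (tumor - normal): {diff:.2f}")
```

A difference of mean log2 values is only a descriptive summary. Deciding whether it is significant would need a proper statistical test on the per-sample values.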

I would recommend two very useful tools that will save you much time handling the data without tedious spreadsheet work:

1. Cancer Browser

2. cBioPortal

Both tools allow you to analyze the TCGA data very easily. 
