I would appreciate it if you could help me analyze the TCGA data. What I have done so far:
-Downloaded the PRAD (prostate) RNASeqV2 data, covering 550 patients
-Downloaded the clinical data for PRAD
-Matched the two in Excel using the barcode
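In case it helps to see the matching step outside Excel, here is a minimal sketch in plain Python. The barcodes, the values, and the field name "tumor_status" are made up for illustration; the only thing taken from TCGA is that the first three dash-separated parts of a sample barcode identify the patient.

```python
# Toy data, invented for illustration only (not real TCGA samples).
# Expression table: full sample barcode -> EGFR raw_count.
expression = {
    "TCGA-AA-0001-01A-11R-0000-07": 1520.0,
    "TCGA-AA-0002-01A-11R-0000-07": 980.0,
    "TCGA-AA-0003-01A-11R-0000-07": 2310.0,
}

# Clinical table: patient barcode -> clinical fields.
# "tumor_status" is a hypothetical column name used for this sketch.
clinical = {
    "TCGA-AA-0001": {"tumor_status": "tumor free"},
    "TCGA-AA-0002": {"tumor_status": "with tumor"},
}


def patient_id(sample_barcode):
    """Return the patient part of a sample barcode (first three fields)."""
    return "-".join(sample_barcode.split("-")[:3])


def match_by_barcode(expression, clinical):
    """Join expression values to clinical rows on the patient barcode.

    Samples without a clinical record are skipped rather than guessed.
    """
    rows = []
    for sample_barcode, raw_count in expression.items():
        pid = patient_id(sample_barcode)
        if pid not in clinical:
            continue
        row = {
            "sample_barcode": sample_barcode,
            "patient": pid,
            "EGFR_raw_count": raw_count,
        }
        row.update(clinical[pid])
        rows.append(row)
    return rows


matched = match_by_barcode(expression, clinical)
for row in matched:
    print(row)
```

The third toy sample has no clinical record, so it is dropped from the matched table; in Excel that kind of mismatch is easy to miss.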
Now my question is whether I should use "raw_count" or "scaled_estimate" for my analysis. For example, I want to look at the differential expression of EGFR in the no-tumor group vs. the with-tumor group. Can I average "raw_count" within each group and compare the two averages? Or should I apply some sort of transformation first? Or is scaled_estimate multiplied by 10^6 more accurate? The scaled_estimate values are very low, around 2-10 × 10^-5; does that mean the gene is not being transcribed much?
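To make the question concrete, this is roughly the comparison I have in mind, as a plain Python sketch. The numbers and group labels are invented, and I am assuming that scaled_estimate × 10^6 can be read as transcripts per million (TPM) and that log2(TPM + 1) is a reasonable transformation. I am not sure a simple difference of group means like this is statistically sound, which is part of what I am asking.

```python
import math
from statistics import mean

# Toy scaled_estimate values for EGFR, invented for illustration only.
samples = [
    {"group": "tumor free", "scaled_estimate": 2.0e-5},
    {"group": "tumor free", "scaled_estimate": 4.0e-5},
    {"group": "with tumor", "scaled_estimate": 8.0e-5},
    {"group": "with tumor", "scaled_estimate": 1.0e-4},
]


def to_tpm(scaled_estimate):
    """Assumption: scaled_estimate is a fraction, so x 1e6 gives TPM."""
    return scaled_estimate * 1e6


def log2_tpm(scaled_estimate):
    """log2(TPM + 1); the +1 avoids taking the log of zero."""
    return math.log2(to_tpm(scaled_estimate) + 1.0)


def group_means(samples):
    """Mean log2(TPM + 1) per group."""
    by_group = {}
    for sample in samples:
        value = log2_tpm(sample["scaled_estimate"])
        by_group.setdefault(sample["group"], []).append(value)
    return {group: mean(values) for group, values in by_group.items()}


means = group_means(samples)
# Difference of log2 means, i.e. a rough log2 fold change between groups.
log2_fold_change = means["with tumor"] - means["tumor free"]
print(means)
print(log2_fold_change)
```

Under that assumption, a scaled_estimate of 2 × 10^-5 corresponds to about 20 TPM, so a small-looking number would not by itself mean the gene is barely expressed.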
Sorry for being naive in this field. I would be grateful for any ideas and comments.