Question

What can upper-quartile normalized RSEM count estimates and scaled estimates (TPM) in TCGA be used for?

7.6 years ago

medsci666 • 0

I am learning how to analyze TCGA data. I realized that I can get three kinds of RNA expression data: raw counts, scaled estimates (TPM), and upper-quartile normalized RSEM count estimates. I am confused about which kind I should choose. For example, if I want to explore the correlation between RNA expression and a clinical feature, which kind of data should I use? What can the upper-quartile normalized RSEM count estimates and the scaled estimates (TPM) each be used for? Thank you!

RNA-Seq • 3.7k views

updated 7.6 years ago by Kevin Blighe • written 7.6 years ago by medsci666

score 0 · Answer 1 · 2017-12-16

My own golden rule in bioinformatics and data analysis: Always aim to get the data in its most raw form possible.

Apart from the fact that TPM and upper-quartile normalisation have been found to be less than ideal, obtaining the data in its rawest form gives you maximum control over how you analyse it. Granted, this may not be practical when you are under time pressure. Someone else may chime in and say that there are hundreds of publications in which these types of normalised counts have been used, but the fact that something has been published says nothing about its quality, even in a top-tier journal. There are thousands of published GWAS, for example, most of whose results are not reproducible.
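To make the difference concrete, here is a minimal Python sketch (not TCGA's or RSEM's actual code) showing how both TPM and an upper-quartile normalisation can be derived from the same per-sample counts, which is one reason why starting from raw counts leaves every option open. Assumptions, all mine: lengths are plain feature lengths in base pairs (RSEM, Salmon and Kallisto use effective lengths), the upper quartile is taken over the non-zero counts of a single sample, and scaling it to 1,000 is an illustrative target. The function names are hypothetical.

```python
from statistics import quantiles  # Python 3.8+


def tpm(counts, lengths_bp):
    """Transcripts per million for ONE sample (hypothetical helper).

    Assumes plain feature lengths in base pairs; real quantifiers use
    effective lengths instead.
    """
    # Length normalisation: reads per kilobase of feature.
    rates = [c / (length / 1_000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    # Depth normalisation: rescale so the sample sums to one million.
    return [r / total * 1_000_000 for r in rates]


def upper_quartile_normalize(counts, target=1_000):
    """Scale ONE sample so its upper quartile equals `target` (hypothetical helper).

    Assumptions: the 75th percentile is taken over the non-zero counts only,
    and target=1_000 is an illustrative choice.
    """
    nonzero = [c for c in counts if c > 0]
    # Third of the three quartile cut points, using linear interpolation.
    uq = quantiles(nonzero, n=4, method="inclusive")[2]
    # Every count is divided by the same per-sample factor.
    return [c / uq * target for c in counts]


# Toy sample: five features with made-up counts and lengths.
counts = [10, 20, 30, 40, 0]
lengths = [1_000, 2_000, 1_000, 4_000, 500]
print([round(x) for x in tpm(counts, lengths)])
print([round(x, 1) for x in upper_quartile_normalize(counts)])
```

Both transformations are simple functions of the counts, so they can always be recomputed from raw counts; going the other way requires knowing the lengths or per-sample scaling factors that were used.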

Nor is obtaining raw RNA-seq counts a data-processing burden anymore: we now have super-rapid pseudo-aligners at our disposal, such as Kallisto and Salmon, which can process >500 samples in just a couple of days.
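As a rough illustration of that last step, the sketch below collects per-sample Salmon output into a single transcript-by-sample count table using only the Python standard library. It assumes Salmon's standard tab-separated quant.sf file (columns Name, Length, EffectiveLength, TPM, NumReads) and a hypothetical layout with one output directory per sample, named after that sample; all function names are hypothetical. Note that NumReads holds Salmon's estimated, possibly fractional, read count per transcript, and gene-level summarisation is not covered here.

```python
import csv
from pathlib import Path


def parse_quant_sf(lines):
    """Map transcript ID -> estimated read count from the lines of a quant.sf file.

    Assumes Salmon's tab-separated header:
    Name, Length, EffectiveLength, TPM, NumReads
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def read_quant_sf(path):
    """Read one quant.sf file from disk."""
    with open(path, newline="") as fh:
        return parse_quant_sf(fh)


def merge_samples(per_sample):
    """Turn {sample: {transcript: count}} into (transcripts, samples, rows).

    Assumes every sample was quantified against the same index, so all
    samples report the same transcript IDs. rows[i][j] is the count of
    transcripts[i] in samples[j].
    """
    samples = sorted(per_sample)
    transcripts = sorted(per_sample[samples[0]])
    rows = [[per_sample[s][t] for s in samples] for t in transcripts]
    return transcripts, samples, rows


def build_count_matrix(sample_dirs):
    """Hypothetical layout: one Salmon output directory per sample, named after it."""
    per_sample = {Path(d).name: read_quant_sf(Path(d) / "quant.sf") for d in sample_dirs}
    return merge_samples(per_sample)
```

For Kallisto, the equivalent per-sample file is abundance.tsv, where the identifier and estimated-count columns are target_id and est_counts, so only the column names in parse_quant_sf would need to change. Parsing is kept separate from file I/O so the merging logic can be checked without any files on disk.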

So, raw counts.

Kevin