I have just started playing with some RSEM RNA-seq data from TCGA. To get to know the data better, I am running some exploratory analyses and sanity checks "for fun". One observation really surprised me, particularly coming from a microarray world where everything is quantile normalized. When I rank tumor and normal samples from the same tissue by their per-sample sum of log2(TPM+1) across all genes, the normals frequently cluster at either the bottom or the top of the list. This happens in some data sets but not in all of them. For example, the effect is very pronounced for LIHC but not for BRCA.
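To make the check concrete, here is a minimal sketch of the ranking described above. It assumes the TPM values are already loaded into memory as plain Python lists keyed by sample ID with a "tumor"/"normal" label; reading the RSEM/TCGA files is not shown. The helper names (`log_tpm_score`, `rank_samples`, `normal_vs_tumor_auc`) and the toy numbers are hypothetical, for illustration only. To put a number on "the normals cluster at one end", it adds a Mann-Whitney-style AUC: the fraction of (normal, tumor) pairs in which the normal scores higher. Values near 0 or 1 mean the normals sit at the bottom or top, and about 0.5 means they are mixed in.

```python
import math


def log_tpm_score(tpm):
    """Per-sample score: sum of log2(TPM + 1) over all genes."""
    return sum(math.log2(x + 1.0) for x in tpm)


def rank_samples(samples):
    """Order samples by score, lowest first.

    samples: dict mapping sample_id -> (label, list of TPM values).
    This layout is a hypothetical in-memory format, not a TCGA file format.
    Returns a list of (sample_id, label, score) tuples.
    """
    scored = [(sid, label, log_tpm_score(tpm)) for sid, (label, tpm) in samples.items()]
    return sorted(scored, key=lambda row: row[2])


def normal_vs_tumor_auc(ranked):
    """Fraction of (normal, tumor) pairs where the normal has the higher score.

    Ties count as half. 0 = all normals below all tumors,
    1 = all normals above all tumors, ~0.5 = no separation.
    """
    normals = [score for _, label, score in ranked if label == "normal"]
    tumors = [score for _, label, score in ranked if label == "tumor"]
    pairs = len(normals) * len(tumors)
    if pairs == 0:
        return float("nan")
    wins = sum(
        1.0 if n > t else 0.5 if n == t else 0.0
        for n in normals
        for t in tumors
    )
    return wins / pairs


# Made-up toy data: 4 "genes" per sample, each vector summing to 1e6 like real TPM.
toy = {
    "N1": ("normal", [250000.0, 250000.0, 250000.0, 250000.0]),
    "N2": ("normal", [400000.0, 300000.0, 200000.0, 100000.0]),
    "T1": ("tumor", [970000.0, 10000.0, 10000.0, 10000.0]),
    "T2": ("tumor", [700000.0, 100000.0, 100000.0, 100000.0]),
}

ranked = rank_samples(toy)
for sid, label, score in ranked:
    print(f"{sid}\t{label}\t{score:.2f}")
print("AUC(normal > tumor):", normal_vs_tumor_auc(ranked))  # → 1.0 for this toy data
```

One point the sketch makes visible: because TPM values sum to 10^6 within every sample by construction, the raw TPM sum is identical across samples, and the score only varies because log2(x+1) is concave. For a fixed total, it is larger when expression is spread evenly across many genes, which is why the toy normals (flatter profiles) end up at the top here.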
This phenomenon seems a bit disconcerting, and I do not understand its cause. Any ideas/explanations would be much appreciated!
Thanks a lot in advance.