First off, do not log FPKMs. An explanation of why not is given here (see 25:50 - 29:10).

Second, normalizations such as upper-quartile normalization of FPKM or TPM values (TPM is better than FPKM, by the way) don't fix the problems with between-sample comparisons. A better way to normalize the data is to use `DESeq2`. DESeq2 has a `vst` (variance-stabilizing transformation) function that normalizes your count data and corrects for heteroscedasticity (i.e., for the fact that genes with higher average expression have higher variances), returning values on a log2 scale. You can run `DESeq2` on raw RNA-seq counts (which are obtainable from GDC).
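For intuition about what "normalizing between samples" means in DESeq2, here's a minimal NumPy sketch of its median-of-ratios size-factor idea. This is my own illustration, not DESeq2's code; in practice just run `DESeq2` in R. The helper `median_of_ratios_size_factors` and the toy count matrix are hypothetical. The final log2(normalized + 1) step is roughly what DESeq2's `normTransform` does; `vst` goes further by modelling the mean-dispersion trend, which this sketch does not attempt.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors (the idea behind DESeq2's size factors).

    counts: genes x samples array of raw, un-normalized counts.
    Hypothetical helper for illustration only; not DESeq2's own code.
    """
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample have a finite geometric mean
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples acts as a pseudo-reference sample
    log_geo_mean = log_counts.mean(axis=1)
    # For each sample, the median log-ratio to the pseudo-reference is its log size factor
    log_ratios = log_counts - log_geo_mean[:, None]
    return np.exp(np.median(log_ratios, axis=0))

# Toy counts (hypothetical): sample 2 was sequenced about twice as deeply as sample 1
counts = np.array([
    [10, 20],
    [100, 200],
    [30, 60],
    [0, 5],    # zero in one sample, so excluded from size-factor estimation
])
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors          # divide each column by its size factor
log_normalized = np.log2(normalized + 1)    # simple log2(x + 1), not a full vst
print(size_factors)
print(log_normalized)
```

After dividing by the size factors, the first three genes have identical normalized values in both samples, even though the raw counts differ by the sequencing-depth factor of 2.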

Third, without having played around with the actual expression and survival data myself, I don't have a definitive explanation for why your HRs are close to 1, but here are some ideas. Cox regression assumes a linear relationship between the log hazard and your covariate (expression). You can check whether this assumption holds by analyzing the residuals (e.g., plotting martingale residuals against expression). This is why a log2 scale would fit count data much better, since the raw counts are otherwise Poisson or negative binomially distributed.
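To make the linearity point concrete, here's a small self-contained simulation (my own illustration, not based on your data). It assumes, hypothetically, that the true log hazard is linear in log2 expression with an HR of 2 per doubling, fits a one-covariate Cox model by Newton-Raphson on the partial likelihood, and compares the HR estimated on the log2 scale with the HR per one raw unit. The helpers `_cox_parts` and `fit_cox_1d` and all simulated values are hypothetical.

```python
import numpy as np

def _cox_parts(beta, x, event):
    """Partial log-likelihood, gradient and Hessian of a one-covariate Cox model.

    x and event must be sorted by DESCENDING survival time, so the risk set of
    subject i is subjects 0..i. Assumes no tied event times.
    """
    eta = beta * x
    log_s0 = np.logaddexp.accumulate(eta)          # log of risk-set sums of exp(eta)
    loglik = np.sum(event * (eta - log_s0))
    w = np.exp(eta - eta.max())                    # shifted weights to avoid overflow
    s0, s1, s2 = np.cumsum(w), np.cumsum(w * x), np.cumsum(w * x * x)
    xbar = s1 / s0                                 # risk-set weighted mean of x
    grad = np.sum(event * (x - xbar))
    hess = -np.sum(event * (s2 / s0 - xbar ** 2))  # minus risk-set weighted variance
    return loglik, grad, hess


def fit_cox_1d(x, time, event, max_iter=100, tol=1e-10):
    """Newton-Raphson with step halving; returns the log hazard ratio per unit of x."""
    order = np.argsort(-np.asarray(time), kind="stable")
    x = np.asarray(x, dtype=float)[order]
    event = np.asarray(event, dtype=float)[order]
    beta = 0.0
    ll, g, h = _cox_parts(beta, x, event)
    for _ in range(max_iter):
        step, t = -g / h, 1.0
        while True:                                # halve the step until log-lik improves
            new = _cox_parts(beta + t * step, x, event)
            if new[0] >= ll or t < 1e-8:
                break
            t /= 2
        beta += t * step
        ll, g, h = new
        if abs(t * step) < tol:
            break
    return beta


# Hypothetical simulation: true log hazard is linear in log2 expression, HR = 2 per doubling
rng = np.random.default_rng(0)
n = 1000
log2_expr = rng.normal(5.0, 1.5, n)
expr = 2.0 ** log2_expr                            # the same values on the raw scale
beta_true = np.log(2.0)
time = rng.exponential(scale=1.0 / np.exp(beta_true * log2_expr))
event = np.ones(n)                                 # no censoring, for brevity

hr_log2 = np.exp(fit_cox_1d(log2_expr, time, event))
hr_raw = np.exp(fit_cox_1d(expr, time, event))
print(f"HR per doubling (log2 scale): {hr_log2:.2f}")
print(f"HR per 1 raw unit:            {hr_raw:.4f}")
```

On the raw scale the HR is per one unit of expression, so even a strong effect can show up as an HR barely above 1; that is one possible reason (not a confirmed one) for HRs near 1. For a real analysis you would use an established implementation such as `coxph` from R's `survival` package rather than this sketch.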

written 6 months ago by dsull (modified 6 months ago)