First off, do not log FPKMs. An explanation of why not is provided here (see 25:50-29:10).
Second, applying schemes like upper-quartile normalization to FPKM or TPM values (TPM is better than FPKM, by the way) doesn't fix the problems with between-sample comparisons.
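To illustrate the FPKM vs. TPM remark, here is a minimal Python sketch on a hypothetical three-gene, two-sample count table (the gene lengths and counts are made up). It uses the standard definitions: FPKM scales by total reads first and length second, while TPM length-normalizes first and then rescales so that every sample sums to one million. That constant total is why TPM proportions are easier to compare; neither measure accounts for composition differences between samples.

```python
# Hypothetical toy data: 3 genes, 2 samples.
gene_lengths_bp = [2000, 4000, 1000]
counts = {
    "A": [100, 200, 300],
    "B": [500, 100, 50],
}

def fpkm(sample_counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(sample_counts)
    return [c * 1e9 / (l * total) for c, l in zip(sample_counts, lengths)]

def tpm(sample_counts, lengths):
    """Transcripts per million: divide by length (kb) first, then scale to 1e6."""
    rates = [c / (l / 1e3) for c, l in zip(sample_counts, lengths)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

for sample, c in counts.items():
    f, t = fpkm(c, gene_lengths_bp), tpm(c, gene_lengths_bp)
    print(f"{sample}: FPKM sum = {sum(f):,.0f}, TPM sum = {sum(t):,.0f}")
# A: FPKM sum = 666,667, TPM sum = 1,000,000
# B: FPKM sum = 500,000, TPM sum = 1,000,000
```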
A better way is to use DESeq2 to normalize the data. DESeq2 has a vst function that normalizes your count data and corrects for heteroscedasticity (i.e. the fact that genes with higher average expression have higher variances), returning values on a log2 scale. You can run DESeq2 on raw RNA-seq counts (which are obtainable from GDC).
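To make the heteroscedasticity point concrete, here is a stdlib-only Python sketch. It is not DESeq2 (an R/Bioconductor package), and the helper names `poisson`, `nb_counts`, `nb_vst` and the dispersion `ALPHA = 0.1` are hypothetical. It simulates negative binomial counts (variance mu + ALPHA * mu^2) at three mean levels as a gamma-Poisson mixture, then compares variances on the raw scale and after the closed-form variance-stabilizing transform for negative binomial counts with a single, known dispersion, shifted so that it approaches log2(x) for large x. DESeq2's vst additionally estimates size factors and a mean-dispersion trend from your data, which this sketch skips.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Knuth's Poisson sampler; large lam is split into chunks of at most 200
    (a sum of independent Poissons is Poisson) so exp(-lam) never underflows."""
    total = 0
    while lam > 0:
        chunk = min(lam, 200.0)
        lam -= chunk
        limit, k, p = math.exp(-chunk), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total

def nb_counts(mu, alpha, n, rng):
    """NB counts with mean mu and variance mu + alpha*mu**2 (gamma-Poisson mixture)."""
    shape = 1.0 / alpha
    return [poisson(rng.gammavariate(shape, mu / shape), rng) for _ in range(n)]

def nb_vst(x, alpha):
    """NB variance-stabilizing transform for a known dispersion alpha,
    shifted so that nb_vst(x) ~ log2(x) for large x (finite at x = 0)."""
    return 2.0 / math.log(2) * math.asinh(math.sqrt(alpha * x)) - math.log2(4 * alpha)

ALPHA = 0.1  # hypothetical dispersion, assumed known and shared by all genes
rng = random.Random(42)
raw_var, vst_var = {}, {}
for mu in (10, 100, 1000):
    sample = nb_counts(mu, ALPHA, 300, rng)
    raw_var[mu] = statistics.variance(sample)
    vst_var[mu] = statistics.variance([nb_vst(c, ALPHA) for c in sample])
    print(f"mean {mu:>5}: raw variance {raw_var[mu]:>10.1f}, "
          f"VST variance {vst_var[mu]:.3f}")
```

Raw variances should grow by orders of magnitude from the lowest to the highest mean, while the transformed variances should all sit near ALPHA / (ln 2)^2 ≈ 0.21 (the delta-method approximation).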
Third, without playing around with the actual expression and survival data myself, I don't have a perfect explanation for why your HRs are close to 1, but here are some ideas. Cox regression assumes a linear relationship between the log hazard and your covariate (expression). (You can check whether this assumption holds by analyzing the residuals, e.g. martingale residuals plotted against expression.) Raw count data are Poisson or negative binomially distributed and strongly right-skewed, so a linear effect on the log hazard is far more plausible on a log2 scale; this is why log2-scale values should fit much better than raw counts.
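The linearity point can be illustrated with a small simulation. The stdlib-only Python sketch below is a hand-rolled one-covariate Cox fit (Newton-Raphson on the partial likelihood, assuming no tied event times), not the R survival package or any other library's API. It generates hypothetical survival data whose log hazard is linear in log2(expression), then fits the same data once with log2 expression and once with raw expression. All parameters (`TRUE_BETA = 0.7`, the sample size, the baseline and censoring rates) and the helper `cox_fit` are made up for illustration.

```python
import math
import random

def cox_fit(times, events, z, iters=50):
    """One-covariate Cox model via Newton-Raphson on the partial likelihood
    (assumes no tied event times). Returns (beta per unit of z, max log-lik)."""
    mean = sum(z) / len(z)
    sd = math.sqrt(sum((v - mean) ** 2 for v in z) / (len(z) - 1))
    zs = [(v - mean) / sd for v in z]  # standardize for numerical stability
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)

    def loglik_score_info(b):
        c = max(b * v for v in zs)  # shift for a stable log-sum-exp
        s0 = s1 = s2 = ll = score = info = 0.0
        for i in order:  # latest time first, so the risk set only grows
            w = math.exp(b * zs[i] - c)
            s0, s1, s2 = s0 + w, s1 + w * zs[i], s2 + w * zs[i] ** 2
            if events[i]:
                ll += b * zs[i] - (math.log(s0) + c)
                score += zs[i] - s1 / s0
                info += s2 / s0 - (s1 / s0) ** 2
        return ll, score, info

    b = 0.0
    ll, score, info = loglik_score_info(b)
    for _ in range(iters):
        step = score / info
        while True:  # halve steps that would lower the log-likelihood
            new = loglik_score_info(b + step)
            if new[0] >= ll or abs(step) < 1e-12:
                break
            step /= 2
        b += step
        ll, score, info = new
        if abs(step) < 1e-9:
            break
    return b / sd, ll  # undo the standardization

rng = random.Random(1)
TRUE_BETA = 0.7  # hypothetical log-HR per doubling of expression
log2_expr = [rng.gauss(6.0, 1.5) for _ in range(400)]
expr = [2.0 ** v for v in log2_expr]  # skewed, count-like scale
times, events = [], []
for v in log2_expr:
    rate = 0.05 * math.exp(TRUE_BETA * (v - 6.0))  # log hazard linear in log2(expr)
    t_event, t_censor = rng.expovariate(rate), rng.expovariate(0.02)
    times.append(min(t_event, t_censor))
    events.append(t_event <= t_censor)

beta_log2, ll_log2 = cox_fit(times, events, log2_expr)
beta_raw, ll_raw = cox_fit(times, events, expr)
print(f"log2 expression: HR per doubling = {math.exp(beta_log2):.2f}, log-lik = {ll_log2:.1f}")
print(f"raw expression:  HR per unit     = {math.exp(beta_raw):.4f}, log-lik = {ll_raw:.1f}")
```

Both models have a single parameter, so their maximized partial log-likelihoods can be compared directly; with this setup the log2 model should come out clearly ahead. On real data, plotting martingale residuals from a null model against the covariate is the usual way to pick the functional form.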
Written by dsull (last modified 6 months ago)