Significant but very small beta coefficients in Cox proportional hazards calculation (survival)
3.8 years ago
atakanekiz

Hello CV community,

I'm analyzing TCGA data to investigate the effects of lncRNAs on survival. Among other things, I wanted to fit a univariate CoxPH model for each gene to find genes whose expression levels are significantly associated with survival outcome. I realize that I'm not testing the CoxPH assumptions for each gene, but I'm not sure whether that has much to do with my question here.

I have two main questions:

1: Out of ~12000 lncRNA genes in my dataset, about 1600 were found to be significantly associated with survival outcome (p<0.05). However, the majority of these genes have very small beta coefficients and, correspondingly, HR values very close to 1 (see below a histogram of beta and HR values). An HR value close to 1 indicates little effect on the clinical outcome.


I was curious to see whether these genes really have a negligible effect on survival, so I plotted KM curves. Here I'm showing two example genes that were selected for their low p-values in the CoxPH model, with the following details:


Gene         beta         HR        Statistic   p-value
USP30-AS1    -0.0056929   0.99432   18.66       1.558e-05
AC018553.1    0.0033293   1.00330   28.54       9.192e-08

I split the patients into high- and low-expression groups at the median expression value across all patients. Here are the KM curves:

[Figure: KM curves for USP30-AS1 and AC018553.1, high vs. low expression split at the median]
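For reference, the median split described above can be sketched roughly as follows. The function name and the six expression values are hypothetical and are not taken from the TCGA cohort. Values exactly equal to the median are assigned to the "low" group here, which is an arbitrary choice.

```python
from statistics import median


def split_at_median(expression):
    """Label each value 'high' if it is above the cohort median, else 'low'."""
    cutoff = median(expression)
    # Ties with the median fall into the 'low' group (an arbitrary convention).
    return ["high" if value > cutoff else "low" for value in expression]


# Hypothetical linear-scale expression values for six patients.
example_expression = [0.0, 3.2, 15.8, 41.0, 120.5, 980.0]
groups = split_at_median(example_expression)
print(groups)
```

The resulting labels would then be used as the grouping variable for the KM curves.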

To me, it is a bit odd that high vs. low expression of these genes gives a very clear separation of the survival curves, while the CoxPH model estimates a tiny effect on the survival outcome. Can somebody explain what I'm missing here?

2: A few genes (with p<0.05) had extreme HR values (>200, and in one case 12360!). Upon closer inspection, I noticed that these genes are expressed in only 1-5 patients in the cohort of 457 total patients. I wouldn't have thought that a CoxPH model would find such rarely expressed genes significant (even though the expression of these genes could potentially correlate with poor clinical outcome in all of these 1-5 patients). Can somebody enlighten me about why the CoxPH model reports these genes as significant 'hits'?

Thank you very much


3.8 years ago
atakanekiz

I found out what the problem was. My gene expression data were on a linear scale. When I transformed them to a log scale, the hazard ratios looked much more like what I expected. The p-values were also an order of magnitude smaller, due to the lower variation on the log scale. Hopefully this will be helpful for somebody having a similar issue.
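A quick way to see why the scale matters: a Cox coefficient is the log hazard ratio per one-unit increase in the covariate, so on a linear scale where expression can span hundreds of units, an HR very close to 1 per unit can still imply a large HR across the observed range. Below is a minimal sketch of that arithmetic, using the AC018553.1 coefficient from the question. The 500-unit difference, the base-2 logarithm, and the pseudocount of 1 are assumptions for illustration only; the post does not say which log transform was used.

```python
import math


def hazard_ratio(beta, delta=1.0):
    """Hazard ratio implied by a Cox coefficient for a covariate difference of delta units."""
    return math.exp(beta * delta)


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount (assumed here) keeps zero counts finite."""
    return [math.log2(v + pseudocount) for v in values]


beta_linear = 0.0033293  # AC018553.1 coefficient from the question (linear scale)

hr_per_unit = hazard_ratio(beta_linear)         # very close to 1
hr_per_500 = hazard_ratio(beta_linear, 500.0)   # hypothetical 500-unit difference

print(f"HR per unit:      {hr_per_unit:.4f}")
print(f"HR per 500 units: {hr_per_500:.2f}")

# Hypothetical linear-scale values spanning several orders of magnitude.
transformed = log2_transform([0.0, 1.0, 3.0, 1023.0])
print(transformed)
```

After a log2 transform, one unit corresponds to a doubling of expression, so the fitted coefficient becomes a log hazard ratio per doubling, which is usually easier to interpret than a per-unit effect on the raw scale.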


