Significant but very small beta coefficients in Cox proportional hazards calculation (survival)
3.8 years ago
atakanekiz ▴ 300

Hello CV community,

I'm analyzing TCGA data to investigate the effects of lncRNAs on survival. Among other things, I fitted a univariate CoxPH model for each gene to find genes whose expression levels are significantly associated with survival outcome. I realize that I'm not testing the proportional hazards assumption for each gene, but I'm not sure that matters for my question here.

I have two main questions:

1: Out of ~12000 lncRNA genes in my dataset, ~1600 were found to be significantly associated with survival outcome (p<0.05). However, the majority of these genes have very small beta coefficients and, correspondingly, HR values very close to 1 (see below a histogram of beta and HR values). HR values close to 1 indicate little effect on the clinical outcome.


I was curious to see whether these genes really have a negligible effect on survival, so I plotted KM curves. Here are two example genes, selected for their low p-values in the CoxPH model, with the following details:


Gene         beta         HR       test statistic   p-value
USP30-AS1    -0.0056929   0.99432  18.66            1.558e-05
AC018553.1    0.0033293   1.00330  28.54            9.192e-08
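
For each gene, the hazard ratio reported above is exp(beta), which is the hazard ratio for a one-unit difference in expression. The short sketch below checks that relationship and shows how the same beta scales over a larger difference. The 100-unit difference is a hypothetical value chosen only for illustration, not something taken from the dataset, and `hazard_ratio` is a helper name introduced here.

```python
import math

# beta values copied from the two example genes above
betas = {"USP30-AS1": -0.0056929, "AC018553.1": 0.0033293}

def hazard_ratio(beta, delta=1.0):
    """Hazard ratio implied by a Cox coefficient for a `delta`-unit covariate difference."""
    return math.exp(beta * delta)

for gene, beta in betas.items():
    per_unit = hazard_ratio(beta)
    # 100 units is a hypothetical expression difference, used only to show the scaling
    per_100 = hazard_ratio(beta, delta=100.0)
    print(f"{gene}: HR per unit = {per_unit:.5f}, HR per 100 units = {per_100:.3f}")
```

A per-unit HR near 1 therefore does not by itself mean a small effect; it depends on how many units separate high-expressing from low-expressing patients.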

I categorized the expression as high and low at the median expression value in all the patients. Here are the KM curves:

[Image: KM curves for USP30-AS1 and AC018553.1, high vs. low expression split at the median]

It seems odd to me that high vs. low expression of these genes gives a very clear separation of the survival curves, while the CoxPH model estimates only a tiny effect on survival. Can somebody explain what I'm missing here?

2: A few genes (with p<0.05) had extreme HR values (>200, and in one case 12360!). On closer inspection, I noticed that these genes are expressed in only 1-5 patients out of the 457 patients in the cohort. I wouldn't have expected the CoxPH model to find such rarely expressed genes significant (even though their expression could, in principle, correlate with poor clinical outcome in all of these 1-5 patients). Can somebody explain why the CoxPH model reports these genes as significant 'hits'?

Thank you very much


survival RNA-Seq cox HR survival analysis • 1.6k views
3.8 years ago
atakanekiz ▴ 300

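I found out what the problem was. My gene expression data were on a linear scale. The Cox coefficient is the change in log hazard per one-unit increase in the covariate, and on a linear scale one unit is a very small step relative to the range of expression values, so the per-unit HRs ended up very close to 1. When I transformed the data to log scale, the hazard ratios looked more like what I expected. P-values were also an order of magnitude smaller due to the lower variation on the log scale. Hopefully this will be helpful for somebody having a similar issue.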
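
To illustrate the scale effect, here is a minimal sketch on simulated data, not the TCGA analysis itself. Everything in it is an assumption made for the example: the sample size, the log2 expression distribution, the true effect of 0.5 per doubling, and the absence of censoring. `cox_fit_1d` is a small hand-written Newton-Raphson fit of a one-covariate Cox model (no tied event times), written in plain Python so it runs without extra packages; for real data a survival library would be the usual choice.

```python
import math
import random

def cox_fit_1d(times, events, x, n_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model by Newton-Raphson (assumes no tied times)."""
    mean_x = sum(x) / len(x)  # centering does not change beta, only helps numerics
    # Descending time order: the risk set of a subject is everything seen so far.
    rows = sorted(zip(times, events, (v - mean_x for v in x)), reverse=True)

    def stats(b):
        shift = max(b * v for _, _, v in rows)  # keeps exp() from overflowing
        s0 = s1 = s2 = 0.0
        loglik = score = info = 0.0
        for _, d, v in rows:
            w = math.exp(b * v - shift)
            s0 += w
            s1 += w * v
            s2 += w * v * v
            if d:  # only events contribute to the partial likelihood
                mu = s1 / s0
                loglik += b * v - shift - math.log(s0)
                score += v - mu
                info += s2 / s0 - mu * mu
        return loglik, score, info

    b = 0.0
    loglik, score, info = stats(b)
    for _ in range(n_iter):
        step = score / info
        # Halve the step until the log partial likelihood does not decrease.
        while stats(b + step)[0] < loglik and abs(step) > tol:
            step /= 2.0
        b += step
        loglik, score, info = stats(b)
        if abs(step) < tol:
            break
    return b, 1.0 / math.sqrt(info)  # estimate and its standard error

# Simulated cohort (all values below are illustrative assumptions).
random.seed(0)
n = 400
log_expr = [random.gauss(5.0, 1.5) for _ in range(n)]  # hypothetical log2 expression
lin_expr = [2.0 ** z for z in log_expr]                # same data on the linear scale
true_beta = 0.5                                        # log hazard change per doubling
times = [random.expovariate(math.exp(true_beta * (z - 5.0))) for z in log_expr]
events = [1] * n                                       # no censoring, to keep it short

fits = {}
for label, cov in (("linear", lin_expr), ("log2", log_expr)):
    beta, se = cox_fit_1d(times, events, cov)
    fits[label] = (beta, se)
    print(f"{label:>6}: beta = {beta:.5f}, HR = {math.exp(beta):.5f}, z = {beta / se:.2f}")
```

The same patients and the same survival times go into both fits; only the units of the covariate differ. On the linear scale the coefficient is per single expression unit, so it comes out very small, while on the log2 scale it is per doubling of expression and lands near the simulated effect.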

