Question: TCGA survival analysis: continuous vs discrete expression values
Mike (UK) wrote, 21 months ago:

Hi,

I am trying to do a survival analysis for a gene using TCGA data. I did it both ways: with continuous expression values and with discrete values (Low and High, split at the median expression value). The p-values differ hugely between the two. Can anyone tell me which approach is better for survival analysis?

My command:

library(survival)
coxph(Surv(time, status) ~ expression, data = survdata)

Results:

HR = 0.82, log-rank P = 0.02      (discrete model)
HR = 0.87, log-rank P = 0.00001   (continuous model)
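For readers following along, here is a minimal sketch in R of the two fits being compared, using the `survival` package. The data are simulated (452 samples, centred near the median of 8.73 mentioned later in the thread), and the column names only mirror the command above; none of this is the TCGA data.

```r
library(survival)

# Hypothetical simulated stand-in for 'survdata' (not TCGA data)
set.seed(1)
n <- 452
expr <- rnorm(n, mean = 8.73, sd = 1.5)
# Assumed truth: higher expression -> lower hazard (log-HR of -0.2 per log2 unit)
event_time <- rexp(n, rate = 0.1 * exp(-0.2 * (expr - 8.73)))
cens_time  <- rexp(n, rate = 0.05)             # independent random censoring
survdata <- data.frame(expression = expr,
                       time   = pmin(event_time, cens_time),
                       status = as.integer(event_time <= cens_time))

# Continuous model: HR per one-unit (one log2) increase in expression
fit_cont <- coxph(Surv(time, status) ~ expression, data = survdata)

# Discrete model: median split into Low / High, with Low as the reference
survdata$group <- factor(ifelse(survdata$expression > median(survdata$expression),
                                "High", "Low"), levels = c("Low", "High"))
fit_disc <- coxph(Surv(time, status) ~ group, data = survdata)

# Hazard ratios and score (log-rank) test p-values for both fits
s_cont <- summary(fit_cont)
s_disc <- summary(fit_disc)
hr_cont <- unname(s_cont$conf.int[1, "exp(coef)"])
hr_disc <- unname(s_disc$conf.int[1, "exp(coef)"])
p_cont  <- unname(s_cont$sctest["pvalue"])
p_disc  <- unname(s_disc$sctest["pvalue"])
cat(sprintf("continuous: HR = %.2f, score P = %.2g\n", hr_cont, p_cont))
cat(sprintf("discrete:   HR = %.2f, score P = %.2g\n", hr_disc, p_disc))
```

Note that the two hazard ratios measure different things: the continuous one is per one-unit (here one log2) increase in expression, while the discrete one compares the High group with the Low group, so they are not directly comparable.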

Thanks

Kevin Blighe (Republic of Ireland) wrote, 21 months ago:

When you convert the data to discrete values, you discard information, as I elaborate with an extreme example here: A: Why quantitative design are preferred GWAS approach. In exchange, the result becomes easier for the human brain to interpret. Simply using Low and High may be too few categories; you could try introducing more.

If you keep the data on the continuous scale, you need to know what distribution it follows and whether it has been processed correctly.
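One way to see the information loss in practice is to fit the same (here simulated, hypothetical) data with a median split, with quartiles, and with the continuous variable, then compare each model's likelihood-ratio chi-square against the null model. This is a sketch under those assumptions; keep in mind that the quartile model spends 3 degrees of freedom while the other two spend 1.

```r
library(survival)

# Hypothetical simulated data: one gene's log2 expression and survival
set.seed(2)
n <- 452
expr <- rnorm(n, mean = 8.73, sd = 1.5)
event_time <- rexp(n, rate = 0.1 * exp(-0.2 * (expr - 8.73)))
cens_time  <- rexp(n, rate = 0.05)
d <- data.frame(expr   = expr,
                time   = pmin(event_time, cens_time),
                status = as.integer(event_time <= cens_time))

# Increasing resolution: 2 groups, 4 groups, then the full continuous scale
d$half  <- cut(d$expr, quantile(d$expr, c(0, 0.5, 1)), include.lowest = TRUE,
               labels = c("Low", "High"))
d$quart <- cut(d$expr, quantile(d$expr, seq(0, 1, 0.25)), include.lowest = TRUE,
               labels = c("Q1", "Q2", "Q3", "Q4"))

fits <- list(
  median_split = coxph(Surv(time, status) ~ half,  data = d),
  quartiles    = coxph(Surv(time, status) ~ quart, data = d),
  continuous   = coxph(Surv(time, status) ~ expr,  data = d)
)

# Likelihood-ratio chi-square of each model versus the null (no-covariate) model
lr_chisq <- sapply(fits, function(f) unname(summary(f)$logtest["test"]))
print(round(lr_chisq, 2))
```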


Thanks Kevin. The expression data are log2-transformed RSEM values, and this is their distribution:

https://ibb.co/pbWBV0g

The median expression value of this gene is 8.73 across 452 samples.

(reply by Mike, 21 months ago)

What if you convert the logged data to Z-scores and then trichotomise based on those?
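A minimal sketch of that suggestion in R, on simulated data. The cut-offs at Z = -1 and Z = +1 are an assumption made here for illustration; tertiles of the Z-scores would be another reasonable choice.

```r
library(survival)

# Hypothetical simulated stand-in for one gene's log2 RSEM values
set.seed(3)
n <- 452
expr <- rnorm(n, mean = 8.73, sd = 1.5)
event_time <- rexp(n, rate = 0.1 * exp(-0.2 * (expr - 8.73)))
cens_time  <- rexp(n, rate = 0.05)
d <- data.frame(time   = pmin(event_time, cens_time),
                status = as.integer(event_time <= cens_time))

# Z-score: centre on the mean and divide by the standard deviation
d$z <- as.numeric(scale(expr))

# Trichotomise at Z = -1 and Z = +1 (assumed cut-offs)
d$z_group <- cut(d$z, breaks = c(-Inf, -1, 1, Inf),
                 labels = c("Low", "Mid", "High"))

fit_z_cont <- coxph(Surv(time, status) ~ z, data = d)        # HR per 1 SD
fit_z_disc <- coxph(Surv(time, status) ~ z_group, data = d)  # Mid and High vs Low
print(table(d$z_group))
print(summary(fit_z_disc)$sctest)
```

Because a Z-score is only a linear rescaling, it changes the continuous model's hazard ratio (now per standard deviation) but not its test p-value; only the discrete grouping really changes.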

(reply by Kevin Blighe, 21 months ago)

Nearly the same results with the Z-score data: log-rank P = 0.02 for the discrete model and P = 0.00004 for the continuous model.

(reply by Mike, 21 months ago)

You should check the hazard ratios too, and their confidence intervals. If, in one situation, the hazard ratio is 0.6 but the upper 95% limit exceeds 1.0, that is less reliable than a situation where the upper 95% limit is 0.8. The same applies in the other direction: a hazard ratio of 2.9 is less reliable if its lower 95% limit falls below 1.0 than if it stays above 1.0.

That is: check that the 95% confidence limits of the hazard ratio don't cross the 'barrier' of 1.0. It's just a simple extra check.
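This check is easy to automate from a fitted `coxph` model, because `summary()` already reports the 95% limits. The helper `hr_ci_check()` below is hypothetical (it is not part of the `survival` package), and the example uses the `lung` data set that ships with `survival` rather than TCGA data.

```r
library(survival)

# Hypothetical helper: HR with its 95% CI and whether the interval excludes 1.0
hr_ci_check <- function(fit) {
  # conf.int columns: exp(coef), exp(-coef), lower .95, upper .95
  ci <- summary(fit)$conf.int
  data.frame(term       = rownames(ci),
             HR         = ci[, "exp(coef)"],
             lower95    = ci[, "lower .95"],
             upper95    = ci[, "upper .95"],
             excludes_1 = ci[, "lower .95"] > 1 | ci[, "upper .95"] < 1,
             row.names  = NULL)
}

# Example on the 'lung' data from the survival package (status: 1 = censored, 2 = dead)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
res <- hr_ci_check(fit)
print(res)
```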

(reply by Kevin Blighe, 21 months ago)

Thanks again. Yes, the hazard ratios and their 95% confidence intervals do differ:

Model        HR     Lower 95%   Upper 95%
Discrete     0.82   0.77        1.01
Continuous   0.87   0.75        0.97
(reply by Mike, 21 months ago)

Looking at that, I'd assume that the continuous model is more reliable: its 95% confidence interval stays below 1.0, whereas the discrete model's upper limit crosses it. I think it's okay to derive the p-value and HR from the continuous variable and then just plot the dichotomised variable in the survival plot. You just have to state clearly in the methods what you have done.
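A sketch of that workflow in R, under the assumption of simulated data: the HR, its 95% CI and the Wald p-value come from the continuous Cox model, while the Kaplan-Meier curves for a median split are used only for display, with the continuous-model statistics written into the plot title.

```r
library(survival)

# Hypothetical simulated data standing in for one gene's log2 expression
set.seed(4)
n <- 452
expr <- rnorm(n, mean = 8.73, sd = 1.5)
event_time <- rexp(n, rate = 0.1 * exp(-0.2 * (expr - 8.73)))
cens_time  <- rexp(n, rate = 0.05)
d <- data.frame(expression = expr,
                time   = pmin(event_time, cens_time),
                status = as.integer(event_time <= cens_time))

# 1) Inference from the continuous variable
fit <- coxph(Surv(time, status) ~ expression, data = d)
ci  <- summary(fit)$conf.int
p   <- summary(fit)$coefficients[1, "Pr(>|z|)"]   # Wald p-value
label <- sprintf("Cox (continuous): HR = %.2f (95%% CI %.2f-%.2f), P = %.2g",
                 ci[1, "exp(coef)"], ci[1, "lower .95"], ci[1, "upper .95"], p)

# 2) Display only: Kaplan-Meier curves for a median split
d$group <- factor(ifelse(d$expression > median(d$expression), "High", "Low"),
                  levels = c("Low", "High"))
km <- survfit(Surv(time, status) ~ group, data = d)
plot(km, col = c("blue", "red"), xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(d$group), col = c("blue", "red"), lty = 1)
title(main = label, cex.main = 0.8)
```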

(reply by Kevin Blighe, 21 months ago)

Thanks for your help, Kevin. I found a relevant article on this issue:

Comparing continuous and discrete analyses of breast cancer survival information

https://www.sciencedirect.com/science/article/pii/S0888754316300684

(reply by Mike, 21 months ago)

No problem.

(reply by Kevin Blighe, 21 months ago)