Question: TCGA survival analysis: continuous vs discrete expression values
Mike wrote (4 months ago):

Hi,

I am trying to do a survival analysis for a gene using TCGA data. I did this in two ways: with continuous expression values and with discrete values (Low and High, split at the median expression value). The two approaches give hugely different p-values. Can anyone advise which way is better for survival analysis?

My command:

coxph(Surv(time, status) ~ expression, data = survdata)

Results:

HR = 0.82, logrankP = 0.02 (discrete model)
HR = 0.87, logrankP = 0.00001 (continuous model)

Thanks

Tags: survival cox model coxph
Kevin Blighe wrote (4 months ago):

When you convert the data to discrete values, you are eliminating information, as I elaborate with an extreme example here: "A: Why quantitative design are preferred GWAS approach". In the process, you also make the data more readily interpretable to the human brain. Simply using Low and High may be too few categories; you could try introducing more.

If your data is on the continuous scale, you need to be aware of the distribution that it follows and whether you have processed it correctly.
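For anyone wanting to reproduce the comparison, here is a minimal R sketch on simulated data (not TCGA). The data frame `survdata` and its columns `time`, `status` and `expression` follow the names in the question; the simulated values, the effect size and the `group` column are assumptions made purely for illustration.

```r
library(survival)

set.seed(1)
n <- 200
expression <- rnorm(n, mean = 8.7, sd = 1)        # simulated log2 expression
# Simulated event times whose hazard depends on expression (assumed effect)
time <- rexp(n, rate = 0.1 * exp(-0.15 * (expression - 8.7)))
status <- rbinom(n, 1, 0.7)                       # 1 = event, 0 = censored
survdata <- data.frame(time, status, expression)

# Median split into Low / High, with Low as the reference level
survdata$group <- factor(
  ifelse(survdata$expression > median(survdata$expression), "High", "Low"),
  levels = c("Low", "High")
)

fit_cont <- coxph(Surv(time, status) ~ expression, data = survdata)
fit_disc <- coxph(Surv(time, status) ~ group, data = survdata)

# Score (log-rank) test p-value reported by summary() for each model
p_cont <- summary(fit_cont)$sctest["pvalue"]
p_disc <- summary(fit_disc)$sctest["pvalue"]
print(c(continuous = unname(p_cont), discrete = unname(p_disc)))
```

Note that the two hazard ratios are on different scales: the continuous one is per unit of expression, while the discrete one compares High against Low, so they should not be compared number for number.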


Thanks Kevin. The expression data is log2 RSEM, and this is its distribution:

https://ibb.co/pbWBV0g

The median expression value of this gene is 8.73 across 452 samples.

modified 4 months ago • written 4 months ago by Mike

What if you convert that logged data to Z-scores and then trichotomise it based on that?
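A rough R sketch of that suggestion, on simulated values. The cut-offs at Z = -1 and Z = 1 are an assumption (the thread does not say where to split), and `log_expr`, `z` and `expr_group` are illustrative names only.

```r
set.seed(2)
log_expr <- rnorm(300, mean = 8.7, sd = 1)   # simulated log2 expression values

# Z-score: centre to mean 0 and scale to standard deviation 1
z <- as.numeric(scale(log_expr))

# Assumed cut-offs: z <= -1 is Low, z > 1 is High, everything between is Mid
expr_group <- cut(z, breaks = c(-Inf, -1, 1, Inf),
                  labels = c("Low", "Mid", "High"))
print(table(expr_group))
```

Z-scoring a single gene across samples is a linear rescaling, so in a continuous model it changes the units of the hazard ratio (per standard deviation instead of per log2 unit).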

written 4 months ago by Kevin Blighe

I get nearly the same results using Z-score data for the discrete (logrankP = 0.02) and continuous (logrankP = 0.00004) models.

written 4 months ago by Mike

You should check the hazard ratios too, and their confidence intervals. If, in one situation, the hazard ratio is 0.6 but the upper 95% limit passes 1.0, then that is not as reliable as a situation where the upper 95% limit is 0.8. The same is true in reverse: a hazard ratio of 2.9 is less reliable if its lower 95% limit falls below 1.0 than if it stays above 1.0.

That is: check that the hazard ratio limits don't cross the 'barrier' of 1.0. It's just a simple extra check.
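This check can be sketched in R as follows, again on simulated data. The variable names and the simulated effect are assumptions; the only thing taken from the `survival` package is the `conf.int` matrix returned by `summary()` on a `coxph` fit.

```r
library(survival)

set.seed(3)
n <- 200
d <- data.frame(expression = rnorm(n))
d$time <- rexp(n, rate = 0.1 * exp(0.3 * d$expression))  # assumed effect
d$status <- rbinom(n, 1, 0.7)                            # 1 = event, 0 = censored

fit <- coxph(Surv(time, status) ~ expression, data = d)

# conf.int columns: exp(coef), exp(-coef), lower .95, upper .95
ci <- summary(fit)$conf.int
hr    <- ci[1, "exp(coef)"]
lower <- ci[1, "lower .95"]
upper <- ci[1, "upper .95"]

# TRUE when the 95% interval spans the 'barrier' of 1.0
crosses_one <- lower < 1 && upper > 1
print(c(HR = hr, lower = lower, upper = upper))
print(crosses_one)
```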

modified 4 months ago • written 4 months ago by Kevin Blighe

Thanks again. Yes, there is a difference in the HRs and their 95% confidence intervals (lower/upper):

Model        HR     HRlower   HRupper
discrete     0.82   0.770     1.01
continuous   0.87   0.75      0.97

written 4 months ago by Mike

Looking at that, I'd assume that the continuous model is more reliable. I think that it's okay to derive the p-value and HRs from the continuous variable and then just plot the dichotomised variable in the survival plot. You just have to clearly state what you have done in the methods.
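One possible way to lay that out in R, as a sketch on simulated data: the statistics come from the continuous Cox model, and the median split is used only to draw the Kaplan-Meier curves. All values, the `group` column and the colours are assumptions for illustration.

```r
library(survival)

set.seed(4)
n <- 200
survdata <- data.frame(expression = rnorm(n, mean = 8.7, sd = 1))
survdata$time <- rexp(n, rate = 0.1 * exp(-0.2 * (survdata$expression - 8.7)))
survdata$status <- rbinom(n, 1, 0.7)   # 1 = event, 0 = censored

# Statistics (HR, confidence interval, p-value) from the continuous model
fit_cont <- coxph(Surv(time, status) ~ expression, data = survdata)
stats <- summary(fit_cont)
print(stats$conf.int)

# Median split used for display only
survdata$group <- ifelse(survdata$expression > median(survdata$expression),
                         "High", "Low")
km <- survfit(Surv(time, status) ~ group, data = survdata)

# Strata are ordered alphabetically: group=High first, then group=Low
plot(km, col = c("red", "blue"), xlab = "Time", ylab = "Survival probability")
legend("topright", legend = names(km$strata), col = c("red", "blue"), lty = 1)
```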

written 4 months ago by Kevin Blighe

Thanks for your help, Kevin. I found a relevant article on this issue:

Comparing continuous and discrete analyses of breast cancer survival information

https://www.sciencedirect.com/science/article/pii/S0888754316300684

written 4 months ago by Mike

No problem.

written 4 months ago by Kevin Blighe