Question

Questions about survival analysis

5.8 years ago

tujuchuanli ▴ 100

Hi, all

I want to identify prognostic genes by survival analysis in the TCGA BRCA dataset. I basically followed the approach of a previous study (https://peerj.com/articles/1499/). My plan is to analyze the genes one by one and pick those with a significant Cox p-value (p<0.05).

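The survival model is below (using the survival package in R):

library(survival)
# Univariate Cox model for a single gene; censor is the event indicator (died or not)
coxmodel <- coxph(Surv(time, censor) ~ exprs)
summary.coxmodel <- summary(coxmodel)
# Pick values by name rather than by position in the coefficient table
coef <- coef(summary.coxmodel)["exprs", "coef"]
coef.pvalue <- coef(summary.coxmodel)["exprs", "Pr(>|z|)"]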

Here time is the survival time, censor indicates whether the patient died or not, and exprs is the gene expression value (RNA-seq data, RPKM).
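The gene-by-gene plan described above could be sketched roughly as follows. This is only an illustration on simulated data: the matrix expr_mat, the gene names geneA/geneB/geneC, and the simulated time and censor vectors are all assumptions standing in for the real TCGA data, and censor is assumed to be coded 1 = died.

```r
library(survival)

set.seed(1)
n <- 200
# Simulated stand-ins (assumptions): a samples x genes expression matrix
# with hypothetical gene names; only geneA is given an effect on the hazard.
expr_mat <- matrix(rnorm(n * 3), nrow = n,
                   dimnames = list(NULL, c("geneA", "geneB", "geneC")))
true_time <- rexp(n, rate = 0.1 * exp(0.5 * expr_mat[, "geneA"]))
cens_time <- rexp(n, rate = 0.05)
time   <- pmin(true_time, cens_time)          # observed follow-up time
censor <- as.integer(true_time <= cens_time)  # 1 = died, 0 = censored (assumed coding)

# Fit one univariate Cox model per gene and keep the coefficient and Wald p-value.
cox_scan <- t(sapply(colnames(expr_mat), function(g) {
  exprs <- expr_mat[, g]
  s <- coef(summary(coxph(Surv(time, censor) ~ exprs)))
  c(coef = s["exprs", "coef"], p = s["exprs", "Pr(>|z|)"])
}))
cox_scan <- as.data.frame(cox_scan)
print(cox_scan)
```

Each row of cox_scan then holds the result for one gene, so the p<0.05 filter becomes a simple subset such as cox_scan[cox_scan$p < 0.05, ].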

Then I want to display some of the genes with a significant Cox p-value as Kaplan-Meier plots. I basically followed a post by Kevin (cox proportional hazard model; by the way, Kevin, I hope you can see this post and give me some precious suggestions). I use the median gene expression as the cutoff to divide the samples into two groups (high exprs and low exprs).

The plot gives me a log-rank p-value, which is always much bigger than the Cox p-value (usually about 100 times bigger for the several genes I tried).
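For context, the two p-values being compared here come from different inputs: the Cox model above uses the continuous expression value, while the log-rank test only sees the high/low labels from the median split, so the two are not expected to agree. A minimal sketch on simulated data (all values below are made up for illustration, and censor is assumed to be coded 1 = died):

```r
library(survival)

set.seed(42)
n <- 300
exprs <- rnorm(n)                             # simulated expression (assumption)
true_time <- rexp(n, rate = 0.1 * exp(0.3 * exprs))
cens_time <- rexp(n, rate = 0.05)
time   <- pmin(true_time, cens_time)
censor <- as.integer(true_time <= cens_time)  # 1 = died, 0 = censored

# Cox model on the continuous expression value (as in the question).
p_cox_cont <- coef(summary(coxph(Surv(time, censor) ~ exprs)))["exprs", "Pr(>|z|)"]

# Median split: the same samples reduced to low/high labels.
group <- factor(ifelse(exprs > median(exprs), "high", "low"),
                levels = c("low", "high"))

# Log-rank test on the two groups (what the Kaplan-Meier plot reports).
lr <- survdiff(Surv(time, censor) ~ group)
p_logrank <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

# Cox model on the dichotomised variable: with no tied event times its
# score test matches the log-rank test.
fit_group <- coxph(Surv(time, censor) ~ group)
p_cox_group <- unname(summary(fit_group)$sctest["pvalue"])

print(c(cox_continuous = p_cox_cont, logrank = p_logrank, cox_group = p_cox_group))
```

If a p-value that matches the plot is wanted, the Cox model fitted on group rather than on exprs is the like-for-like comparison.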

My question is: how can I get a plot that matches my Cox p-value? Or do I just have to try several cutoffs to find the best-fitting plot?

survival analysis R • 2.0k views

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

5.8 years ago by WouterDeCoster 47k

Thanks, I will try it next time.

5.8 years ago by tujuchuanli ▴ 100

Your worry appears to be that the P values are just very different - is this correct? Are your sample numbers low in either of the groups being compared, or are the groups imbalanced?

5.8 years ago by Kevin Blighe 87k

Hi Kevin, nice to see you again.

I have two groups of genes, and I want to compare the number of prognostic genes in groups A and B (prognostic genes are defined as those with a Cox p-value <= 0.05). The percentage of prognostic genes in group A is in fact higher than in group B (12% vs 6% in the BRCA dataset).

That is an overview of the data. Next I wanted to display some of the genes as Kaplan-Meier plots, which is where I ran into the problem above. Since I am just a newbie to survival analysis, I don't know how to deal with it.

Can I say a gene is a prognostic gene even if its log-rank p-value is not significant but its Cox p-value is significant?

By the way, Kevin. Could you please check my other two posts (C: question about identifying differential expressed genes in TCGA and https://www.biostars.org/p/327841/) and give me some suggestions? Your suggestions are very important to me.

Thanks.

5.8 years ago by tujuchuanli ▴ 100

Thanks, Kevin. I will check the post. Thank you again.

5.8 years ago by tujuchuanli ▴ 100

score 1 · Answer 1 · 2018-07-19

To 'intimately' understand the log rank and Cox proportional hazards tests, I would encourage you to post on https://stats.stackexchange.com/

From my general understanding: the log rank, Wald, and likelihood ratio tests are just comparing the different arms of your survival 'curve'. The Cox test, then, will do the same but take into account any adjustments that you are making in the model.

For example, we can build a Cox model and include various covariates in the model, such as smoking status, BMI, exposure to allergens, etc. The Cox model will analyse the survival curves and 'adjust' for these covariates when reporting P values and Hazard Ratios, whilst the log rank test will not. So, the log rank test can be misleading.
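To make the adjustment point concrete, here is a rough sketch on simulated data. Everything in it is an assumption for illustration: a binary group, a single covariate age that influences both group membership and the hazard, and censor coded 1 = died. The log-rank test compares the two groups while ignoring age, whereas the Cox model reports the group effect adjusted for it.

```r
library(survival)

set.seed(7)
n <- 400
# Simulated stand-ins (assumptions): 'age' is a confounder that affects both
# group membership and the hazard; 'group' itself has no effect on the hazard.
age   <- rnorm(n, mean = 60, sd = 10)
group <- rbinom(n, 1, plogis((age - 60) / 10))
true_time <- rexp(n, rate = 0.02 * exp(0.05 * (age - 60)))
cens_time <- rexp(n, rate = 0.01)
time   <- pmin(true_time, cens_time)
censor <- as.integer(true_time <= cens_time)  # 1 = died, 0 = censored

# Log-rank test: compares the two groups and cannot account for age.
lr <- survdiff(Surv(time, censor) ~ group)
p_logrank <- pchisq(lr$chisq, df = 1, lower.tail = FALSE)

# Cox model with age as an additional covariate.
fit_adj <- coxph(Surv(time, censor) ~ group + age)
adj_table <- coef(summary(fit_adj))[, c("exp(coef)", "Pr(>|z|)")]

print(p_logrank)
print(adj_table)  # hazard ratios and P values for group and age, each adjusted for the other
```

With real data the covariates would be whatever clinical variables are available (for example stage or age at diagnosis), added to the right-hand side of the formula in the same way.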