3.1 years ago
tujuchuanli ▴ 100

Hi, I want to perform survival analysis on a TCGA dataset using the "survival" package in R. For each gene, the model is coxph(Surv(time, censor) ~ exprs), where time is the survival time (for dead patients) or the last follow-up time (for alive patients), censor is the vital status of each cancer sample (alive=0, dead=1), and exprs is the gene expression value. I have about 1000 genes, so I fit the model 1000 times.
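For reference, the per-gene fitting described here could look roughly like the sketch below. The data are simulated and the names n, n_genes and exprs_mat are hypothetical stand-ins, not the actual TCGA objects; only coxph, Surv and the variables time, censor and exprs come from the post.

```r
library(survival)

set.seed(1)
n <- 100                      # hypothetical number of samples
n_genes <- 5                  # small stand-in for the ~1000 genes
time <- rexp(n, rate = 0.1)   # simulated survival / last follow-up times
censor <- rbinom(n, 1, 0.6)   # 1 = dead (event), 0 = alive (censored)
exprs_mat <- matrix(rnorm(n * n_genes), nrow = n,
                    dimnames = list(NULL, paste0("gene", seq_len(n_genes))))

# Fit one univariate Cox model per gene and keep the Wald p-value
pvals <- apply(exprs_mat, 2, function(exprs) {
  fit <- coxph(Surv(time, censor) ~ exprs)
  summary(fit)$coefficients[1, "Pr(>|z|)"]
})
pvals
```

With 1000 genes the raw p-values would normally be adjusted for multiple testing, for example with p.adjust(pvals, method = "BH").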

I also tried almost the same model, only changing censor from "alive=0 and dead=1" to "alive=1 and dead=0". The p-values change a lot. The number of significant genes is almost the same, but the overlap between the significant genes from the two encodings is quite small (~30%).

From my understanding, how alive and dead are coded should not affect anything. So why does it change the result?

survival analysis • 1.0k views

Did you read the help pages for coxph and Surv to see exactly how the variables passed to them should be encoded? At the console, type ?coxph and ?Surv. I even gave an example here: Survival analysis with gene expression. Be aware that there can be a world of difference between a value encoded as numeric and one encoded as a factor.


Thanks, Kevin. I did read the help page for ?Surv. It recommends alive=0 and dead=1. I just want to know why. I am reading your post now, thanks~~


Hey again. I do not really see your point of view... I mean, the survival of the patient is critical to how the statistical calculations are performed. It is 'hard-coded' in the program to expect that alive=0 and dead=1. So, that is how you must encode them in your input data.
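To make the "why" concrete: the second argument of Surv marks whether the event of interest was observed (the help page gives the usual coding as 0=alive, 1=dead, and also accepts TRUE/FALSE or 1/2). Swapping 0 and 1 does not relabel the same model. It tells coxph that the last follow-ups were the events and that the deaths were censored, so it fits a model of time to censoring instead of time to death. A minimal sketch on simulated data (the effect size and rates are assumptions chosen for illustration, not TCGA values):

```r
library(survival)

set.seed(2)
n <- 200
exprs <- rnorm(n)
# Assumption for illustration: higher expression shortens time to death
death_time  <- rexp(n, rate = 0.1 * exp(0.7 * exprs))
censor_time <- rexp(n, rate = 0.1)               # independent loss to follow-up
time   <- pmin(death_time, censor_time)          # observed time
censor <- as.integer(death_time <= censor_time)  # 1 = dead, 0 = alive

fit_ok      <- coxph(Surv(time, censor) ~ exprs)      # models time to death
fit_flipped <- coxph(Surv(time, 1 - censor) ~ exprs)  # models time to censoring

# The two fits count different observations as events
c(events_ok = fit_ok$nevent, events_flipped = fit_flipped$nevent)

# Coefficient, hazard ratio and p-value for each encoding
rbind(ok      = summary(fit_ok)$coefficients[1, ],
      flipped = summary(fit_flipped)$coefficients[1, ])
```

The two fits use the same times but complementary sets of events, which is why the p-values for a given gene need not agree between the two encodings.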


Thanks Kevin, you really helped me a lot.