I carried out a survival analysis in two ways. Firstly, I dichotomised the survival data into two groups:
- Dead from cancer within one year.
- Alive more than four years, with no sign of relapse.
Then, I used limma to build a linear model and find differentially expressed probes.
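For reference, a two-group limma comparison of this kind might look roughly like the sketch below. It is not my actual script: the simulated `expression` matrix, the `outcome` factor and its level names are stand-ins, and it assumes that samples falling in neither group have already been dropped.

```r
library(limma)

# Simulated stand-in data (hypothetical): 100 probes x 20 samples.
set.seed(1)
expression <- matrix(rnorm(100 * 20, mean = 8), nrow = 100)

# Hypothetical group labels; long survival is the reference level.
outcome <- factor(rep(c("shortSurvival", "longSurvival"), each = 10),
                  levels = c("longSurvival", "shortSurvival"))

# Two-group design: the second coefficient is short minus long survival.
design <- model.matrix(~ outcome)
fit <- eBayes(lmFit(expression, design))

# Per-probe log fold change and moderated t-statistics, kept in probe order.
limmaResults <- topTable(fit, coef = 2, number = Inf, sort.by = "none")
```

Here `limmaResults$logFC` is the per-probe coefficient (difference in mean expression between the two groups).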
Secondly, I used Cox proportional hazards regression. I fitted one model per probe.
library(survival)
hazardModels <- lapply(seq_len(nrow(expression)), function(probe) coxph(survivalData ~ expression[probe, ]))
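To make the comparison concrete, here is a minimal sketch of how the per-probe Cox coefficients can be collected into one table. The data are simulated stand-ins, `survivalData` is assumed to be a `Surv` object, and `coxSummary` and `probeExpression` are names introduced only for this illustration.

```r
library(survival)

# Simulated stand-in data (hypothetical): 50 probes x 40 samples.
set.seed(1)
nSamples <- 40
expression <- matrix(rnorm(50 * nSamples, mean = 8), nrow = 50)

# Assumed form of survivalData: follow-up time plus event indicator.
survivalData <- Surv(time = rexp(nSamples, rate = 0.2),
                     event = rbinom(nSamples, 1, 0.7))

# One univariate Cox model per probe.
hazardModels <- lapply(seq_len(nrow(expression)), function(probe) {
  probeExpression <- expression[probe, ]
  coxph(survivalData ~ probeExpression)
})

# One row per probe: log hazard ratio, its standard error, Wald p-value.
coxSummary <- t(sapply(hazardModels, function(model) {
  summary(model)$coefficients[1, c("coef", "se(coef)", "Pr(>|z|)")]
}))
```

The `coef` column is the log hazard ratio per unit of expression, so a positive value means higher expression goes with a higher hazard. That is a different scale from a limma log fold change between two groups.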
I plotted the coefficients from the two methods against each other to check concordance. Most probes have similar coefficients, and the scatterplot is roughly linear. I then plotted raw expression values for some of the probes on which the two methods disagree. Surprisingly, the Cox proportional hazards model detects many genes whose probes are highly expressed but show no expression difference between the two groups. These probes are not detected by the first method. What is the statistical explanation for this?