Unusual Cox Proportional Hazards Results for Highly Expressed Microarray Probes
9.8 years ago
dario.garvan ▴ 520

I did a survival analysis in two ways. First, I dichotomised the survival data into two groups:

  1. Dead from cancer within one year.
  2. Alive for more than four years, with no sign of relapse.

Then, I used limma to build a linear model and find differentially expressed probes.
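The grouping described above can be sketched roughly as follows. This is an illustration only: the `days` and `died` vectors are made-up values, relapse status is not modelled, and a plain Welch t-test stands in for limma's moderated statistics, so it is not the analysis that was actually run.

```r
# Made-up follow-up data for 12 hypothetical patients.
days <- c(120, 200, 300, 340, 800, 1500, 1600, 1800, 2000, 2500, 900, 100)
died <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
          FALSE, FALSE, FALSE, FALSE, FALSE)

# "poor": died within one year. "good": alive beyond four years.
# Everyone else (late deaths, early censoring) stays NA and is dropped.
group <- rep(NA_character_, length(days))
group[died & days <= 365] <- "poor"
group[!died & days > 4 * 365] <- "good"
keep <- !is.na(group)

# One hypothetical probe; simulated values, no real signal.
set.seed(7)
expr <- rnorm(length(days), mean = 10)

# Stand-in for the limma step: compare expression between the two groups.
groupDiff <- t.test(expr[keep] ~ factor(group[keep]))
```

Note that the dichotomised comparison discards every patient who falls between the two extremes, whereas a Cox model uses all of them, so the two methods are not fitted to the same samples.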

Secondly, I used Cox proportional hazards regression. I fitted one model per probe.

hazardModels <- lapply(seq_len(nrow(expression)), function(probe) coxph(survivalData ~ expression[probe, ]))
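For comparing against another method, the per-probe coefficients and Wald statistics can be pulled out of the fitted models. The following is a minimal sketch on simulated stand-in data (the matrix dimensions, seed, and the `coxSummary` name are assumptions for illustration, not part of the original analysis):

```r
library(survival)

set.seed(42)  # simulated stand-in data
nProbes <- 5
nSamples <- 40
expression <- matrix(rnorm(nProbes * nSamples, mean = 10), nrow = nProbes)
survivalData <- Surv(rexp(nSamples, rate = 1 / 1000),
                     rbinom(nSamples, 1, 0.6) == 1)

# One Cox model per probe, as in the post.
hazardModels <- lapply(seq_len(nrow(expression)), function(probe)
  coxph(survivalData ~ expression[probe, ]))

# Collect coefficient, Wald z and p-value for each probe (one row per probe).
coxSummary <- t(sapply(hazardModels, function(model) {
  stats <- summary(model)$coefficients
  c(coef = stats[1, "coef"], z = stats[1, "z"], p = stats[1, "Pr(>|z|)"])
}))
```

Each Cox coefficient is a log hazard ratio per unit of expression, so it is on a different scale from a between-group difference in mean expression; comparing the z statistics may be more informative than comparing raw coefficients.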

I plotted the coefficients from the two methods against each other to check concordance. Most probes have similar coefficients, and the scatterplot is roughly linear. I then plotted raw expression values for some of the probes on which the methods disagree. Surprisingly, the Cox proportional hazards model detects many genes that are highly expressed but show no probe expression difference between the two groups. The first method does not detect these probes. What is the statistical explanation for this?

microarray survival proportional hazards • 3.1k views

It'd be rather helpful to see an example.

expression <- c(15.3103311933149, 15.0157174552731, 15.135793474893,
14.9198927859697, 14.9897673751516, 15.3062827467706, 15.3103311933149,
15.2070521612507, 15.059556710088, 15.135793474893, 15.1313866460974,
14.9813325911164, 15.0831610844229, 15.0781787120161, 15.2609923941384,
14.7705795759363, 15.3384294956832, 14.8152507315718, 15.3012062540704,
15.2753063785538, 15.1954229268207, 15.3103311933149, 15.3062827467706,
15.3750848737185, 15.1578260359732, 15.321173185739, 15.2609923941384,
14.9149292674096, 14.9377815644205, 15.0874554648443, 15.1263951394799,
15.1912205682114, 14.0848402429873, 15.3674426187413, 14.7158303376285,
14.8152507315718, 15.1400173563972, 15.4431428726152, 15.2466239892204,
14.9691791905281, 15.3103311933149, 15.3160174597845, 15.1093785260608,
15.2697245998598, 14.8790422727129, 14.7360317386087, 15.3103311933149,
15.0296043828935, 14.7158303376285, 15.3750848737185, 15.0636287648631,
15.2070521612507, 15.1504088368176, 15.0831610844229, 15.3897113254823,
15.1954229268207, 15.0296043828935, 15.1263951394799, 15.1720391013973,
14.9339393180207, 15.221579147451, 15.2855669160713, 15.0874554648443,
15.2171597176676, 15.1175343571932, 15.2559796011832, 15.11361722107,
15.3750848737185)

days <- c(3528, 3509, 3289, 443, 1905, 658, 2939, 507, 238,
1326, 1888, 53, 362, 275, 2151, 256, 70, 296, 4628,
2204, 3357, 490, 1913, 2974, 3769, 27, 169, 4357, 43,
2969, 3044, 2885, 226, 2408, 1748, 714, 215, 67, 1200,
377, 3921, 1499, 2115, 176, 3650, 233, 896, 454, 455,
189, 2893, 2932, 241, 193, 1251, 752, 1662, 344, 3298,
2948, 326, 2809, 235, 1570, 201, 1888, 671, 575)

status <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE,
FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,
FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE,
TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)

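library(survival)
# Right-censored outcome: follow-up in days; status TRUE means the event was observed.
survivalData <- Surv(days, status)
# Single-probe Cox model; the coefficient is the log hazard ratio per unit of expression.
coxph(survivalData ~ expression)

library(ggplot2)
# Plot expression against follow-up time, coloured by event status.
plotData <- data.frame(years = days / 365,
                       expression = expression,
                       status = status)
ggplot(plotData, aes(x = years, y = expression, colour = status)) + ylim(4, 16) + geom_point()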

Do you get similar results for different probes from the same gene?

Are there outliers in expression[probe, ] that may adversely affect the survival analysis?

Have you log-transformed (or similarly transformed) expression[probe, ], and do the histograms of the transformed values look 'normal'?

9.7 years ago
dario.garvan ▴ 520

The explanation is that the ratio of events to survivors changes slightly as the expression level increases. The trend is exaggerated because highly expressed genes sit near the maximum limit of measurement and therefore have reduced variability.
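One way to see how low variability can inflate a Cox coefficient is to refit the model on standardised expression. The sketch below uses simulated data (the seed, sample size, and the mean and spread of the hypothetical probe are assumptions chosen to resemble a saturated probe, not values from the real data set). Because the coefficient is expressed per unit of the covariate, a probe with a small standard deviation gets a large raw coefficient for the same underlying trend, while the Wald z statistic is unaffected by rescaling.

```r
library(survival)

set.seed(1)  # simulated data, for illustration only
n <- 68
# Hypothetical highly expressed probe: large mean, small spread.
expr <- rnorm(n, mean = 15.1, sd = 0.2)
days <- rexp(n, rate = 1 / 1500)
status <- rbinom(n, 1, 0.6) == 1
surv <- Surv(days, status)

# Fit on the raw scale and on the z-score scale.
fitRaw <- coxph(surv ~ expr)
exprScaled <- as.numeric(scale(expr))
fitScaled <- coxph(surv ~ exprScaled)

coefRaw <- unname(coef(fitRaw))        # log hazard ratio per expression unit
coefScaled <- unname(coef(fitScaled))  # log hazard ratio per standard deviation

zRaw <- summary(fitRaw)$coefficients[1, "z"]
zScaled <- summary(fitScaled)$coefficients[1, "z"]

# The raw coefficient is the per-SD coefficient divided by sd(expr),
# so a small sd(expr) yields a large raw coefficient.
print(c(raw = coefRaw, perSD = coefScaled, sd = sd(expr)))
```

Standardising each probe before fitting puts all coefficients on a per-standard-deviation scale, which may make them easier to compare across probes with very different variability.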
