Question: Unusual Cox Proportional Hazards Results for Highly Expressed Microarray Probes
6.0 years ago, dario.garvan470 (Australia) wrote:

I did a survival analysis in two ways. Firstly, I dichotomised the survival data into two groups:

1. Dead from cancer within one year.
2. Alive more than four years, with no sign of relapse.

Then, I used limma to build a linear model and find differentially expressed probes.
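For readers who want to reproduce the grouping step, here is a minimal sketch in base R. The helper name `dichotomise`, the group labels, and the exact cut-offs (365 days and 4 × 365 days) are assumptions for illustration, as is treating `status == TRUE` as the event. Samples that fit neither group are set to `NA` and would be dropped before the two-group comparison.

```r
# Hypothetical helper (not from the original post): label each sample as
#   "poor" = event within one year,
#   "good" = event-free for more than four years,
#   NA     = neither, so excluded from the two-group comparison.
dichotomise <- function(days, status, early = 365, late = 4 * 365) {
  group <- rep(NA_character_, length(days))
  group[status & days <= early] <- "poor"
  group[!status & days > late] <- "good"
  factor(group, levels = c("good", "poor"))
}

# Toy follow-up times (days) and event indicators, for illustration only
toyDays   <- c(200, 2000, 1000, 300, 1500)
toyStatus <- c(TRUE, FALSE, FALSE, FALSE, TRUE)
groups <- dichotomise(toyDays, toyStatus)
table(groups, useNA = "ifany")
```

Note that samples censored before four years, or with an event after the first year, carry no label here, so this approach discards part of the cohort that the Cox model still uses.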

Secondly, I used Cox proportional hazards regression. I fitted one model per probe.

`hazardModels <- lapply(1:nrow(expression), function(probe) coxph(survivalData ~ expression[probe, ]))`
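A sketch of how the per-probe coefficients might then be collected into a table for plotting. The data below are simulated stand-ins (5 probes, 40 samples), not the original dataset, and `hazardTable` is a name introduced here for illustration.

```r
library(survival)

set.seed(1)
# Simulated stand-in data, for illustration only
nSamples <- 40
expression <- matrix(rnorm(5 * nSamples, mean = 10), nrow = 5)
survivalData <- Surv(rexp(nSamples, rate = 1 / 1000),
                     rbinom(nSamples, 1, 0.6) == 1)

# One Cox model per probe, as in the question
hazardModels <- lapply(seq_len(nrow(expression)), function(probe)
  coxph(survivalData ~ expression[probe, ]))

# One row per probe: log hazard ratio, its standard error, Wald p-value
hazardTable <- t(sapply(hazardModels, function(model)
  summary(model)$coefficients[1, c("coef", "se(coef)", "Pr(>|z|)")]))
hazardTable
```

The `coef` column is the log hazard ratio per one unit of expression, which matters when comparing probes whose expression ranges differ.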

I plotted the coefficients from the two methods against each other to check concordance. Most probes have similar coefficients, and the scatterplot is roughly linear. I then plotted raw expression values for some of the probes on which the two methods disagreed. Surprisingly, the Cox proportional hazards model detects many genes that are highly expressed but show no probe expression difference between the two groups. These probes are not detected by the first method. What is the statistical explanation for this?

modified 5.9 years ago • written 6.0 years ago by dario.garvan470

It'd be rather helpful to see an example.

```expression <- c(15.3103311933149, 15.0157174552731, 15.135793474893, 14.9198927859697, 14.9897673751516, 15.3062827467706, 15.3103311933149, 15.2070521612507, 15.059556710088, 15.135793474893, 15.1313866460974, 14.9813325911164, 15.0831610844229, 15.0781787120161, 15.2609923941384, 14.7705795759363, 15.3384294956832, 14.8152507315718, 15.3012062540704, 15.2753063785538, 15.1954229268207, 15.3103311933149, 15.3062827467706, 15.3750848737185, 15.1578260359732, 15.321173185739, 15.2609923941384, 14.9149292674096, 14.9377815644205, 15.0874554648443, 15.1263951394799, 15.1912205682114, 14.0848402429873, 15.3674426187413, 14.7158303376285, 14.8152507315718, 15.1400173563972, 15.4431428726152, 15.2466239892204, 14.9691791905281, 15.3103311933149, 15.3160174597845, 15.1093785260608, 15.2697245998598, 14.8790422727129, 14.7360317386087, 15.3103311933149, 15.0296043828935, 14.7158303376285, 15.3750848737185, 15.0636287648631, 15.2070521612507, 15.1504088368176, 15.0831610844229, 15.3897113254823, 15.1954229268207, 15.0296043828935, 15.1263951394799, 15.1720391013973, 14.9339393180207, 15.221579147451, 15.2855669160713, 15.0874554648443, 15.2171597176676, 15.1175343571932, 15.2559796011832, 15.11361722107, 15.3750848737185)```

```days <- c(3528, 3509, 3289, 443, 1905, 658, 2939, 507, 238, 1326, 1888, 53, 362, 275, 2151, 256, 70, 296, 4628, 2204, 3357, 490, 1913, 2974, 3769, 27, 169, 4357, 43, 2969, 3044, 2885, 226, 2408, 1748, 714, 215, 67, 1200, 377, 3921, 1499, 2115, 176, 3650, 233, 896, 454, 455, 189, 2893, 2932, 241, 193, 1251, 752, 1662, 344, 3298, 2948, 326, 2809, 235, 1570, 201, 1888, 671, 575)```

```status <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)```

```
library(survival)
survivalData <- Surv(days, status)
coxph(survivalData ~ expression)
```

```
library(ggplot2)
plotData <- data.frame(years = days / 365,
                       expression = expression,
                       status = status)
ggplot(plotData, aes(x = years, y = expression, colour = status)) +
  ylim(4, 16) +
  geom_point()
```

Do you get similar results for different probes from the same gene?

Are there outliers in expression[probe, ] that may adversely affect the survival analysis?

Have you log-transformed (or similarly transformed) expression[probe, ], and do histograms of the transformed values look approximately normal?

5.9 years ago, dario.garvan470 (Australia) wrote:

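The explanation is that the ratio of events to survivors changes slightly as the expression level increases. The trend is exaggerated by the reduced variability of the highly expressed genes, whose values sit near the maximum limit of measurement. A Cox coefficient is a log hazard ratio per unit of expression, so a small shift across a very narrow range of values translates into a large coefficient.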
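One way to see the effect of reduced variability is to refit the model on standardised expression, so that the coefficient is expressed per standard deviation rather than per unit. The sketch below uses simulated data (not the probe from the thread), and the variable names are made up for illustration. It assumes a probe confined to a narrow range around 15 with a weak association with hazard.

```r
library(survival)

set.seed(42)
n <- 200
# Simulated data, for illustration only: expression squeezed into a
# narrow range near 15, weakly associated with the hazard
narrowProbe <- rnorm(n, mean = 15, sd = 0.2)
time  <- rexp(n, rate = exp(0.5 * (narrowProbe - 15)) / 1000)
event <- rep(TRUE, n)  # no censoring, to keep the sketch short

# Same probe, rescaled to mean 0 and standard deviation 1
scaledProbe <- (narrowProbe - mean(narrowProbe)) / sd(narrowProbe)

perUnit <- unname(coef(coxph(Surv(time, event) ~ narrowProbe)))
perSD   <- unname(coef(coxph(Surv(time, event) ~ scaledProbe)))

# The per-SD coefficient is the per-unit coefficient multiplied by the SD,
# so a probe with a small SD has a comparatively large per-unit coefficient
c(perUnit = perUnit, perSD = perSD, sd = sd(narrowProbe))
```

Rescaling changes neither the fit nor the p-value; it only changes the units of the coefficient. Comparing per-SD coefficients across probes may therefore give a fairer picture than comparing per-unit coefficients when some probes are compressed near the measurement ceiling.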