Question: Unusual Cox Proportional Hazards Results for Highly Expressed Microarray Probes
dario.garvan (Australia) wrote, 5.2 years ago:

I did a survival analysis in two ways. Firstly, I dichotomised the survival data into two groups.

  1. Dead from cancer within one year.
  2. Alive more than four years, with no sign of relapse.

Then, I used limma to build a linear model and find differentially expressed probes.
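
For concreteness, the first approach might look roughly like the sketch below. It is not the original code. The simulated `expression` matrix, the `days` and `status` vectors, the group labels `good` and `poor`, and the cut-offs coded from follow-up time alone are all assumptions made for illustration (cause of death and relapse are not modelled here).

```r
library(limma)

# Simulated stand-in data (assumption): 100 probes, 40 samples.
set.seed(7)
nSamples <- 40
expression <- matrix(rnorm(100 * nSamples, mean = 10), nrow = 100,
                     dimnames = list(paste0("probe", 1:100), NULL))
days   <- rep(c(120, 300, 900, 2000, 3500), times = 8)
status <- rep(c(TRUE, TRUE, TRUE, FALSE, FALSE), times = 8)  # TRUE = died

# Dichotomise: died within one year vs. alive after more than four years.
# Samples that fit neither definition are dropped.
group <- rep(NA_character_, nSamples)
group[status & days <= 365] <- "poor"
group[!status & days > 4 * 365] <- "good"
keep  <- !is.na(group)
group <- factor(group[keep], levels = c("good", "poor"))

# One linear model per probe; the coefficient is the poor-vs-good log fold change.
design <- model.matrix(~ group)
fit <- eBayes(lmFit(expression[, keep], design))
limmaTable <- topTable(fit, coef = "grouppoor", number = Inf, sort.by = "none")
head(limmaTable)
```

Note that this design discards every sample with intermediate follow-up, whereas the Cox model uses all samples, so the two methods are not fitted to the same data.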

Secondly, I used Cox proportional hazards regression. I fitted one model per probe.

hazardModels <- lapply(seq_len(nrow(expression)), function(probe) coxph(survivalData ~ expression[probe, ]))
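
To compare the per-probe models with another method, the coefficients have to be pulled out of the list of fits. The following is a minimal sketch of one way to do that, using simulated data in place of the real matrix. The name `coxSummary` and the simulated `days` and `status` vectors are assumptions for illustration only.

```r
library(survival)

# Simulated stand-in data (assumption): 5 probes, 60 samples.
set.seed(42)
nSamples <- 60
nProbes  <- 5
expression <- matrix(rnorm(nProbes * nSamples, mean = 10), nrow = nProbes,
                     dimnames = list(paste0("probe", seq_len(nProbes)), NULL))
days   <- rexp(nSamples, rate = 1 / 1000)
status <- runif(nSamples) < 0.6          # TRUE = event observed
survivalData <- Surv(days, status)

# One Cox model per probe, as in the snippet above.
hazardModels <- lapply(seq_len(nrow(expression)), function(probe)
  coxph(survivalData ~ expression[probe, ]))

# Collect the log hazard ratio, its standard error and the Wald p-value.
coxSummary <- t(vapply(hazardModels, function(model) {
  summary(model)$coefficients[1, c("coef", "se(coef)", "Pr(>|z|)")]
}, numeric(3)))
rownames(coxSummary) <- rownames(expression)
coxSummary
```

Each `coef` is a log hazard ratio per one unit of expression, so its size depends on how widely the probe's values are spread.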

I plotted the coefficients from the two methods against each other to check concordance. Most probes have similar coefficients, and the scatterplot is roughly linear. For some of the probes that disagreed, I plotted the raw expression values. Surprisingly, the Cox proportional hazards model detects many probes that are highly expressed but show no expression difference between the two groups. These probes are not detected by the first method. What is the statistical explanation for this?


It'd be rather helpful to see an example.

written 5.2 years ago by Devon Ryan

expression <- c(15.3103311933149, 15.0157174552731, 15.135793474893,
14.9198927859697, 14.9897673751516, 15.3062827467706, 15.3103311933149,
15.2070521612507, 15.059556710088, 15.135793474893, 15.1313866460974,
14.9813325911164, 15.0831610844229, 15.0781787120161, 15.2609923941384,
14.7705795759363, 15.3384294956832, 14.8152507315718, 15.3012062540704,
15.2753063785538, 15.1954229268207, 15.3103311933149, 15.3062827467706,
15.3750848737185, 15.1578260359732, 15.321173185739, 15.2609923941384,
14.9149292674096, 14.9377815644205, 15.0874554648443, 15.1263951394799,
15.1912205682114, 14.0848402429873, 15.3674426187413, 14.7158303376285,
14.8152507315718, 15.1400173563972, 15.4431428726152, 15.2466239892204,
14.9691791905281, 15.3103311933149, 15.3160174597845, 15.1093785260608,
15.2697245998598, 14.8790422727129, 14.7360317386087, 15.3103311933149,
15.0296043828935, 14.7158303376285, 15.3750848737185, 15.0636287648631,
15.2070521612507, 15.1504088368176, 15.0831610844229, 15.3897113254823,
15.1954229268207, 15.0296043828935, 15.1263951394799, 15.1720391013973,
14.9339393180207, 15.221579147451, 15.2855669160713, 15.0874554648443,
15.2171597176676, 15.1175343571932, 15.2559796011832, 15.11361722107,
15.3750848737185)

days <- c(3528, 3509, 3289, 443, 1905, 658, 2939, 507, 238,
1326, 1888, 53, 362, 275, 2151, 256, 70, 296, 4628,
2204, 3357, 490, 1913, 2974, 3769, 27, 169, 4357, 43,
2969, 3044, 2885, 226, 2408, 1748, 714, 215, 67, 1200,
377, 3921, 1499, 2115, 176, 3650, 233, 896, 454, 455,
189, 2893, 2932, 241, 193, 1251, 752, 1662, 344, 3298,
2948, 326, 2809, 235, 1570, 201, 1888, 671, 575)

status <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE,
FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,
FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE,
TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)

library(survival)
survivalData <- Surv(days, status)
coxph(survivalData ~ expression)

library(ggplot2)
plotData <- data.frame(years = days / 365,
                       expression = expression,
                       status = status)
ggplot(plotData, aes(x = years, y = expression, colour = status)) + ylim(4, 16) + geom_point()

written 5.2 years ago by dario.garvan

Do you get similar results for different probes from the same gene?

Are there outliers in expression[probe, ] that may adversely affect the survival analysis?

Have you log-transformed (or similarly transformed) expression[probe, ], and do the resulting histograms look 'normal'?

written 5.2 years ago by russhh
dario.garvan (Australia) wrote, 5.1 years ago:

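The explanation is that the ratio of events to survivors changes slightly as the expression level increases. The trend is exaggerated by the reduced variability of the highly expressed probes, whose values are compressed near the upper limit of measurement. Because a Cox coefficient is a log hazard ratio per unit of expression, a narrow spread of values turns a weak trend into a large coefficient.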
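
The scale effect can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the thread's data: the two probes `wide` and `narrow`, their means and spreads, the helper `standardise`, and the hazard model are all made up for illustration. Both probes carry exactly the same signal, but `narrow` has a tenth of the spread, as a probe near saturation might.

```r
library(survival)

set.seed(1)
n <- 500
z <- rnorm(n)                      # shared underlying signal
wide   <- 8  + 1.5  * z            # mid-range probe, SD about 1.5
narrow <- 15 + 0.15 * z            # near-saturation probe, SD about 0.15

# Hazard rises with z; no censoring, to keep the example simple.
time <- rexp(n, rate = 0.001 * exp(0.5 * z))
surv <- Surv(time, rep(TRUE, n))

# Hypothetical helper: centre and scale to unit standard deviation.
standardise <- function(x) (x - mean(x)) / sd(x)

# Log hazard ratio per unit of expression.
perUnit <- c(wide   = unname(coef(coxph(surv ~ wide))),
             narrow = unname(coef(coxph(surv ~ narrow))))

# Log hazard ratio per standard deviation of expression.
perSD <- c(wide   = unname(coef(coxph(surv ~ standardise(wide)))),
           narrow = unname(coef(coxph(surv ~ standardise(narrow)))))

perUnit
perSD
```

By construction the per-unit coefficient of `narrow` is ten times that of `wide`, while the per-SD coefficients agree. Standardising each probe before fitting, or comparing test statistics rather than raw coefficients, may therefore give a fairer comparison with the limma results.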
