9 weeks ago
Sarah ▴ 30

Hello,

I am doing a survival analysis to identify prognostic genes in bladder cancer. I have selected 15 genes with the likelihood-based method in the rbsurv package. Now I want to run a multivariate Cox analysis that also includes cancer stage, histology, etc. I don't know how to use the expression of the selected genes in the multivariate analysis.

Any explanations or suggestions would be appreciated.

statistics bladder analysis survival cancer • 308 views
9 weeks ago
LChart 1.4k

If you have lots of samples, you can include gene expression directly in the Cox regression with a formula such as ~ sex + age + stage + gene1_expr + gene2_expr + ... (fitted with coxph from the survival package; survminer or a similar package can plot the results). Technically you only need one more sample than the total number of variables (and for a Cox model it is the number of events, not samples, that really limits this), but with 15 genes I would hesitate to take this approach with fewer than 100 samples.

Another problem is that many clinicians find it difficult to interpret survival data with continuous independent variables. It is a lot easier to discretize each gene's expression into 3 bins (high expression, low expression, background), usually split 20%/60%/20% or 25%/50%/25%, and plot survival curves per bin. Taking this approach would expand 15 quantitative variables into 45 binary variables (30 in a regression, if one bin per gene serves as the reference level), but in this way you can look directly for expression bins that are prognostic.
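As a rough illustration of the binning idea, here is a minimal sketch for a single gene with a 25%/50%/25% split. The data are simulated and every name (dat, gene1_expr, gene1_bin) is a placeholder I made up, not something from the original data set.

```r
library(survival)

set.seed(1)
n <- 200

# Simulated stand-in data; all column names are hypothetical
dat <- data.frame(
  time       = rexp(n, rate = 0.01),
  status     = rbinom(n, 1, 0.7),   # 1 = event, 0 = censored
  gene1_expr = rnorm(n)
)

# Cut points at the 25th and 75th percentiles give a 25%/50%/25% split
breaks <- quantile(dat$gene1_expr, probs = c(0, 0.25, 0.75, 1))
dat$gene1_bin <- cut(dat$gene1_expr, breaks = breaks,
                     labels = c("low", "background", "high"),
                     include.lowest = TRUE)
table(dat$gene1_bin)

# Kaplan-Meier curves per bin, and a log-rank test across the three bins
km <- survfit(Surv(time, status) ~ gene1_bin, data = dat)
lr <- survdiff(Surv(time, status) ~ gene1_bin, data = dat)

# In a Cox model the factor expands to two indicators (reference level = "low")
fit <- coxph(Surv(time, status) ~ gene1_bin, data = dat)
coef(fit)
```

The km object can be passed to plot() or to survminer's ggsurvplot() to draw the curves. Changing probs to c(0, 0.2, 0.8, 1) gives the 20%/60%/20% variant.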

The most typical approach is to define an "expression score" from all of the selected genes. This is basically taking the coefficients you got from rbsurv and using them to form a weighted average of expression to generate a single score. You can then cut that score into tertiles or quartiles to compare survival curves.
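A minimal sketch of such a score, on simulated data. The gene IDs and the beta values are invented for illustration; in practice the weights would be whatever per-gene coefficients you obtained from your earlier fit. I use a weighted sum rather than a weighted average, which only rescales the score and does not change the tertile assignment.

```r
library(survival)

set.seed(2)
n <- 300
genes <- c("geneA", "geneB", "geneC")   # hypothetical gene IDs

# Simulated samples x genes expression matrix
expr <- matrix(rnorm(n * length(genes)), nrow = n,
               dimnames = list(NULL, genes))

# Hypothetical per-gene coefficients (log hazard ratios), assumed for this sketch
beta <- c(geneA = 0.8, geneB = -0.5, geneC = 0.3)

# Expression score: weighted sum of expression, one value per sample
score <- as.numeric(expr[, names(beta)] %*% beta)

# Simulated survival whose hazard increases with the score
time   <- rexp(n, rate = 0.01 * exp(score))
status <- rbinom(n, 1, 0.7)

# Cut the score into tertiles
score_group <- cut(score,
                   breaks = quantile(score, probs = c(0, 1/3, 2/3, 1)),
                   labels = c("low", "mid", "high"),
                   include.lowest = TRUE)
table(score_group)

# Compare survival between the score groups
km <- survfit(Surv(time, status) ~ score_group)
lr <- survdiff(Surv(time, status) ~ score_group)
```

For quartiles, use probs = seq(0, 1, by = 0.25) and four labels.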

However, because you have used rbsurv to select these genes, you must ignore all of the statistics generated from these regressions (unless you are using a new set of samples), as they will be miscalibrated: the same samples were used both to pick the genes and to test them. To obtain calibrated statistics, you will need a permutation approach in which, for each permutation, you rerun the rbsurv selection and then the post-hoc regression. How to permute appropriately in the presence of clinical covariates is tricky and worth its own post.
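To make the structure of that permutation loop concrete, here is a rough sketch on simulated data. Two loud assumptions: select_genes() is a simple stand-in I wrote (top genes by univariate Cox z-score) and is NOT rbsurv, and the sketch ignores clinical covariates entirely, which is exactly the part described above as tricky. The point is only that selection is repeated inside every permutation.

```r
library(survival)

set.seed(3)
n <- 120; p <- 20; k <- 3   # samples, candidate genes, genes to keep
expr   <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("g", 1:p)))
time   <- rexp(n, rate = 0.01)
status <- rbinom(n, 1, 0.7)

# Stand-in for the selection step (hypothetical; replace with the real selector):
# rank genes by absolute univariate Cox z-score and keep the top k
select_genes <- function(expr, time, status, k) {
  z <- vapply(colnames(expr), function(g) {
    x <- expr[, g]
    fit <- coxph(Surv(time, status) ~ x)
    unname(abs(coef(fit)) / sqrt(diag(vcov(fit))))
  }, numeric(1))
  names(sort(z, decreasing = TRUE))[seq_len(k)]
}

# Whole pipeline: select genes, fit the post-hoc model, return one statistic
pipeline_stat <- function(expr, time, status, k) {
  sel <- select_genes(expr, time, status, k)
  fit <- coxph(Surv(time, status) ~ expr[, sel])
  2 * diff(fit$loglik)      # likelihood ratio statistic of the post-hoc model
}

observed <- pipeline_stat(expr, time, status, k)

# Null distribution: shuffle the (time, status) pairs together across samples,
# then rerun selection AND regression on each shuffled data set
n_perm <- 50
null_stats <- replicate(n_perm, {
  idx <- sample(n)
  pipeline_stat(expr, time[idx], status[idx], k)
})

# Permutation p-value with the usual +1 correction
p_perm <- (1 + sum(null_stats >= observed)) / (n_perm + 1)
p_perm
```

A real analysis would use far more than 50 permutations; the number is kept small here so the sketch runs quickly.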


I have 500 samples, and for each gene I have its expression in all 500 samples. I don't see how to use the expression as a parameter. How can I organize this? For the moment I am testing with this command:

coxph(Surv(daysToDeath_cox, vitalStatus_cox) ~  DEG + stage + gender + race)


DEG holds the expression values of every gene in the 500 samples.

How can I use the parameters from rbsurv in my analysis?


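To use the expression values of the 15 genes, you would take your clinical data frame and do something like:

# Add one column per selected gene, holding that gene's expression in each sample
for (gene_id in selected_ids) {
  # unlist() flattens the row whether DEG is a matrix or a data frame
  clinical.data.frame[[gene_id]] <- as.numeric(unlist(DEG[gene_id, ]))
}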


This assumes the rows of the clinical data are in the same order as the columns of DEG. Then you can put the genes directly in a formula.

You should be able to pull the coefficients out of the coxph result, e.g. with coef(fit) or summary(fit)$coefficients, where fit is the object returned by coxph.
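Putting those pieces together, a minimal end-to-end sketch might look like this. Everything here is simulated: the sample IDs, gene IDs, and the contents of DEG and clinical are placeholders standing in for your real objects, and I am assuming DEG is a genes x samples matrix with sample IDs as column names.

```r
library(survival)

set.seed(4)
n <- 100
sample_ids   <- paste0("S", seq_len(n))
selected_ids <- c("geneA", "geneB")      # hypothetical selected genes

# Expression as genes x samples (assumed layout of DEG)
DEG <- matrix(rnorm(3 * n), nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"), sample_ids))

# Simulated clinical table, one row per sample
clinical <- data.frame(
  row.names   = sample_ids,
  daysToDeath = rexp(n, rate = 0.01),
  vitalStatus = rbinom(n, 1, 0.7),
  stage       = factor(sample(c("I", "II", "III"), n, replace = TRUE)),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Guard against mismatched sample order before combining
stopifnot(identical(rownames(clinical), colnames(DEG)))

# Transpose so each selected gene becomes one per-sample column
clinical <- cbind(clinical, t(DEG[selected_ids, , drop = FALSE]))

# Build the formula from the variable names instead of typing 15 genes by hand
# (gene IDs that are not syntactic R names would need backticks)
rhs  <- paste(c("stage", "gender", selected_ids), collapse = " + ")
form <- as.formula(paste("Surv(daysToDeath, vitalStatus) ~", rhs))
fit  <- coxph(form, data = clinical)

# One row per model term; the gene rows hold the per-gene coefficients
coefs <- summary(fit)$coefficients
coefs[selected_ids, c("coef", "exp(coef)", "Pr(>|z|)")]
```

Each gene ends up as an ordinary column with one value per sample, exactly like stage or gender, which is why it can go straight into the formula.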


But I have the gene expression for each sample, so there are multiple expression values for each gene.


You have stage, gender, and race for each sample too -- so multiple values for these variables too -- so what's the problem?