Question

Survival analysis for a list of genes using TCGA data

0

Entering edit mode

5.2 years ago

Mike ★ 1.9k

Hi all,

I have a list of genes (around 300 genes) and I want survival analysis to find only significant genes. I am using TCGA RSEM normalized data in survival package using following command, but I'm not sure how to choose significant genes.

coxph(Surv(OS_MONTHS, Events) ~., data=merged_data [, c(2, 4:303)])

output...

                              coef    exp(coef)           se(coef)    z    p
Gene1                0.07092   1.07349  0.05348 1.33 0.18
Gene2               0.02332   1.02360  0.05813 0.40 0.69
Gene3               0.00175   1.00175  0.06734 0.03 0.98
Gene n.....

Thanks

Survival coxph TCGA Survival analysis • 4.1k views

ADD COMMENT • link updated 2.6 years ago by huynguyen96.dnu ▴ 50 • written 5.2 years ago by Mike ★ 1.9k

4

Entering edit mode

Hey Mike, usually, one takes the log rank p-value. The Hazard Ratio is specified by exp(coef). Have you included all of your genes in the model? It may be better to test each independently and then, perhaps, construct a final model with just those genes that are statistically significant. I, of course, have a tutorial: Survival analysis with gene expression

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

0

Entering edit mode

Thanks Kevin, as always your help and codes are really very helpful. Yes I included all genes. you can see my comand:

coxph(Surv(OS_MONTHS, Events) ~., data=merged_data [, c(2, 4:303)])

Is it correct? Sorry for again.. , what do you mean by test each independently and then construct a final model with selected genes? meantime I am installing your package RegParallel and will follow your protocol.

ADD REPLY • link 5.2 years ago by Mike ★ 1.9k

1

Entering edit mode

Okay. If you use Windows, then select a low number of cores with RegParallel. It runs on Windows but more efficient on Linux / Mac. The survival part is also in the vignette: https://github.com/kevinblighe/RegParallel#survival-analysis-via-cox-proportional-hazards-regression

There is a difference between testing each gene independently and including them all in the same model from the beginning. When your formula is this:

Surv(OS_MONTHS, Events) ~ gene1 + gene2 + gene3)

...then, the model interprets this as an additive effect between the 3 genes. The genes are 'adjusting' for each other's effects (for a better description, could talk to a statistician). The p-values that you get will differ from when you do:

Surv(OS_MONTHS, Events) ~ gene1)
Surv(OS_MONTHS, Events) ~ gene2)
Surv(OS_MONTHS, Events) ~ gene3)

The RegParallel code will test each variable independently for you and then piece the results together into a single table. This is effectively the same as what EdgeR, DESeq2, and limma are doing, too, i.e., they test each gene independently.

Once you have identified some key genes, then you could still just report them as independently statistically significantly associated with survival. Note the difference in the survival curve when you plot an independent model versus the additive model.

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you very much, for RegParallel I need to upgrade my R (I have R version 3.3.3).

ADD REPLY • link 5.2 years ago by Mike ★ 1.9k

0

Entering edit mode

Yes, this is a requirement from Bioconductor. When a package is being developed, it has to use the latest version. RegParallel was only accepted last month, so, it requires R 3.5. If you need to definitively use it, just obtain the development version direct from GitHub.

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

1

Entering edit mode

I have now removed the R version requirement for this in the development version. You can install it with:

devtools::install_github('kevinblighe/RegParallel')

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

0

Entering edit mode

It works perfectly, thank you very much!

ADD REPLY • link 5.2 years ago by Mike ★ 1.9k

0

Entering edit mode

In doing this, it would be a good idea to update all of your current packages with:

source("https://bioconductor.org/biocLite.R")
biocLite()

or

BiocManager::install()

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

score 0 · Answer 1 · 2021-08-27

0

Entering edit mode

2.6 years ago

huynguyen96.dnu ▴ 50

Hi! now you can do this task very easily with my recently developed tool named 'geneSA' (https://github.com/huynguyen250896/geneSA). Its output will automatically report genes statistically significant with survival outcome. Give it a try ;)

ADD COMMENT • link 2.6 years ago by huynguyen96.dnu ▴ 50