Question: Survival analysis for a list of genes using TCGA data
0
gravatar for Mike
6 months ago by
Mike1.3k
UK
Mike1.3k wrote:

Hi all,

I have a list of genes (around 300 genes) and I want survival analysis to find only significant genes. I am using TCGA RSEM normalized data in survival package using following command, but I'm not sure how to choose significant genes.

coxph(Surv(OS_MONTHS, Events) ~., data=merged_data [, c(2, 4:303)])

output...

                              coef    exp(coef)           se(coef)    z    p
Gene1                0.07092   1.07349  0.05348 1.33 0.18
Gene2               0.02332   1.02360  0.05813 0.40 0.69
Gene3               0.00175   1.00175  0.06734 0.03 0.98
Gene n.....

Thanks

ADD COMMENTlink written 6 months ago by Mike1.3k
2

Hey Mike, usually, one takes the log rank p-value. The Hazard Ratio is specified by exp(coef). Have you included all of your genes in the model? It may be better to test each independently and then, perhaps, construct a final model with just those genes that are statistically significant. I, of course, have a tutorial: Survival analysis with gene expression

ADD REPLYlink written 6 months ago by Kevin Blighe46k

Thanks Kevin, as always your help and codes are really very helpful. Yes I included all genes. you can see my comand:

coxph(Surv(OS_MONTHS, Events) ~., data=merged_data [, c(2, 4:303)])

Is it correct? Sorry for again.. , what do you mean by test each independently and then construct a final model with selected genes? meantime I am installing your package RegParallel and will follow your protocol.

ADD REPLYlink written 6 months ago by Mike1.3k
1

Okay. If you use Windows, then select a low number of cores with RegParallel. It runs on Windows but more efficient on Linux / Mac. The survival part is also in the vignette: https://github.com/kevinblighe/RegParallel#survival-analysis-via-cox-proportional-hazards-regression

There is a difference between testing each gene independently and including them all in the same model from the beginning. When your formula is this:

Surv(OS_MONTHS, Events) ~ gene1 + gene2 + gene3)

...then, the model interprets this as an additive effect between the 3 genes. The genes are 'adjusting' for each other's effects (for a better description, could talk to a statistician). The p-values that you get will differ from when you do:

Surv(OS_MONTHS, Events) ~ gene1)
Surv(OS_MONTHS, Events) ~ gene2)
Surv(OS_MONTHS, Events) ~ gene3)

The RegParallel code will test each variable independently for you and then piece the results together into a single table. This is effectively the same as what EdgeR, DESeq2, and limma are doing, too, i.e., they test each gene independently.

Once you have identified some key genes, then you could still just report them as independently statistically significantly associated with survival. Note the difference in the survival curve when you plot an independent model versus the additive model.

ADD REPLYlink modified 6 months ago • written 6 months ago by Kevin Blighe46k

Thank you very much, for RegParallel I need to upgrade my R (I have R version 3.3.3).

ADD REPLYlink written 6 months ago by Mike1.3k

Yes, this is a requirement from Bioconductor. When a package is being developed, it has to use the latest version. RegParallel was only accepted last month, so, it requires R 3.5. If you need to definitively use it, just obtain the development version direct from GitHub.

ADD REPLYlink written 6 months ago by Kevin Blighe46k
1

I have now removed the R version requirement for this in the development version. You can install it with:

devtools::install_github('kevinblighe/RegParallel')
ADD REPLYlink written 6 months ago by Kevin Blighe46k

It works perfectly, thank you very much!

ADD REPLYlink written 6 months ago by Mike1.3k

In doing this, it would be a good idea to update all of your current packages with:

source("https://bioconductor.org/biocLite.R")
biocLite()

or

BiocManager::install()
ADD REPLYlink written 6 months ago by Kevin Blighe46k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1154 users visited in the last hour