Question: Survival analysis using the TCGA dataset
2.4 years ago, tujuchuanli90 wrote:

Hi all, I want to perform a survival analysis to predict clinical outcome using the genes in my gene set (which I could call a gene signature) in the TCGA dataset. I obtained the gene set by analyzing newly downloaded TCGA gene expression data. For that reason I want to run the survival analysis on the matched clinical data rather than with the available online tools, which may miss some important samples.

I have read some papers that did the same thing, and I found that they added clinical parameters to the survival analysis. For example, this paper added age and tumor_grade to the survival analysis.

Should I add clinical parameters as they did, or just use the expression values?

Tags: survival analysis
(modified 2.4 years ago; written 2.4 years ago by tujuchuanli90)

If you want to know whether any of the clinical parameters act as confounding variables or have some effect on survival, then you can include them. Either way, you can compare the results of two analyses, one with and one without the clinical parameters.
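The "with and without" comparison can be sketched outside R as well. The Python below is a minimal, dependency-free sketch, not R's coxph(). It fits a Cox proportional hazards model by Newton-Raphson on the Breslow partial likelihood twice: once with gene expression alone and once with age added. It then compares the gene's hazard ratio and the likelihood-ratio statistic for age. The toy data and the helper names (`cox_fit`, `_cox_derivatives`, `_solve`) are hypothetical. coxph() uses the Efron tie approximation by default, so results on data with tied event times would differ slightly.

```python
import math

# Minimal sketch (assumptions: Breslow ties, Newton-Raphson); not R's coxph().


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def _cox_derivatives(times, events, X, beta):
    """Breslow partial log-likelihood, score vector and information matrix."""
    p = len(beta)
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    w = [math.exp(e) for e in eta]
    loglik, score, info = 0.0, [0.0] * p, [[0.0] * p for _ in range(p)]
    for i, t_i in enumerate(times):
        if not events[i]:
            continue  # censored patients only enter through the risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        s0 = sum(w[j] for j in risk)
        s1 = [sum(w[j] * X[j][a] for j in risk) for a in range(p)]
        loglik += eta[i] - math.log(s0)
        for a in range(p):
            score[a] += X[i][a] - s1[a] / s0
            for b in range(p):
                s2 = sum(w[j] * X[j][a] * X[j][b] for j in risk)
                info[a][b] += s2 / s0 - (s1[a] / s0) * (s1[b] / s0)
    return loglik, score, info


def cox_fit(times, events, X, max_iter=50, tol=1e-9):
    """Newton-Raphson with step-halving; returns (beta, loglik, score, info)."""
    p = len(X[0])
    # Centering each covariate leaves beta unchanged; it only keeps exp() small.
    means = [sum(row[a] for row in X) / len(X) for a in range(p)]
    X = [[row[a] - means[a] for a in range(p)] for row in X]
    beta = [0.0] * p
    loglik, score, info = _cox_derivatives(times, events, X, beta)
    for _ in range(max_iter):
        step = _solve(info, score)
        for _ in range(30):  # halve the step until the likelihood does not drop
            cand = [b + s for b, s in zip(beta, step)]
            new = _cox_derivatives(times, events, X, cand)
            if new[0] >= loglik - 1e-12:
                break
            step = [s / 2 for s in step]
        beta, (loglik, score, info) = cand, new
        if max(abs(s) for s in step) < tol:
            break
    return beta, loglik, score, info


# Hypothetical toy data: follow-up time, event flag (1 = death), expression, age.
times = [5, 8, 12, 14, 20, 22, 30, 35, 40, 50]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
gene = [2.1, 1.5, 1.8, 0.9, 1.7, 1.2, 0.4, 1.1, 0.3, 0.8]
age = [70, 64, 58, 66, 49, 61, 52, 45, 55, 40]

b_gene, ll_gene, score_gene, _ = cox_fit(times, events, [[g] for g in gene])
b_adj, ll_adj, score_adj, _ = cox_fit(times, events, [[g, a] for g, a in zip(gene, age)])

print("gene only:  HR per unit expression = %.3f" % math.exp(b_gene[0]))
print("gene + age: HR per unit expression = %.3f" % math.exp(b_adj[0]))
# Likelihood-ratio statistic for adding age (compare to chi-squared with 1 df).
print("LR statistic for age = %.3f" % (2 * (ll_adj - ll_gene)))
```

If the gene's hazard ratio moves noticeably once age is added, that suggests age is confounding the gene's effect in this toy example. On real TCGA data, the equivalent would be two coxph() fits using the matched clinical columns.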

(written 2.4 years ago by pbpanigrahi190)

Thanks for answering. It helped me a lot!

(written 2.4 years ago by tujuchuanli90)
2.4 years ago, Kevin Blighe (Republic of Ireland) wrote:

You should only adjust for age and tumour grade in your survival models if you believe that they are important factors for your hypothesis. To quote the authors:

We were interested in the effect a gene has on prognosis independent of factors such as tumor grade and age of a patient.

So, they adjusted for age and tumour grade specifically because of the focus of their study. That is, they evidently believed that age and tumour grade would confound the effect of a gene's expression on prognosis, which makes sense. They also appear to have included gender in each model, which is of course not relevant for every cancer. Note, though, that even the TCGA BRCA (breast cancer) cohort includes some male patients.

From what I gather, they built an independent Cox proportional hazards model for each gene. Each model included clinical covariates such as age, gender, and tumour grade, although the exact covariates varied between cancers. They then obtained a p-value for each gene and clustered the samples using the top 100 genes (Figure 1). The survival curves in Figure 1 are therefore based only on the clusters identified by this clustering. From the Cox models, they also obtained the beta coefficients and did further work with these.
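Survival curves per cluster are commonly drawn as Kaplan-Meier estimates (in R, survfit() from the survival package). The code below is a minimal, dependency-free Python sketch of the product-limit estimator, applied separately to each cluster. It assumes cluster labels have already been assigned, for example by clustering samples on the top genes. The data, labels and helper names are made up for illustration. At tied times, deaths are counted before censorings, which is the usual convention.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct death time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Group everyone sharing this time; all of them are still at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


def km_by_cluster(times, events, clusters):
    """Split patients by (hypothetical) cluster label; one curve per cluster."""
    groups = defaultdict(lambda: ([], []))
    for t, e, c in zip(times, events, clusters):
        groups[c][0].append(t)
        groups[c][1].append(e)
    return {c: kaplan_meier(ts, es) for c, (ts, es) in groups.items()}


# Made-up follow-up times, event flags (1 = death) and cluster labels.
times = [1, 2, 3, 4, 2, 5, 5, 7, 9]
events = [1, 1, 0, 1, 0, 1, 0, 1, 0]
clusters = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

curves = km_by_cluster(times, events, clusters)
for label, curve in sorted(curves.items()):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

Comparing the curves formally (e.g., with a log-rank test) is a separate step that this sketch does not cover.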

To help you, Cox proportional hazards regression is implemented in R by the coxph() function in the survival package. I have already posted some code for doing this in some Biostars posts:


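The per-gene approach described in this answer can be illustrated with a minimal, dependency-free Python sketch rather than an R loop over coxph(). It fits one Cox model per gene, each adjusted for clinical covariates, and then ranks the genes by p-value. The assumptions are all illustrative:

- the data are made up;
- age is the only clinical covariate;
- the helper names (`cox_fit`, `wald_p`, `screen_genes`) are hypothetical;
- p-values are two-sided Wald tests;
- ties use the Breslow approximation.

The paper kept the top 100 genes; with three toy genes, `top_n=2` is used here.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def _cox_derivatives(times, events, X, beta):
    """Breslow partial log-likelihood, score vector and information matrix."""
    p = len(beta)
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    w = [math.exp(e) for e in eta]
    loglik, score, info = 0.0, [0.0] * p, [[0.0] * p for _ in range(p)]
    for i, t_i in enumerate(times):
        if not events[i]:
            continue  # censored patients only enter through the risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        s0 = sum(w[j] for j in risk)
        s1 = [sum(w[j] * X[j][a] for j in risk) for a in range(p)]
        loglik += eta[i] - math.log(s0)
        for a in range(p):
            score[a] += X[i][a] - s1[a] / s0
            for b in range(p):
                s2 = sum(w[j] * X[j][a] * X[j][b] for j in risk)
                info[a][b] += s2 / s0 - (s1[a] / s0) * (s1[b] / s0)
    return loglik, score, info


def cox_fit(times, events, X, max_iter=50, tol=1e-9):
    """Newton-Raphson with step-halving; returns (beta, loglik, score, info)."""
    p = len(X[0])
    # Centering each covariate leaves beta unchanged; it only keeps exp() small.
    means = [sum(row[a] for row in X) / len(X) for a in range(p)]
    X = [[row[a] - means[a] for a in range(p)] for row in X]
    beta = [0.0] * p
    loglik, score, info = _cox_derivatives(times, events, X, beta)
    for _ in range(max_iter):
        step = _solve(info, score)
        for _ in range(30):  # halve the step until the likelihood does not drop
            cand = [b + s for b, s in zip(beta, step)]
            new = _cox_derivatives(times, events, X, cand)
            if new[0] >= loglik - 1e-12:
                break
            step = [s / 2 for s in step]
        beta, (loglik, score, info) = cand, new
        if max(abs(s) for s in step) < tol:
            break
    return beta, loglik, score, info


def wald_p(beta, info, k=0):
    """Two-sided Wald p-value for coefficient k, with se = sqrt([I^-1]_kk)."""
    unit = [1.0 if a == k else 0.0 for a in range(len(beta))]
    z = beta[k] / math.sqrt(_solve(info, unit)[k])
    return math.erfc(abs(z) / math.sqrt(2))


def screen_genes(times, events, expr, clinical, top_n):
    """One Cox model per gene (gene + clinical covariates); keep top_n by p-value."""
    results = []
    for name, values in expr.items():
        X = [[v] + list(c) for v, c in zip(values, clinical)]
        beta, _, _, info = cox_fit(times, events, X)
        results.append((name, beta[0], wald_p(beta, info)))
    return sorted(results, key=lambda r: r[2])[:top_n]


# Made-up data: 10 patients, age as the only clinical covariate, 3 hypothetical genes.
times = [5, 8, 12, 14, 20, 22, 30, 35, 40, 50]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
clinical = [[70], [64], [58], [66], [49], [61], [52], [45], [55], [40]]
expr = {
    "GENE_A": [2.1, 1.5, 1.8, 0.9, 1.7, 1.2, 0.4, 1.1, 0.3, 0.8],
    "GENE_B": [0.5, 1.9, 1.0, 1.4, 0.7, 0.6, 1.3, 0.2, 1.6, 1.0],
    "GENE_C": [0.9, 0.2, 1.4, 1.1, 0.5, 1.7, 0.8, 1.3, 0.6, 1.2],
}
top = screen_genes(times, events, expr, clinical, top_n=2)
for name, coef, p in top:
    print(f"{name}: HR = {math.exp(coef):.3f}, Wald p = {p:.3g}")
```

With thousands of genes, one would normally also correct these p-values for multiple testing before picking a cut-off.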
(written 2.4 years ago by Kevin Blighe)