Question: Question about survival analysis using a TCGA dataset
tujuchuanli wrote (9 months ago):

Hi all, I want to perform survival analysis to predict clinical outcome using the genes in my gene set (which I could call a gene signature) in the TCGA data set. Since I obtained my gene set by analyzing newly downloaded TCGA gene expression data, I want to perform the survival analysis using the matched clinical data rather than available online tools (they may miss some important samples).
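Matching expression samples to clinical records is usually done through the TCGA barcode: its first 12 characters (TCGA-XX-XXXX) identify the patient, and characters 14-15 give the sample type code (01 is a primary solid tumour, 11 is solid tissue normal). Below is a minimal Python sketch under those assumptions; the barcodes and clinical field names are hypothetical, and keeping only the first primary-tumour aliquot per patient is just one simple choice.

```python
def patient_id(barcode):
    """The first 12 characters of a TCGA barcode identify the patient."""
    return barcode[:12]


def sample_type(barcode):
    """Characters 14-15 hold the sample type code ('01' primary solid tumour, '11' normal)."""
    return barcode[13:15]


def match_samples(expr_barcodes, clinical):
    """Pair each primary-tumour expression sample with its patient's clinical record.

    clinical is assumed to be a dict keyed by 12-character patient barcode.
    Patients without a clinical record are dropped; if a patient has several
    primary-tumour aliquots, only the first one seen is kept.
    """
    matched = {}
    for bc in expr_barcodes:
        if sample_type(bc) != "01":
            continue  # skip normal tissue, metastases, etc.
        pid = patient_id(bc)
        if pid in clinical and pid not in matched:
            matched[pid] = (bc, clinical[pid])
    return matched


# Hypothetical barcodes and clinical fields, for illustration only.
expr_barcodes = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0001-11A-11R-0000-07",  # normal tissue from the same patient
    "TCGA-AA-0002-01A-11R-0000-07",
    "TCGA-AA-0003-01A-11R-0000-07",  # no clinical record below
]
clinical = {
    "TCGA-AA-0001": {"time_days": 1200, "dead": 0, "age": 55},
    "TCGA-AA-0002": {"time_days": 430, "dead": 1, "age": 71},
}
matched = match_samples(expr_barcodes, clinical)
print(sorted(matched))  # ['TCGA-AA-0001', 'TCGA-AA-0002']
```

Note that some cohorts (for example blood cancers) use a different primary-tumour code, so the "01" filter is an assumption to check per cancer type.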

I have read some papers that did the same thing, and I found that they sometimes add clinical parameters to the survival analysis. For example, this paper (…) added age and tumor_grade to the survival analysis.

Should I add clinical parameters as they did, or just use the expression values?


If you are interested in knowing whether any of the clinical parameters might act as a confounding variable or have some effect on survival, then you can include them. In any case, you can compare the survival results of two analyses, one with and one without the clinical parameters.

Reply written 9 months ago by pbpanigrahi
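A minimal sketch of the with/without comparison suggested in this reply: two Cox models are fitted to simulated data, one with the gene alone and one adding age and tumour grade, and compared with a likelihood-ratio test. Everything here is an assumption for illustration: the data are simulated (expression is made to rise with age, and only age and grade affect survival), grade is treated as a numeric 1/2/3 score, and fit_cox is a hypothetical hand-written Newton-Raphson fitter using the Breslow approximation for ties, not R's coxph() (which defaults to the Efron approximation). In R, the equivalent would be roughly coxph(Surv(time, status) ~ gene, data = df) versus coxph(Surv(time, status) ~ gene + age + grade, data = df), compared with anova().

```python
import math

import numpy as np


def fit_cox(X, time, event, max_iter=50, tol=1e-9):
    """Hypothetical Cox PH fitter: Newton-Raphson on the Breslow partial likelihood.

    Returns (coefficients, standard errors, maximised log partial likelihood).
    Covariates are centred first; this leaves the coefficients unchanged but
    keeps exp() from overflowing.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    X = X - X.mean(axis=0)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    at_risk = time[None, :] >= time[:, None]  # row i: who is at risk at time[i]
    p = X.shape[1]

    def loglik_grad_hess(beta):
        eta = X @ beta
        w = np.exp(eta)
        ll, grad, hess = 0.0, np.zeros(p), np.zeros((p, p))
        for i in np.flatnonzero(event):
            Xr, wr = X[at_risk[i]], w[at_risk[i]]
            s0 = wr.sum()
            s1 = wr @ Xr
            s2 = (Xr * wr[:, None]).T @ Xr
            ll += eta[i] - math.log(s0)
            grad += X[i] - s1 / s0
            hess -= s2 / s0 - np.outer(s1, s1) / s0**2
        return ll, grad, hess

    beta = np.zeros(p)
    ll, grad, hess = loglik_grad_hess(beta)
    for _ in range(max_iter):
        beta = beta - np.linalg.solve(hess, grad)  # Newton step (hess is negative definite)
        ll_new, grad, hess = loglik_grad_hess(beta)
        converged = abs(ll_new - ll) < tol
        ll = ll_new
        if converged:
            break
    se = np.sqrt(np.diag(np.linalg.inv(-hess)))
    return beta, se, ll


# Simulated cohort (all values hypothetical).
rng = np.random.default_rng(1)
n = 300
age = rng.normal(60, 10, n)
grade = rng.integers(1, 4, n).astype(float)  # grade coded 1, 2, 3
gene = 0.05 * (age - 60) + rng.normal(0, 1, n)  # expression rises with age
# True hazard depends on age and grade only, not on the gene.
log_hazard = 0.06 * (age - 60) + 0.5 * (grade - 2)
t_event = rng.exponential(1.0 / np.exp(log_hazard))  # scale = 1 / rate
t_censor = rng.exponential(2.0, n)
time = np.minimum(t_event, t_censor)
event = t_event <= t_censor

b_gene, se_gene, ll_gene = fit_cox(gene, time, event)
b_adj, se_adj, ll_adj = fit_cox(np.column_stack([gene, age, grade]), time, event)

# Likelihood-ratio test for the two added covariates (2 degrees of freedom);
# for df = 2 the chi-square upper tail probability is exactly exp(-x / 2).
lr_stat = max(2.0 * (ll_adj - ll_gene), 0.0)
p_lrt = math.exp(-lr_stat / 2.0)

print(f"gene HR, expression only:   {math.exp(b_gene[0]):.2f}")
print(f"gene HR, + age and grade:   {math.exp(b_adj[0]):.2f}")
print(f"LRT for adding age + grade: p = {p_lrt:.2g}")
```

If the gene's hazard ratio moves towards 1 once age and grade are added, that suggests the unadjusted association was at least partly driven by those clinical factors.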

Thanks for answering. It helps me a lot!

Reply written 9 months ago by tujuchuanli
Kevin Blighe (Guy's Hospital, London) wrote (9 months ago):

You should only adjust for age and tumour grade in your survival models if you believe that they are important factors for whatever your hypotheses may be. To quote the authors:

We were interested in the effect a gene has on prognosis independent of factors such as tumor grade and age of a patient.

So, they adjusted for age and tumour grade specifically because these factors were central to their study question, i.e., they evidently believed that age and tumour grade could confound the effect that a gene's expression has on prognosis, which makes sense. They also appear to have included gender in each model. This is, of course, not relevant for all cancers, although for breast cancer there are some male patients in the TCGA BRCA cohort.

From what I gather, they built an independent Cox proportional hazards model for each gene, and in each model they included clinical covariates such as age, gender, and tumour grade, although the exact covariates varied between cancers. They then obtained the p-value for each gene and clustered samples using the top 100 genes (Figure 1). The survival curves in Figure 1 are actually just based on the clusters identified by this clustering. From the Cox models, they also obtained the beta coefficients and did further work with these.
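A minimal sketch of the per-gene step described above, on simulated data: each gene gets its own Cox model adjusted for age, gender and grade, the two-sided Wald p-value of the gene's coefficient is recorded, and genes are ranked by p-value. The paper's clustering of samples on the top genes is not reproduced. Assumptions: fit_cox is a hypothetical hand-written Newton-Raphson fitter (Breslow ties), not R's coxph(); gender and grade are plain numeric codes; and which genes are prognostic is invented for the simulation. In R, each per-gene model would be roughly coxph(Surv(time, status) ~ gene + age + gender + grade, data = df).

```python
import math

import numpy as np


def fit_cox(X, time, event, max_iter=50, tol=1e-9):
    """Hypothetical Cox PH fitter (Newton-Raphson, Breslow ties); returns beta, se, loglik."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    X = X - X.mean(axis=0)  # centring leaves beta unchanged, avoids exp() overflow
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    at_risk = time[None, :] >= time[:, None]  # row i: who is at risk at time[i]
    p = X.shape[1]

    def loglik_grad_hess(beta):
        eta = X @ beta
        w = np.exp(eta)
        ll, grad, hess = 0.0, np.zeros(p), np.zeros((p, p))
        for i in np.flatnonzero(event):
            Xr, wr = X[at_risk[i]], w[at_risk[i]]
            s0, s1 = wr.sum(), wr @ Xr
            s2 = (Xr * wr[:, None]).T @ Xr
            ll += eta[i] - math.log(s0)
            grad += X[i] - s1 / s0
            hess -= s2 / s0 - np.outer(s1, s1) / s0**2
        return ll, grad, hess

    beta = np.zeros(p)
    ll, grad, hess = loglik_grad_hess(beta)
    for _ in range(max_iter):
        beta = beta - np.linalg.solve(hess, grad)
        ll_new, grad, hess = loglik_grad_hess(beta)
        converged = abs(ll_new - ll) < tol
        ll = ll_new
        if converged:
            break
    return beta, np.sqrt(np.diag(np.linalg.inv(-hess))), ll


# Simulated cohort (all values hypothetical).
rng = np.random.default_rng(7)
n, n_genes = 200, 40
age = rng.normal(60, 10, n)
gender = rng.integers(0, 2, n).astype(float)  # 0/1 code
grade = rng.integers(1, 4, n).astype(float)   # grade coded 1, 2, 3
expr = rng.normal(0, 1, (n, n_genes))         # stand-in for scaled expression
# Only genes 0 and 1 are prognostic beyond age and grade in this simulation.
log_hazard = (0.04 * (age - 60) + 0.4 * (grade - 2)
              + 0.8 * expr[:, 0] - 0.8 * expr[:, 1])
t_event = rng.exponential(1.0 / np.exp(log_hazard))
t_censor = rng.exponential(2.0, n)
time = np.minimum(t_event, t_censor)
event = t_event <= t_censor

clinical = np.column_stack([age, gender, grade])
results = []  # (gene index, beta, Wald p-value)
for g in range(n_genes):
    X = np.column_stack([expr[:, g], clinical])  # gene first, then covariates
    beta, se, _ = fit_cox(X, time, event)
    z = beta[0] / se[0]
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    results.append((g, beta[0], p_value))

results.sort(key=lambda r: r[2])  # most significant first
for g, b, p_value in results[:5]:
    print(f"gene_{g}: beta = {b:+.3f}, HR = {math.exp(b):.2f}, p = {p_value:.2g}")
```

The sorted list (or the beta coefficients) is what would then feed a downstream step such as selecting the top genes for clustering.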

To help you, Cox proportional hazards regression is implemented in R via the coxph() function from the survival package. I have already put some code for doing this on some Biostars posts:

