Question: How to do survival analysis?
wenbinm (USA) wrote, 2.2 years ago:

Hi there,

I would like to find genes correlated with poor prognosis. I am doing a simple survival analysis:

  1. Divide patients into two groups by gene expression (using the median as the cutoff).
  2. Find genes significantly correlated with overall survival time (using the coxph function in R).
  3. Check whether the genes in my list are up- or down-regulated in cancer samples compared to normal samples.
  4. Keep genes with a hazard ratio larger than 1 (the low-expression group lives longer) that are up-regulated in cancer samples, and genes with a hazard ratio smaller than 1 that are down-regulated in tumors.
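For illustration, steps 1 and 2 could be sketched for a single gene as below. This is a minimal Python sketch, not the coxph call from the question: it swaps in a two-group log-rank test written with the standard library only. The function names (`median_split`, `logrank_test`) and all of the numbers are made up for the example.

```python
import math
from statistics import median

def median_split(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi_square, p_value).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    groups : 'high' or 'low' per patient
    """
    obs_minus_exp = 0.0
    variance = 0.0
    # Walk through each distinct time at which at least one death occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [g for g, ti in zip(groups, times) if ti >= t]
        n = len(at_risk)
        n_high = at_risk.count("high")
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d_high = sum(1 for ti, e, g in zip(times, events, groups)
                     if ti == t and e == 1 and g == "high")
        # Observed minus expected deaths in the high group at this time.
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            frac = n_high / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Toy data for one hypothetical gene (not real measurements).
expression = [2.1, 8.4, 3.3, 9.0, 1.7, 7.6, 2.9, 8.8]
times = [30, 5, 28, 7, 35, 9, 26, 4]
events = [0, 1, 1, 1, 0, 1, 0, 1]

groups = median_split(expression)
chi_square, p_value = logrank_test(times, events, groups)
print(groups, chi_square, p_value)
```

In a real analysis this would be repeated for every gene, and a Cox model would additionally give the hazard ratio and its confidence interval.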

Am I doing it right? Is the 4th step necessary? Must the genes with hazard ratio larger than 1 be up regulated in tumor compared to normal tissue (or the hazard ratio won't make any sense)?

Thank you!

Kevin Blighe wrote, 20 months ago:

With survival analysis using gene expression data, there are many possible ways to do it. Your method seems to be fine, generally.

Just some words of advice: you cannot focus solely on whether a gene's HR is greater or less than 1. Each HR must also be accompanied by a statistically significant p-value; usually the log-rank p-value is chosen. You should also check the lower and upper bounds of the confidence interval (CI), at least at the 95% confidence level. If you have HR = 1.5, for example, but the lower bound of the CI is 0.7, then the interval includes 1 and the p-value will likely not be statistically significant.
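To make the HR = 1.5 example concrete, here is a rough Python sketch that works backwards from a hazard ratio and its lower 95% bound to the matching upper bound and a two-sided p-value. It assumes a Wald-type interval that is symmetric on the log scale, so the p-value is a Wald p-value and not the log-rank p-value mentioned above. The function name `wald_summary` is made up for the example.

```python
import math

def wald_summary(hr, lower, z_crit=1.959964):
    """Recover the upper 95% bound and a two-sided Wald p-value.

    Assumes the interval was built as exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    # Standard error implied by the distance from log(HR) to the lower bound.
    se = (log_hr - math.log(lower)) / z_crit
    upper = math.exp(log_hr + z_crit * se)
    z = log_hr / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return upper, p_value

upper, p_value = wald_summary(1.5, 0.7)
print(upper, p_value)
```

Under these assumptions, an interval that includes 1 always gives a Wald p-value above 0.05, which is why the CI and the p-value should be read together.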

Also, describing genes as 'up-regulated' on the basis of their HRs is not common. Up-regulation and down-regulation are terms that belong to differential expression analysis. With survival, you can just say things like 'high expression of the gene is associated with a higher risk of MyDisease (HR (95% CI): X (Y, Z); p=0.0005)'.

I posted a tutorial that will likely assist you: Survival analysis with gene expression

Kevin


I agree with the answer, but I would just add that you need to take multiple testing into account! If you test every gene for correlation with survival, you are doing about 20,000 hypothesis tests. The probability of finding something "statistically significant" just by chance, without any real relationship between the gene and survival, is then very high. You need to correct for multiple testing to take that into account.
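One common correction is the Benjamini-Hochberg false discovery rate adjustment. The comment above does not name a specific method, so this is just one option. A minimal Python sketch follows, with a hypothetical function name and made-up p-values; in R, `p.adjust(p, method = "BH")` does the same job.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up raw p-values for five hypothetical genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

In this toy input, two genes with raw p-values just below 0.05 no longer pass a 0.05 threshold after adjustment.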

written 20 months ago by bernatgel

That is indeed correct, bernatgel. Thanks! ¡Gracias!

written 20 months ago by Kevin Blighe