Question: survival analysis based on gene expression for one gene only
5 months ago, majd.abdulghani wrote:

Hi,

I have the expression of one gene for 273 glioma patients, as well as their clinical data. I want to do a survival analysis and generate a Kaplan-Meier plot of the patients' survival based on the expression of the gene: "high" or "low". I saw this tutorial on Biostars (Tutorial: Survival analysis with gene expression), and the author takes the Z-score of the expression data to stratify expression as high or low. However, the Z-score is based on the expression of all genes per patient (i.e. taking the average expression and standard deviation of all genes for every patient). Since I don't have the expression of other genes, is it appropriate to take the Z-score for the expression of this gene across all patients (i.e. use the average expression and standard deviation of this gene for all the patients) and stratify high or low expression based on that? Or does survival analysis with gene expression have to be based on the expression of genes per patient, rather than one gene across all patients? I hope this makes sense, please let me know if I need to clarify more.

Thank you!

5 months ago, Kevin Blighe wrote:

Hi, I wrote the tutorial.

If you just have one gene, why not instead use tertiles, quartiles, quintiles, sextiles, etc.? Also, if you are just testing one gene, then you don't have to use RegParallel, as it's designed for quickly testing hundreds or thousands of genes independently.

For a survival model, you can test each gene independently, or create a multivariate model whereby the values of multiple genes are used. One can even include clinical parameters:

~ ATM + EGFR + SmokingStatus + Age
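
As a minimal sketch of how a formula like the one above could be fitted, the following uses `coxph()` from the `survival` package on simulated data. The data frame `df`, the `Time` and `Status` columns, and all values are assumptions made purely for illustration; they are not taken from the tutorial or from any real cohort.

```r
# Sketch only: simulated data, hypothetical column names.
library(survival)

set.seed(1)
n <- 200
df <- data.frame(
  ATM           = rnorm(n),                        # simulated expression values
  EGFR          = rnorm(n),
  SmokingStatus = factor(sample(c("never", "ever"), n, replace = TRUE),
                         levels = c("never", "ever")),
  Age           = round(runif(n, 30, 80)),
  Time          = rexp(n, rate = 0.05),            # follow-up time (arbitrary units)
  Status        = rbinom(n, 1, 0.7)                # 1 = event, 0 = censored
)

# Cox proportional hazards model with two genes and two clinical covariates
fit <- coxph(Surv(Time, Status) ~ ATM + EGFR + SmokingStatus + Age, data = df)
summary(fit)
```

To test a single gene on its own, the right-hand side of the formula would contain only that gene.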

Kevin


Hi Kevin, thank you for responding and for your suggestion (and for writing such a great tutorial). Doing quartiles or something similar is easier indeed, but I want to make sure it's appropriate; I will be dividing into quartiles for the expression of this one gene from all the samples, i.e. 'high' will mean high expression relative to other patients, rather than relative to other genes. Is this a conventional way to do survival analysis?

written 5 months ago by majd.abdulghani

There is no right or wrong way, really. In my tutorial, I first transform the expression data to Z-scores by row (gene), and then perform the 1st pass analysis using the gene Z-scores on the continuous scale. I then identify key genes from this 1st pass and put those into a new Cox model, but encoded this time as low|mid|high. So, indeed, a gene with a high Z-score has high expression relative to all other genes.
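
The row-wise Z-score and the low|mid|high encoding described above can be sketched roughly as follows. The matrix `expr`, its dimensions, and the cut-offs of -1 and +1 are assumptions for illustration only. Note that `scale()` standardises columns, so the matrix is transposed before and after; scaling by row therefore uses each gene's own mean and standard deviation across samples.

```r
# Sketch only: simulated genes-by-samples matrix with hypothetical names.
set.seed(1)
expr <- matrix(rnorm(5 * 20, mean = 8, sd = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:20)))

# scale() works on columns, so transpose to standardise each gene (row),
# then transpose back to genes-by-samples
z <- t(scale(t(expr)))

# Encode one gene as low|mid|high; the +/- 1 cut-offs are an assumption
gene1 <- cut(z["gene1", ],
             breaks = c(-Inf, -1, 1, Inf),
             labels = c("low", "mid", "high"))
table(gene1)
```

With a single gene, the same idea applies to a plain numeric vector: `scale(x)` standardises it across patients.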

In your case, using quartiles, you can just refer to the upper, mid, and lower quartiles, and avoid the use of the word 'high' or 'low', if that helps. Indeed, it would not be high relative to the other genes (well, it may be, but we don't know).

You can easily convert a vector into quartiles like this:

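# simulate 100 values; no seed is set, so the output will differ between runs
x <- runif(100, 0.0, 100.0)
# split at the quartile boundaries; include.lowest = TRUE keeps the minimum in the first bin
cut(x,
  breaks = quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1)),
  labels = c('Lwr', 'MidLwr', 'MidUpr', 'Upr'),
  include.lowest = TRUE)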
  [1] Lwr    Lwr    Lwr    MidUpr MidLwr MidUpr Upr    MidLwr MidUpr MidUpr
 [11] Upr    Upr    MidLwr Lwr    Upr    MidUpr MidLwr Lwr    Lwr    Upr   
 [21] MidUpr MidUpr Upr    Lwr    MidUpr MidLwr Lwr    Upr    MidUpr MidUpr
 [31] Upr    MidUpr Lwr    MidUpr MidLwr MidUpr MidLwr MidLwr Upr    MidLwr
 [41] MidLwr MidLwr Lwr    Lwr    Upr    Lwr    Upr    MidLwr Upr    MidUpr
 [51] Lwr    Lwr    Lwr    MidLwr Upr    Lwr    Lwr    Lwr    Upr    MidLwr
 [61] MidUpr MidLwr Lwr    Lwr    MidUpr MidLwr MidLwr MidUpr Upr    Upr   
 [71] MidUpr Upr    Upr    MidLwr MidUpr Lwr    MidUpr Upr    MidLwr Upr   
 [81] Lwr    MidUpr Lwr    Upr    MidLwr MidLwr Upr    MidLwr Upr    MidUpr
 [91] MidUpr MidLwr Lwr    Upr    MidUpr Upr    MidLwr MidLwr MidUpr Lwr
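
To connect the quartile grouping to the Kaplan-Meier plot asked about in the question, here is a rough sketch using `survfit()` and `survdiff()` from the `survival` package. The data are simulated; the data frame `df` and the column names `Expr`, `Time`, and `Status` are hypothetical, and only the cohort size of 273 is borrowed from the question.

```r
# Sketch only: simulated data, hypothetical column names.
library(survival)

set.seed(1)
n <- 273
df <- data.frame(
  Expr   = runif(n, 0, 100),       # expression of the single gene
  Time   = rexp(n, rate = 0.05),   # follow-up time (arbitrary units)
  Status = rbinom(n, 1, 0.7)       # 1 = event, 0 = censored
)

# Quartile groups of the gene across all patients
df$Quartile <- cut(df$Expr,
  breaks = quantile(df$Expr, probs = c(0, 0.25, 0.5, 0.75, 1)),
  labels = c('Lwr', 'MidLwr', 'MidUpr', 'Upr'),
  include.lowest = TRUE)

# Kaplan-Meier curves per quartile group
km <- survfit(Surv(Time, Status) ~ Quartile, data = df)
plot(km, col = 1:4, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(df$Quartile), col = 1:4, lty = 1)

# Log-rank test for a difference between the groups
survdiff(Surv(Time, Status) ~ Quartile, data = df)
```

Because the simulated expression values are unrelated to survival, the curves here should largely overlap; with real data the grouping variable would be the measured expression of the gene.
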
written 5 months ago by Kevin Blighe

Thank you, this is really helpful!

written 5 months ago by majd.abdulghani