survival analysis based on gene expression for one gene only
22 months ago

Hi,

I have the expression of one gene for 273 glioma patients, as well as their clinical data. I want to do a survival analysis and generate a Kaplan-Meier plot of the patients' survival based on the expression of the gene: "high" or "low". I saw this tutorial on Biostars (Tutorial: Survival analysis with gene expression), and the author takes the Z-score of the expression data to stratify expression as high or low. However, the Z-score is based on the expression of all genes per patient (i.e. taking the average expression and standard deviation of all genes for every patient). Since I don't have the expression of other genes, is it appropriate to take the Z-score for the expression of this gene across all patients (i.e. use the average expression and standard deviation of this gene for all the patients) and stratify high or low expression based on that? Or does survival analysis with gene expression have to be based on the expression of genes per patient, rather than one gene across all patients? I hope this makes sense, please let me know if I need to clarify more.

Thank you!

survival analysis cox gene expression • 773 views
22 months ago

Hi, I wrote the tutorial.

If you just have one gene, why not use tertiles, quartiles, quintiles, sextiles, etc. instead? Also, if you are testing just one gene, you don't have to use RegParallel, as it's designed for quickly testing hundreds or thousands of genes independently.

For a survival model, you can test each gene independently, or create a multivariate model whereby the values of multiple genes are used. One can even include clinical parameters:

~ ATM + EGFR + SmokingStatus + Age
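As a rough sketch of how such a formula could be passed to a Cox model, the example below uses `coxph()` from the `survival` package. The data frame is simulated purely for illustration. The column names `time` and `status`, the smoking categories, and all values are assumptions, not data from this thread.

```r
library(survival)  # ships with standard R installations

set.seed(1)
n <- 200

# Hypothetical simulated cohort; predictor names follow the formula above
df <- data.frame(
  ATM           = rnorm(n),                      # gene expression (e.g. Z-scores)
  EGFR          = rnorm(n),
  SmokingStatus = factor(sample(c("never", "former", "current"),
                                n, replace = TRUE)),
  Age           = round(runif(n, 30, 80)),
  time          = rexp(n, rate = 0.05),          # follow-up time, arbitrary units
  status        = rbinom(n, 1, 0.7)              # 1 = event, 0 = censored
)

# Multivariable Cox proportional hazards model
fit <- coxph(Surv(time, status) ~ ATM + EGFR + SmokingStatus + Age, data = df)
summary(fit)
```

With a single gene, the right-hand side would simply contain that one gene, optionally alongside clinical covariates.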


Kevin


Hi Kevin, thank you for responding and for your suggestion (and for writing such a great tutorial). Doing quartiles or something similar is easier indeed, but I want to make sure it's appropriate; I will be dividing into quartiles for the expression of this one gene from all the samples, i.e. 'high' will mean high expression relative to other patients, rather than relative to other genes. Is this a conventional way to do survival analysis?


There is no right or wrong way, really. In my tutorial, I first transform the expression data to Z-scores by row (gene), and then perform the 1st pass analysis using the gene Z-scores on the continuous scale. I then identify key genes from this 1st pass and put those into a new Cox model, but encoded this time as low|mid|high. So, indeed, a gene with a high Z-score has high expression relative to all other genes.
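A minimal sketch of one way to compute Z-scores by row (gene) in base R is shown below. The matrix `expr` and its gene and sample names are hypothetical, and this is not necessarily the exact code used in the tutorial. Because `scale()` operates on columns, the matrix is transposed before and after, so each gene is centered and scaled using its own mean and standard deviation across samples.

```r
set.seed(1)

# Hypothetical expression matrix: 5 genes (rows) x 10 samples (columns)
expr <- matrix(rnorm(50, mean = 8, sd = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:10)))

# scale() standardises columns, so transpose, scale, then transpose back
z <- t(scale(t(expr)))

# For a single gene held as a plain vector, the equivalent calculation is:
g  <- expr["gene1", ]
z1 <- (g - mean(g)) / sd(g)
```

The same vector calculation applies when only one gene is available: the gene's values across all patients are standardised with that gene's own mean and standard deviation.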

In your case, using quartiles, you can just refer to upper-, mid-, and lower- quartiles, and avoid the use of the word 'high' or 'low', if that helps. Indeed, it would not be high relative to the other genes (well, it may be, but we don't know).

You can easily convert a vector into quartiles like this:

x <- runif(100, 0.0, 100.0)
cut(x,
  breaks = quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1)),
  labels = c('Lwr', 'MidLwr', 'MidUpr', 'Upr'),
  include.lowest = TRUE)
[1] Lwr    Lwr    Lwr    MidUpr MidLwr MidUpr Upr    MidLwr MidUpr MidUpr
[11] Upr    Upr    MidLwr Lwr    Upr    MidUpr MidLwr Lwr    Lwr    Upr
[21] MidUpr MidUpr Upr    Lwr    MidUpr MidLwr Lwr    Upr    MidUpr MidUpr
[31] Upr    MidUpr Lwr    MidUpr MidLwr MidUpr MidLwr MidLwr Upr    MidLwr
[41] MidLwr MidLwr Lwr    Lwr    Upr    Lwr    Upr    MidLwr Upr    MidUpr
[51] Lwr    Lwr    Lwr    MidLwr Upr    Lwr    Lwr    Lwr    Upr    MidLwr
[61] MidUpr MidLwr Lwr    Lwr    MidUpr MidLwr MidLwr MidUpr Upr    Upr
[71] MidUpr Upr    Upr    MidLwr MidUpr Lwr    MidUpr Upr    MidLwr Upr
[81] Lwr    MidUpr Lwr    Upr    MidLwr MidLwr Upr    MidLwr Upr    MidUpr
[91] MidUpr MidLwr Lwr    Upr    MidUpr Upr    MidLwr MidLwr MidUpr Lwr
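The quartile labels can then serve as the grouping variable for a Kaplan-Meier fit. The sketch below assumes the `survival` package and uses simulated data. The columns `expr`, `time`, and `status` are hypothetical stand-ins for the single gene's expression and the clinical follow-up data.

```r
library(survival)

set.seed(1)
n <- 273

# Hypothetical simulated data: one gene's expression plus follow-up
df <- data.frame(
  expr   = runif(n, 0, 100),
  time   = rexp(n, rate = 0.05),   # follow-up time, arbitrary units
  status = rbinom(n, 1, 0.7)       # 1 = event, 0 = censored
)

# Assign each patient to a quartile of the gene's expression across patients
df$group <- cut(df$expr,
                breaks = quantile(df$expr, probs = c(0, 0.25, 0.5, 0.75, 1)),
                labels = c("Lwr", "MidLwr", "MidUpr", "Upr"),
                include.lowest = TRUE)

# Kaplan-Meier curves per quartile group
km <- survfit(Surv(time, status) ~ group, data = df)
plot(km, col = 1:4, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(df$group), col = 1:4, lty = 1)

# Log-rank test for a difference between the groups
survdiff(Surv(time, status) ~ group, data = df)
```

Changing `probs` to `c(0, 1/3, 2/3, 1)` with three labels would give tertiles instead.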


Thank you, this is really helpful!