Survival analysis of several genes with more than 3 levels
4 months ago
antmantras ▴ 10

Hi all.

I want to perform a survival analysis on TCGA data (RSEM format). Specifically, I want to analyze certain gene sets with discretized expression levels. For example, a gene set could be: {gen_a = "Very_low", gen_b = "Very_low", gen_c = "High"}. The expression levels of the genes have been divided into five intervals by equal-frequency discretization, so that the lowest expression values are labeled "Very_low" and the remaining labels correspond to progressively higher expression: "Low", "Mid", "High" and "Very_high".
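To make the discretization concrete, here is a minimal sketch of equal-frequency binning into the five labels. It assumes the binning is done per gene across samples and that ties are broken by sample order; the function name and the RSEM values are hypothetical and only for illustration.

```python
LABELS = ["Very_low", "Low", "Mid", "High", "Very_high"]

def discretize_equal_frequency(values, labels=LABELS):
    """Assign each value to one of len(labels) equal-frequency bins by rank."""
    n = len(values)
    k = len(labels)
    # Rank samples by expression; ties keep their original relative order.
    order = sorted(range(n), key=lambda i: values[i])
    out = [None] * n
    for rank, idx in enumerate(order):
        # Integer arithmetic maps ranks 0..n-1 onto bins 0..k-1 as evenly as possible.
        out[idx] = labels[rank * k // n]
    return out

# Hypothetical RSEM values for one gene across ten samples.
gen_a = [5.0, 120.3, 48.1, 0.7, 300.2, 12.9, 75.4, 9.8, 210.0, 33.3]
levels = discretize_equal_frequency(gen_a)
print(list(zip(gen_a, levels)))
```

With ten samples and five labels, each label ends up on exactly two samples.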

The purpose of the analysis is to find differences, if any, in the survival of patients affected by different molecular subtypes of cancer. For example, for breast cancer, I would like to identify whether the genes in gene set X {gen_a = "Very_low", gen_b = "Very_low", gen_c = "High"} significantly influence the survival of patients with different molecular subtypes.

The gene sets have been obtained so as to maximize the differences between molecular subtypes, i.e., if we are exploring molecular subtype A, the gene sets obtained for A have a high frequency (occurrence rate) in samples of molecular subtype A, whereas they rarely or never appear in the other subtypes. For this reason, the gene set X mentioned above may be present in molecular subtype A but have no representation in the other subtypes. For example, this could be the case:

Subtype A = {gen_a = "Very_low", gen_b = "Very_low", gen_c = "High"} 80/100 samples.

Subtype B = {gen_a = "Very_low", gen_b = "Very_low", gen_c = "High"} 0/100 samples. Instead, the most frequent pattern for those genes is {gen_a = "Very_low", gen_b = "High", gen_c = "High"} 70/100 samples.

Subtype C = {gen_a = "Very_low", gen_b = "Very_low", gen_c = "High"} 0/100 samples. Instead, the most frequent pattern for those genes is {gen_a = "Very_low", gen_b = "Very_high", gen_c = "High"} 35/100 samples.
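The per-subtype frequencies above can be tallied as in the following sketch. The sample records, the `subtype` key and the function name are hypothetical; the counts are toy numbers, not the 80/100, 70/100 and 35/100 figures from my data.

```python
from collections import Counter, defaultdict

GENES = ("gen_a", "gen_b", "gen_c")

def pattern_counts(samples, genes=GENES):
    """Count how often each combination of expression levels occurs per subtype."""
    counts = defaultdict(Counter)
    for sample in samples:
        pattern = tuple(sample[g] for g in genes)
        counts[sample["subtype"]][pattern] += 1
    return counts

def make(subtype, a, b, c):
    # Helper for building toy sample records.
    return {"subtype": subtype, "gen_a": a, "gen_b": b, "gen_c": c}

# Toy data: gene set X dominates subtype A and is absent from B and C.
samples = (
    [make("A", "Very_low", "Very_low", "High")] * 4
    + [make("A", "Low", "Mid", "High")] * 1
    + [make("B", "Very_low", "High", "High")] * 3
    + [make("B", "Mid", "High", "Low")] * 2
    + [make("C", "Very_low", "Very_high", "High")] * 2
    + [make("C", "High", "Low", "Mid")] * 1
)

counts = pattern_counts(samples)
gene_set_x = ("Very_low", "Very_low", "High")

for subtype in sorted(counts):
    top_pattern, top_n = counts[subtype].most_common(1)[0]
    total = sum(counts[subtype].values())
    print(subtype, "X:", counts[subtype][gene_set_x], "/", total,
          "| most frequent:", top_pattern, top_n, "/", total)
```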

In this case, could a survival analysis be done comparing gene set X, which is characteristic of subtype A, with the set that contains the same genes at different expression levels (the most frequent pattern) in each of the other molecular subtypes? Thanks in advance.
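For reference, the kind of two-group comparison I have in mind could look like the sketch below: a plain two-sample log-rank test written with the standard library only. Choosing the log-rank test is my assumption, the follow-up times and event indicators are invented, and the function name is hypothetical. Note that in this design each group would also coincide with a molecular subtype, so any difference could reflect the subtype rather than the gene pattern.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value with 1 df).

    times_*  : follow-up times
    events_* : 1 if the event (death) was observed, 0 if censored
    """
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    # Distinct times at which at least one event was observed in either group.
    event_times = sorted(
        {t for t, e in zip(times_a + times_b, events_a + events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented follow-up data (months): group 1 = subtype A samples carrying gene set X,
# group 2 = subtype B samples carrying their most frequent pattern.
times_x, events_x = [5, 8, 12, 20, 24], [1, 1, 1, 0, 1]
times_b, events_b = [15, 22, 30, 36, 40], [1, 0, 1, 1, 0]

chi2, p_value = logrank_test(times_x, events_x, times_b, events_b)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```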

Edit: I have read this tutorial on survival analysis with gene expression. However, I would like to know whether it is suitable for my data.

tcga cancer survival • 124 views
