Question: Determining the cutoff for a Kaplan-Meier analysis
XD wrote (3.3 years ago):

I am analyzing the expression of gene X in relation to overall survival in a TCGA dataset. I want to take a data-driven approach that determines the optimal cutoff, i.e. the one giving the most significant survival difference between the two arms (high and low expression).

Is this approach acceptable, and what kinds of bias am I introducing? I've seen numerous papers that use this approach to choose cutoffs for KM survival analysis, but I know there are other options, such as a median split or the quartile extremes, or a Cox model instead of KM (but I don't really consider expression in my situation to be a continuous variable).

Also, if I continue with the optimal cutoff, can I use permutation testing to check whether it is real? What null should I test against: randomized gene expression values with the group sizes kept the same, or randomized gene expression values with a new optimal cutoff determined each time (allowing the group sizes to change)?
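As a hedged illustration of the second null described above (shuffle expression across patients and re-optimize the cutoff every time), here is a minimal pure-Python sketch. Re-running the cutoff search inside each permutation matters because the observed statistic was itself chosen as a maximum, so the null has to include that selection step. The function names (`logrank_chi2`, `max_logrank`, `permutation_pvalue`), the restriction of candidate cutoffs to the 20th-80th percentile range, and the toy data are assumptions for illustration; `logrank_chi2` is a hand-written textbook two-group log-rank test, not a call into any survival package.

```python
import random


def logrank_chi2(times, events, high):
    """Two-group log-rank chi-square statistic (1 df).

    times: follow-up times; events: 1 = death observed, 0 = censored;
    high: True for samples in the high-expression arm.
    """
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                        # at risk
        n1 = sum(1 for ti, g in zip(times, high) if ti >= t and g)   # at risk, high arm
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # deaths at t
        d1 = sum(1 for ti, e, g in zip(times, events, high) if ti == t and e and g)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def max_logrank(expr, times, events, lo_q=0.2, hi_q=0.8):
    """Largest log-rank statistic over cutoffs between the lo_q and hi_q quantiles."""
    values = sorted(expr)
    lo = values[int(lo_q * (len(values) - 1))]
    hi = values[int(hi_q * (len(values) - 1))]
    best = 0.0
    for c in sorted({v for v in values if lo <= v < hi}):
        best = max(best, logrank_chi2(times, events, [x > c for x in expr]))
    return best


def permutation_pvalue(expr, times, events, n_perm=200, seed=0):
    """Permutation p-value for the optimised cutoff.

    Expression is shuffled across patients and the cutoff is re-optimised in
    every permutation, so the cutoff search is part of the null distribution.
    """
    rng = random.Random(seed)
    observed = max_logrank(expr, times, events)
    shuffled = list(expr)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_logrank(shuffled, times, events) >= observed:
            exceed += 1
    # +1 in numerator and denominator so the p-value is never exactly zero
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (hypothetical): higher expression -> earlier death
    expr = list(range(40))
    times = [1 if x >= 20 else 10 for x in expr]
    events = [1] * 40
    print(permutation_pvalue(expr, times, events, n_perm=100))
```

Keeping group sizes fixed (the first option) would test a single pre-chosen split; it does not account for the fact that the cutoff was searched for, which is why the sketch re-optimizes inside the loop.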

Thanks in advance!


I have similar questions. I downloaded TCGA data from the UCSC Xena browser, but when I want to compare high versus low expression, it is difficult to decide how to classify the samples.

Comment by zany1983, 23 months ago.

This is not an answer to the question. I'm moving it to a comment.

Comment by Jean-Karim Heriche, 23 months ago.
SamGG (France) wrote (23 months ago):

Hi,

I'm not sure this answers your question. In a first approach, I split the expression data according to the quartiles, giving 3 groups: samples below the 25th percentile, samples above the 75th percentile, and samples in between. From that grouping I get a KM plot and a p-value. In a second approach, I use the maxstat package, as nicely described at http://r-addict.com/2016/11/21/Optimal-Cutpoint-maxstat.html. IMHO, a relevant cut point must lie between the 20th and 80th percentiles if the experimental design is roughly balanced.
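A minimal sketch of the first approach (three groups split at the 25th and 75th percentiles, then one Kaplan-Meier curve per group), in pure Python with only the standard library (Python 3.8+ for `statistics.quantiles`). It is not the maxstat package, the function names are hypothetical, and the product-limit estimator is written out by hand; the p-value step (a three-group log-rank test) is left out.

```python
import statistics


def quartile_groups(expr):
    """Label samples 'low' (< Q1), 'high' (> Q3) or 'mid' (in between)."""
    q1, _, q3 = statistics.quantiles(expr, n=4, method="inclusive")
    return ["low" if x < q1 else "high" if x > q3 else "mid" for x in expr]


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, survival probability)."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def km_by_group(expr, times, events):
    """One Kaplan-Meier curve per quartile-based group."""
    labels = quartile_groups(expr)
    curves = {}
    for g in ("low", "mid", "high"):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        curves[g] = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    return curves


if __name__ == "__main__":
    # Toy values (hypothetical), all patients with an observed death
    print(km_by_group(list(range(1, 9)), [1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))
```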

HTH


Which approach is more suitable for TCGA data, where you don't really have control over the number of patients that fall into each quartile or cut-point bin?

Comment by freuv, 16 months ago.
Tom_L wrote (23 months ago):

For gene expression, you should primarily rely on unsupervised approaches such as a mean or median split (both commonly used). However, I would not recommend the median split, since it arbitrarily splits your cohort in half, and presumably not exactly 50% of patients survive in your analysis.
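To make the mean-versus-median point concrete, the hypothetical sketch below labels samples as high or low under either rule and reports the arm sizes next to the cohort's overall event rate. On skewed expression the mean split gives unequal arms, while the median split gives roughly half/half regardless of how many patients actually die. The helper names and toy values are assumptions for illustration.

```python
import statistics


def split_high(expr, rule="median"):
    """True = high arm, relative to the cohort mean or median (ties go to the low arm)."""
    if rule == "median":
        centre = statistics.median(expr)
    elif rule == "mean":
        centre = statistics.mean(expr)
    else:
        raise ValueError("rule must be 'mean' or 'median'")
    return [x > centre for x in expr]


def describe_split(expr, events, rule):
    """Arm sizes under a split rule, next to the cohort's overall event rate."""
    high = split_high(expr, rule)
    n_high = sum(high)
    return {"rule": rule, "n_high": n_high, "n_low": len(expr) - n_high,
            "event_rate": sum(events) / len(events)}


if __name__ == "__main__":
    expr = [1, 1, 1, 2, 2, 3, 10, 50]   # toy, right-skewed expression values
    events = [1, 0, 0, 0, 1, 0, 0, 0]   # toy: 2 of 8 patients died
    for rule in ("median", "mean"):
        print(describe_split(expr, events, rule))
```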

Independently of this result, I would recommend performing a differential expression analysis to see how your gene performs compared with others, and asking whether there could be a connection between the top differentially expressed genes and yours (e.g. the same pathway). Another alternative would be to run the survival analysis on all genes. I would also recommend a multivariate analysis of your gene together with other clinical variables that have a significant impact on patient survival: tumour grade, size, chemotherapy, radiation therapy, etc.

If your sample size is large enough (>200), you can consider generating thousands of random subsamples (75%, 66% or 50% of your total cohort) and repeating the same analysis on each. How many random trials still show a significant survival difference, and how much worse is the P-value compared with the full cohort (partly due to the loss of statistical power associated with subsampling)? This will indicate how robust your expression classification is.
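A possible sketch of the sub-sampling check, in pure Python. Assumptions that are not part of the answer: each subsample is re-split at its own median, significance comes from a hand-written two-group log-rank test at alpha = 0.05, and the helper names and toy data are hypothetical.

```python
import math
import random
import statistics


def logrank_chi2(times, events, high):
    """Two-group log-rank chi-square statistic (1 df); high marks the high arm."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                        # at risk
        n1 = sum(1 for ti, g in zip(times, high) if ti >= t and g)   # at risk, high arm
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # deaths at t
        d1 = sum(1 for ti, e, g in zip(times, events, high) if ti == t and e and g)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def subsample_robustness(expr, times, events, frac=0.75, n_trials=1000,
                         alpha=0.05, seed=0):
    """Share of random subsamples where a median split stays significant, plus the median p."""
    rng = random.Random(seed)
    k = round(frac * len(expr))
    hits, pvals = 0, []
    for _ in range(n_trials):
        idx = rng.sample(range(len(expr)), k)
        sub_expr = [expr[i] for i in idx]
        cut = statistics.median(sub_expr)   # assumption: re-split each subsample at its median
        chi2 = logrank_chi2([times[i] for i in idx], [events[i] for i in idx],
                            [x > cut for x in sub_expr])
        p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
        pvals.append(p)
        hits += (p < alpha)
    return hits / n_trials, statistics.median(pvals)


if __name__ == "__main__":
    # Toy data (hypothetical): higher expression -> earlier death
    expr = list(range(40))
    times = [1 if x >= 20 else 10 for x in expr]
    events = [1] * 40
    for frac in (0.75, 0.66, 0.5):
        print(frac, subsample_robustness(expr, times, events, frac=frac, n_trials=200))
```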

Lastly, you can test multiple (if not all) cut-offs. Is there a significant value and, if so, why this specific value? Can you subset your cohort based on this value and see whether another gene's expression or a clinical variable explains this survival difference?
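A minimal sketch of scanning every cut-off with a log-rank test, under stated assumptions: each arm must keep at least `min_group` samples (a hypothetical parameter), the log-rank test is written out by hand, and the returned p-values are unadjusted. Because the smallest of many p-values is optimistic, it should be corrected before being reported, for example with maxstat or a permutation test that re-optimizes the cutoff.

```python
import math


def logrank_chi2(times, events, high):
    """Two-group log-rank chi-square statistic (1 df); high marks the high arm."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                        # at risk
        n1 = sum(1 for ti, g in zip(times, high) if ti >= t and g)   # at risk, high arm
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # deaths at t
        d1 = sum(1 for ti, e, g in zip(times, events, high) if ti == t and e and g)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def scan_cutoffs(expr, times, events, min_group=5):
    """(cutoff, unadjusted log-rank p) for every cutoff leaving >= min_group per arm."""
    results = []
    for c in sorted(set(expr)):
        high = [x > c for x in expr]
        n_high = sum(high)
        if n_high < min_group or len(expr) - n_high < min_group:
            continue
        chi2 = logrank_chi2(times, events, high)
        results.append((c, math.erfc(math.sqrt(chi2 / 2))))
    return results


if __name__ == "__main__":
    # Toy data (hypothetical): higher expression -> earlier death
    expr = list(range(40))
    times = [1 if x >= 20 else 10 for x in expr]
    events = [1] * 40
    best_cut, best_p = min(scan_cutoffs(expr, times, events), key=lambda r: r[1])
    print(best_cut, best_p)
```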

The approach you describe makes sense, but you will not find the answer with a single test.

Hope this helps.

Cheers.
