Survival analysis
3.2 years ago
wenbinm ▴ 40

Hello there,

A package called maxstat can be used to select the "optimal cutoff" of a biomarker (such as gene expression in TCGA) for survival analysis: http://r-addict.com/2016/11/21/Optimal-Cutpoint-maxstat.html. Isn't this a kind of overfitting? Given a random gene, choosing the optimal cutoff increases the chance of getting a significant result, which inflates the type I error rate.

This "optimal cutoff" method seems useful for classification or prediction tasks, but not for a general test of the clinical significance of a gene.
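To make the concern concrete, here is a minimal simulation sketch in plain Python (standard library only). It does not use maxstat. The sample size, censoring fraction, cutoff range (10th to 90th percentile) and the function names `logrank_p` and `simulate` are all assumptions chosen for illustration. The biomarker is pure noise, so any "significant" result is a false positive. The sketch compares a pre-specified median split with the smallest unadjusted log-rank p-value over all candidate cutoffs.

```python
import math
import random

def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n, n1 = len(times), sum(groups)   # at risk overall / in group 1
    diff = var = 0.0                  # observed minus expected, and its variance
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = d1 = c = c1 = 0           # events and subjects leaving at time t
        while i < len(order) and times[order[i]] == t:
            k = order[i]
            d += events[k]
            d1 += events[k] * groups[k]
            c += 1
            c1 += groups[k]
            i += 1
        if d > 0 and n > 1:
            diff += d1 - d * n1 / n
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        n -= c
        n1 -= c1
    if var == 0:
        return 1.0
    return math.erfc(math.sqrt(diff * diff / var / 2.0))

def simulate(n_sim=200, n=80, alpha=0.05, seed=1):
    """False-positive rates for a median split vs. a data-driven cutoff."""
    rng = random.Random(seed)
    hits_median = hits_optimal = 0
    for _ in range(n_sim):
        # Null scenario: the marker is unrelated to survival and censoring
        marker = [rng.random() for _ in range(n)]
        times = [rng.expovariate(1.0) for _ in range(n)]
        events = [int(rng.random() < 0.7) for _ in range(n)]
        ranked = sorted(marker)
        # Pre-specified cutoff: the median
        median_cut = ranked[n // 2 - 1]
        p_median = logrank_p(times, events, [int(m > median_cut) for m in marker])
        # Data-driven cutoff: smallest unadjusted p-value over all candidates
        p_optimal = min(
            logrank_p(times, events, [int(m > cut) for m in marker])
            for cut in ranked[n // 10 : n - n // 10]
        )
        hits_median += p_median < alpha
        hits_optimal += p_optimal < alpha
    return hits_median / n_sim, hits_optimal / n_sim

median_rate, optimal_rate = simulate()
print("false-positive rate, median split:  ", median_rate)
print("false-positive rate, optimal cutoff:", optimal_rate)
```

The median split should stay close to the nominal 5%, while the unadjusted "best" cutoff should reject far more often. This only illustrates the naive minimum p-value; it says nothing about any multiplicity adjustment a particular package may apply.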


If you are looking for a solution to overfitting and want to select features for survival analysis, may I suggest penalized regression, specifically MCP (minimax concave penalty) from the ncvreg package. It is a robust algorithm for choosing variables (better than LASSO, in my view).

3.2 years ago

If the only problem is overfitting, you can do cross-validation (CV). For example, use leave-one-out CV (LOOCV) to see whether the optimal cutoff changes, and with it the log-rank test result for the Kaplan-Meier (KM) curves. In practice, though, KM analysis of a continuous predictor is often done by comparing only two groups, the highest vs. the lowest quartile, which avoids an arbitrary cut in the middle, where the two groups overlap more.

Ultimately, you can also cross-validate the KM or Cox PH model itself to be more confident that the model is not severely overfitting.
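The LOOCV check could be sketched roughly as follows, in plain Python with synthetic data. The helper names (`logrank_p`, `best_cutoff`, `loocv_cutoffs`), the 10th to 90th percentile search range and the simulated cohort are assumptions for illustration, not part of any package. For each patient, the cutoff is chosen without that patient, who is then assigned to the high or low group using it. The spread of the cutoffs shows how stable the choice is, and a log-rank test on the held-out assignments is less optimistic than one on the cutoff tuned to the full data.

```python
import math
import random

def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n, n1 = len(times), sum(groups)   # at risk overall / in group 1
    diff = var = 0.0
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = d1 = c = c1 = 0           # events and subjects leaving at time t
        while i < len(order) and times[order[i]] == t:
            k = order[i]
            d += events[k]
            d1 += events[k] * groups[k]
            c += 1
            c1 += groups[k]
            i += 1
        if d > 0 and n > 1:
            diff += d1 - d * n1 / n
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        n -= c
        n1 -= c1
    if var == 0:
        return 1.0
    return math.erfc(math.sqrt(diff * diff / var / 2.0))

def best_cutoff(marker, times, events, lo=0.1, hi=0.9):
    """Candidate cutoff (an observed marker value) with the smallest p-value."""
    ranked = sorted(marker)
    n = len(ranked)
    candidates = ranked[int(lo * n):int(hi * n)]
    return min(
        candidates,
        key=lambda cut: logrank_p(times, events, [int(m > cut) for m in marker]),
    )

def loocv_cutoffs(marker, times, events):
    """Leave each patient out, re-select the cutoff, classify the left-out one."""
    n = len(marker)
    cutoffs, held_out_groups = [], []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        cut = best_cutoff(
            [marker[j] for j in keep],
            [times[j] for j in keep],
            [events[j] for j in keep],
        )
        cutoffs.append(cut)
        held_out_groups.append(int(marker[i] > cut))
    return cutoffs, held_out_groups

# Synthetic cohort (assumption): marker unrelated to survival, about 30% censored
rng = random.Random(7)
n = 60
marker = [rng.random() for _ in range(n)]
times = [rng.expovariate(1.0) for _ in range(n)]
events = [int(rng.random() < 0.7) for _ in range(n)]

cutoffs, held_out_groups = loocv_cutoffs(marker, times, events)
p_cv = logrank_p(times, events, held_out_groups)
print("distinct cutoffs across folds:", len(set(cutoffs)))
print("cutoff range:", min(cutoffs), "to", max(cutoffs))
print("log-rank p-value on held-out assignments:", p_cv)
```

If the selected cutoff jumps around between folds, that is a sign that the "optimal" value is mostly fitting noise.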
