I am doing a survival analysis using TCGA-BRCA project data. I am trying different cut-offs to separate my samples into high- and low-risk groups, but since this is my first time, I would like to ask a question to be sure that I am on the right track.
When I choose a cut-off with an algorithm (rather than the median, as I've read in most papers) and divide my patients accordingly, I get only 6 patients in the High-Risk group and around 900 in the Low-Risk group. With this criterion, the KM plot and the ROC curve I use to validate it are both highly significant. However, despite these significant results, should I continue down that path, or abandon the strategy because one of the groups contains only 6 of my 903 patients?
Thank you in advance!!!
See if this tutorial helps: Survival analysis with gene expression
I had the same question a year ago.
I was trying to stratify patients using hierarchical clustering, and I tuned the clustering with different distance calculations and linkage methods. Sometimes this left very few patients in the high or low group. To check whether the high group really has high gene signature expression (and the low group low expression), I plotted the expression as a heatmap and compared the two groups using a statistical test. For example, you may have only 6 patients in the high group while, in reality, a few more patients have a similar expression profile and were probably misplaced by your algorithm. Based on this information, you can either tune the algorithm (through its parameters or by some other means) or use a different stratification strategy.
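As a rough illustration of that check, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for demonstration: the patient IDs and signature scores are made-up toy values (not TCGA data), the helper names `permutation_pvalue` and `possibly_misplaced` are hypothetical, and a permutation test on the mean score stands in for whichever statistical test you prefer. It compares the signature score between the two groups and lists low-group patients whose score is at least as high as the lowest score in the high group.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_pvalue(high, low, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean signature score."""
    rng = random.Random(seed)
    observed = mean(high) - mean(low)
    pooled = list(high) + list(low)
    n_high = len(high)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_high]) - mean(pooled[n_high:])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def possibly_misplaced(scores, labels, high_label="high"):
    """IDs of non-high patients whose score reaches the lowest high-group score."""
    floor = min(s for p, s in scores.items() if labels[p] == high_label)
    return sorted(
        p for p, s in scores.items()
        if labels[p] != high_label and s >= floor
    )

# Toy data (hypothetical): one signature score per patient, e.g. a mean z-score.
scores = {
    "P01": 2.9, "P02": 3.1, "P03": 2.7, "P04": 3.4, "P05": 2.8, "P06": 3.0,
    "P07": 2.95, "P08": 0.4, "P09": 0.1, "P10": -0.3, "P11": 0.6,
    "P12": -0.8, "P13": 0.2, "P14": -0.1, "P15": 3.2, "P16": 0.0,
}
# Group labels as some stratification algorithm might have assigned them.
high_ids = {"P01", "P02", "P03", "P04", "P05", "P06"}
labels = {p: ("high" if p in high_ids else "low") for p in scores}

high = [scores[p] for p in scores if labels[p] == "high"]
low = [scores[p] for p in scores if labels[p] == "low"]

p_value = permutation_pvalue(high, low)
suspects = possibly_misplaced(scores, labels)

print("permutation p-value:", p_value)
print("low-group patients that look like the high group:", suspects)
# The second line prints ['P07', 'P15'] for this toy data.
```

If the list of suspects is not empty, that suggests the grouping is leaving out patients who resemble the small group, which is a reason to revisit the algorithm's parameters before trusting the KM result.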