This may seem like a simple statistical question, but here goes: I have an algorithm which takes cancer patient data (gene expression and survival data) and, based on a generated signature, identifies a small subgroup of patients with poorer prognosis (Kaplan-Meier survival p-value around 0.001). Currently the method uses 250 patients, and between 25 and 50 of them fall in the poor-prognosis group to give us this p-value. I am trying to find the smallest number of patients on which this method will work. For instance, if I have 50 patients and 5 are selected for the poor-prognosis subgroup, what confidence do I have (and how can I compute this confidence) that the method is working, and to what significance? For instance, are 50 patients enough to be confident that there is only a 1% chance this arrangement happened by chance (p=0.01)? And how would I calculate the number of patients necessary for significance levels of 0.1, 0.05, 0.01, etc.? I am trying to find the best statistical simulation to achieve this and feel that I am overlooking something simple. Any help would be greatly appreciated. Thanks!
In any power calculation exercise you need to estimate the effect size. The general form of a power calculation is: "if the effect has a certain strength (e.g., it shifts the mean value of my phenotype in treated subjects by two standard deviations relative to the untreated group) and I wish to have a certain probability of detecting the effect should it truly exist (e.g., I wish to have 80% power), then spit out how many samples I require." In theory, if you have a wonderful signature that generates huge effects (meaning big separations in your KM curves) then you probably need few samples, provided the cohort is balanced enough to contain both classes. In the real world this is rarely the case for a non-trivial problem. In KM analysis I think you also need to ask: what follow-up time is required?
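For survival comparisons specifically, the standard closed-form power calculation is Schoenfeld's formula, which gives the number of *events* (deaths), not patients, needed for a log-rank test to detect a given hazard ratio. A minimal sketch (the hazard ratio of 3 and the 10% poor-prognosis fraction below are illustrative assumptions, not numbers from your data):

```python
import math
from scipy.stats import norm

def required_events(hazard_ratio, frac_poor=0.5, alpha=0.05, power=0.80):
    """Schoenfeld's formula: number of observed events needed for a
    two-sided log-rank test to detect `hazard_ratio` between groups,
    where `frac_poor` is the fraction of patients in the poor group."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the test level
    z_beta = norm.ppf(power)            # quantile for the desired power
    p = frac_poor
    return (z_alpha + z_beta) ** 2 / (p * (1 - p) * math.log(hazard_ratio) ** 2)

# Illustrative: HR = 3 with ~10% of patients in the poor-prognosis group.
d = required_events(hazard_ratio=3.0, frac_poor=0.1)
# Convert events to patients by dividing by the expected event
# probability over your follow-up period (assumed 60% here).
n_patients = d / 0.60
```

Note the conversion from events to patients is where follow-up time enters: the longer you follow patients, the larger the fraction who have an event, and the fewer patients you need to accrue the required number of events. The unbalanced allocation (`frac_poor` well below 0.5) inflates the requirement considerably compared to a 50/50 split.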
There's a lot to think about here. If you have preliminary data then you could estimate the number of patients required by sampling from within your population at a range of sample sizes. However, it sounds like your overall method uses data about the patients to generate a signature, separates the patients by this signature, and then calculates survival curves. You should be careful about overfitting your training population here if you want to make a general statement about how many patients are required. There are non-parametric re-sampling approaches you could take. You could cross-validate the method using randomly selected subsets of differing sizes (50, 100, 150, 200, 250) to get an idea of whether and at what point the gene selection is stable, where you start getting reproducible differences between classes (look at the distribution of p-values for 1000 runs at each sample size), and whether patient assignment is stable (e.g. is patient X always called "good outcome" in a classifier built using 200 patients, but randomly assigned when built using 50?).
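The subsampling idea can also be run as a pure simulation: generate synthetic cohorts of each candidate size under an assumed effect, compute a log-rank p-value for each, and record the fraction of runs that reach significance (the empirical power). A self-contained sketch, assuming exponential survival times, a hazard ratio of 3 for the poor group, and ~10% poor-prognosis membership (all of these are illustrative assumptions to be replaced with estimates from your 250-patient cohort):

```python
import numpy as np
from scipy.stats import chi2

def logrank_p(time, event, group):
    """Two-group log-rank test; returns the chi-square p-value (1 df)."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        dying = (time == t) & (event == 1)
        d, d1 = dying.sum(), (dying & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected deaths
        if n > 1:
            var += d * (n - d) * n1 * (n - n1) / (n**2 * (n - 1))
    return chi2.sf(o_minus_e**2 / var, df=1)

rng = np.random.default_rng(0)

def simulate_power(n, frac_poor=0.1, hr=3.0, censor_rate=0.3,
                   runs=500, alpha=0.05):
    """Fraction of simulated cohorts of size n with log-rank p < alpha,
    under exponential survival with hazard ratio `hr` for the poor group."""
    hits = 0
    for _ in range(runs):
        group = (rng.random(n) < frac_poor).astype(int)
        if group.sum() in (0, n):             # need both groups present
            continue
        t = rng.exponential(1.0 / np.where(group == 1, hr, 1.0))
        c = rng.exponential(1.0 / censor_rate, n)  # independent censoring
        time, event = np.minimum(t, c), (t <= c).astype(int)
        hits += logrank_p(time, event, group) < alpha
    return hits / runs

# Empirical power at each candidate cohort size.
for n in (50, 100, 150, 200, 250):
    print(n, simulate_power(n))
```

With your real data you would replace the synthetic generator with subsampling from the 250-patient cohort and rerun the full signature-building pipeline on each subsample; the loop structure stays the same. Note this simulates a *fixed* grouping, so it measures the power of the survival comparison only, not the extra variability introduced by re-deriving the signature on each subsample.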
Any time you have a question like this it's a good idea to ask a real statistician. A brief look into the literature (google "power calculations kaplan meier") will turn up primers.