I'm currently working on survival analyses using different cohorts. I found striking evidence of a relationship between cohort size and the significance of survival differences (the larger the cohort, the more significant the survival difference). This sounds logical to me, since a large cohort can tolerate more classification errors than a small one, where a single misclassification would be terrible.
As an example, here is a short R script that performs a survival analysis on a full cohort (size = 100) and then does the same on 1000 sub-cohorts (size = 90 each) drawn from it.
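    library(survival)  # provides Surv() and survdiff()

    # Log-rank test P value for the survival difference between drug groups
    mySurvival = function(DF) {
      fit = survdiff(Surv(time, censor) ~ drug, data = DF)
      return(pchisq(fit$chisq, 1, lower.tail = FALSE))
    }

    hmohiv = read.table("http://www.ats.ucla.edu/stat/r/examples/asa/hmohiv.csv", sep = ",", header = TRUE)

    # P value in the full cohort (100 patients)
    REFERENCE_P = mySurvival(hmohiv)

    # P values in 1000 random sub-cohorts of 90 patients, sampled without replacement
    TESTED_P = sapply(1:1000, function(x) mySurvival(hmohiv[sample(1:100, 90), ]))

    boxplot(TESTED_P)
    abline(h = REFERENCE_P)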
In my run (it takes about 5 seconds), ~75% of the randomly generated sub-cohorts have a higher P than the reference P.
My hypothesis is that several of my cohorts, which do not show a survival difference at P<0.05, could become significant if they were larger (median size is ~110, ranging from ~50 to ~550). To support this hypothesis (if it is wrong, please correct me), I'm looking for a paper that I can cite (e.g., from PubMed), but I have not found any. The problem is that when I search for "sample size survival" or "cohort size survival", the results quickly drift toward "tumor size survival" and I get tons of unrelated hits. Is there any well-known work I should be aware of?
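To make the intuition concrete, here is a minimal sketch of how the power of a log-rank test grows with the number of events, using the normal approximation usually attributed to Schoenfeld. The function name `logrank_power` and the hazard ratio of 0.6 are hypothetical choices for illustration, not values from my data, and the approximation assumes proportional hazards.

```r
# Approximate power of a two-sided log-rank test (Schoenfeld-type normal
# approximation, assuming proportional hazards).
# d     : number of observed events (deaths), not the number of patients
# hr    : assumed true hazard ratio between the two groups (hypothetical)
# p1    : fraction of patients in the first group
# alpha : two-sided significance level
logrank_power <- function(d, hr, p1 = 0.5, alpha = 0.05) {
  z_alpha <- qnorm(1 - alpha / 2)
  # The negligible probability of rejecting in the wrong direction is ignored
  pnorm(sqrt(d * p1 * (1 - p1)) * abs(log(hr)) - z_alpha)
}

events <- c(25, 50, 100, 200, 400)
power  <- logrank_power(events, hr = 0.6)
print(data.frame(events = events, power = round(power, 2)))
```

Under these assumptions the power depends on the number of events rather than on the raw cohort size, so a larger cohort helps mainly because it contributes more events. Note also that a larger cohort only makes a real difference easier to detect; it does not create one where none exists.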