I have whole-exome sequencing data for 200 patients, and I want to build a multivariate Cox regression model of overall survival using gene mutations as covariates. First, I need to choose which mutated genes to include in the model, but I don't know how to set a mutation-frequency threshold for selecting them. I have read that 1% is the commonly used threshold above which a gene is considered "highly frequently mutated". But with a 1% cutoff in my cohort, a gene would qualify if it is mutated in only 2 patients, with the other 198 patients carrying no mutation in it. Is such a gene statistically appropriate for a Cox regression analysis?
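To make the arithmetic concrete, here is a minimal sketch of frequency-based gene selection in plain Python. It assumes a hypothetical `mutations` dict mapping each gene to the set of patient IDs carrying a mutation in it; the gene names, patient IDs, and helper functions are invented for illustration and do not come from any particular pipeline.

```python
from typing import Dict, Set


def mutation_frequencies(mutations: Dict[str, Set[str]], n_patients: int) -> Dict[str, float]:
    """Fraction of the cohort carrying a mutation in each gene."""
    return {gene: len(carriers) / n_patients for gene, carriers in mutations.items()}


def select_genes(mutations: Dict[str, Set[str]], n_patients: int, threshold: float) -> Dict[str, float]:
    """Keep genes whose mutation frequency is at least `threshold` (e.g. 0.01 for 1%)."""
    freqs = mutation_frequencies(mutations, n_patients)
    return {gene: f for gene, f in freqs.items() if f >= threshold}


# Hypothetical toy data: gene -> set of IDs of patients with a mutation in that gene.
n_patients = 200
mutations = {
    "GENE_A": {"P001", "P002"},                        # 2 carriers -> exactly 1%
    "GENE_B": {"P003"},                                # 1 carrier  -> 0.5%
    "GENE_C": {f"P{i:03d}" for i in range(1, 21)},     # 20 carriers -> 10%
}

selected = select_genes(mutations, n_patients, threshold=0.01)
for gene, freq in sorted(selected.items()):
    carriers = len(mutations[gene])
    print(f"{gene}: {freq:.1%} ({carriers} mutated vs {n_patients - carriers} wild-type)")
```

With 200 patients, a gene sitting exactly at the 1% cutoff (GENE_A here) passes the filter with only 2 mutated patients against 198 wild-type, which is the imbalance the question is about.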
I would appreciate any advice on setting the mutation-frequency threshold, or pointers to articles that deal with a similar situation. Thank you!