I have a gene expression dataset from cancer patients, on which I ran a Cox regression for each gene to test its association with overall survival. That left me with a shortlist of around 1,500 candidates. To my mind, if one could show which genes from that shortlist are associated with each other, and then test whether their combinations are still associated with overall survival, the result would be more meaningful.
So, I thought about linear modeling, where all possible combinations of genes should be tested. However, I am stuck on these issues:
- linear models are limited in the number of variables they can handle, which means this test cannot be performed on such a large number of candidates (i.e.
- in this case, would a linear model (or a variant of it) be the method of choice? Which other method would you recommend?
- given the experience that a lot of you have here, does my rationale on how to handle this dataset make sense at all?
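
To give a sense of the scale behind the first point, here is a rough sketch of how many models "all possible combinations" would imply. It uses only the Python standard library. The figure of 1,500 is my approximate shortlist size, and the function and variable names are purely illustrative:

```python
import math

def n_combinations(n_genes: int, k: int) -> int:
    """Number of distinct k-gene subsets that can be drawn from n_genes candidates."""
    return math.comb(n_genes, k)

N_GENES = 1500  # approximate shortlist size (assumption: exactly 1,500)

pairs = n_combinations(N_GENES, 2)    # every two-gene model
triples = n_combinations(N_GENES, 3)  # every three-gene model
all_subsets = 2 ** N_GENES - 1        # every non-empty subset of the shortlist

print(f"two-gene combinations:   {pairs:,}")
print(f"three-gene combinations: {triples:,}")
print(f"all non-empty subsets:   a number with {len(str(all_subsets))} digits")
```

Even pairs alone run to over a million models, which is why I doubt that exhaustive testing is feasible here.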
Any help is much appreciated! Thanks.