I am working with a mouse cancer model that produces tumors, and we have performed gene expression profiling on all of them. I would like to build a classifier that identifies human tumors whose gene expression is similar to that of my model (i.e. "mouse-like"). The microarrays have 27,000+ features, and I suspect that I don't need that many. Hence, I was wondering whether there is a methodology for choosing how many features to use, and which ones. I know this may sound counter-intuitive, because I shouldn't look at the data before I apply machine learning. I am currently reading papers.
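To make the question concrete, here is a minimal sketch of the kind of procedure I have in mind. It is only an illustration and rests on assumptions that are not part of my real data: the expression values are synthetic, the labels (1 = "mouse-like", 0 = other) are hypothetical, and the scoring rule and nearest-centroid classifier are simple stand-ins rather than a recommended method. The one point it tries to capture is that the features are re-selected inside each cross-validation fold, using the training samples only, so the held-out samples never influence which features are kept.

```python
import random
import statistics

def score_features(X, y):
    """Score each feature by |difference of class means| / summed class spread."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 1]
        b = [row[j] for row, label in zip(X, y) if label == 0]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def centroid(rows, cols):
    """Mean expression profile of the given rows, restricted to cols."""
    return [statistics.mean(r[j] for r in rows) for j in cols]

def predict(row, cols, c0, c1):
    """Assign the class whose centroid is closer (squared Euclidean distance)."""
    d0 = sum((row[j] - m) ** 2 for j, m in zip(cols, c0))
    d1 = sum((row[j] - m) ** 2 for j, m in zip(cols, c1))
    return 1 if d1 < d0 else 0

def cv_accuracy(X, y, k, n_folds=5):
    """Cross-validated accuracy with feature selection redone in every fold."""
    idx = list(range(len(X)))
    correct = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Selection sees training samples only, never the held-out fold.
        cols = top_k(score_features(X_tr, y_tr), k)
        c0 = centroid([x for x, l in zip(X_tr, y_tr) if l == 0], cols)
        c1 = centroid([x for x, l in zip(X_tr, y_tr) if l == 1], cols)
        correct += sum(predict(X[i], cols, c0, c1) == y[i] for i in test)
    return correct / len(X)

# Synthetic stand-in data: 40 samples, 200 features, of which only the
# first 5 differ between the two (hypothetical) classes.
random.seed(0)
n_samples, n_features, n_informative = 40, 200, 5
y = [i % 2 for i in range(n_samples)]
X = [[random.gauss(0, 1) + (2.0 if j < n_informative and y[i] == 1 else 0.0)
      for j in range(n_features)]
     for i in range(n_samples)]

# Compare a few candidate feature counts by cross-validated accuracy.
candidates = [2, 5, 20, 100]
results = {k: cv_accuracy(X, y, k) for k in candidates}
best_k = max(results, key=results.get)
print(results, best_k)
```

If something along these lines is reasonable, I assume the accuracy of the chosen feature count would still need to be confirmed on samples that were not used for the comparison, since picking the best of several candidates is itself a form of looking at the data.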
Thank you for your input!