Question

How to select the optimal genes for neural network analysis?

9.5 years ago

Avro ▴ 160

Hi everyone!

I am working with a mouse cancer model that produces tumors, and we have performed gene expression profiling on all of them. I would like to build a classifier that identifies human tumors whose gene expression is similar to my model (i.e. "mouse-like"). The microarrays have 27000+ features, and I suspect that I don't need that many. Is there a method for choosing how many features to use, and which ones? I know this seems counter-intuitive, because I shouldn't look at the data before I apply machine learning. I am currently reading papers.

Thank you for your input!

gene neural-network • 2.3k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Avro ▴ 160

It IS safe to filter genes to those with high variance; this would be a quick and easy way to get a reasonable set for classification.

ADD REPLY • link 9.5 years ago by Sean Davis 26k
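
A minimal sketch of the variance filter suggested in the reply above, using only the Python standard library. The gene names, the toy expression values, and the helper `top_variance_genes` are all hypothetical; the only point is that the ranking is computed from expression values alone and never consults class labels.

```python
from statistics import pvariance


def top_variance_genes(expr, k):
    """Return the names of the k genes with the highest variance across samples.

    expr maps gene name -> list of expression values (one per sample).
    Class labels are never consulted, so the filter is unsupervised.
    """
    variances = {gene: pvariance(values) for gene, values in expr.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:k]


# Toy data (made up for illustration): 4 genes measured in 5 samples.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0, 5.0],
    "geneB": [2.0, 8.0, 3.0, 9.0, 1.0],
    "geneC": [7.0, 7.5, 6.5, 7.2, 6.8],
    "geneD": [1.0, 4.0, 1.5, 4.5, 1.2],
}

selected = top_variance_genes(expr, k=2)
print(selected)  # ['geneB', 'geneD']
```

With a real expression matrix the same idea applies per probe; `k` is a free choice here, not a recommended value.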

Hi! Thank you for your quick response. Could I use a nonparametric ranking test (e.g. Wilcoxon) to get the genes with the highest variance?

ADD REPLY • link 9.5 years ago by Avro ▴ 160

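No. You may not use any measure of variability that uses the class labels. A rank test such as Wilcoxon compares the classes, so it would leak label information into the feature selection.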

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Sean Davis 26k

Thank you! Is there a way to choose a cutoff for the variance? I am asking because the variance values will be continuous. Bootstrap resampling?

ADD REPLY • link 9.5 years ago by Avro ▴ 160

There is no "cutoff". I suspect that you'll find that there is a pretty broad range that can result in similar performance.

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Sean Davis 26k
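
One way to check the "broad range" claim in the reply above on your own data is to sweep the number of retained genes and compare cross-validated accuracy. The sketch below is only an illustration under stated assumptions: the data are synthetic (random numbers, not real expression values), the classifier is a simple nearest-centroid rule standing in for whatever model you actually use, and every function name is hypothetical. The variance ranking is computed without labels, as discussed earlier in the thread.

```python
import random
from statistics import pvariance


def make_toy_data(n_samples=20, n_genes=200, n_informative=10, seed=0):
    """Synthetic data only: the first n_informative genes differ between two classes."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    samples = []
    for label in labels:
        row = []
        for g in range(n_genes):
            shift = 3.0 * label if g < n_informative else 0.0
            row.append(rng.gauss(shift, 1.0))
        samples.append(row)
    return samples, labels


def top_variance_indices(samples, k):
    """Indices of the k highest-variance genes; labels are not used."""
    n_genes = len(samples[0])
    variances = [pvariance([s[g] for s in samples]) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: variances[g], reverse=True)[:k]


def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen genes."""
    correct = 0
    for i, (test, truth) in enumerate(zip(samples, labels)):
        best_label, best_dist = None, float("inf")
        for cls in set(labels):
            # Centroid of this class, excluding the held-out sample.
            members = [s for j, s in enumerate(samples) if j != i and labels[j] == cls]
            centroid = [sum(m[g] for m in members) / len(members) for g in genes]
            dist = sum((test[g] - c) ** 2 for g, c in zip(genes, centroid))
            if dist < best_dist:
                best_label, best_dist = cls, dist
        correct += best_label == truth
    return correct / len(samples)


samples, labels = make_toy_data()
results = {}
for k in (5, 10, 25, 50, 100):
    genes = top_variance_indices(samples, k)
    results[k] = loo_accuracy(samples, labels, genes)
    print(k, results[k])
```

If accuracy stays roughly flat across several values of `k`, the exact cutoff matters little and any value in that plateau is a defensible choice.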

Ram · Answer 1 · 2014-11-22

9.5 years ago

jgbradley1 ▴ 110

Have you considered reducing the number of features with PCA (principal component analysis)?

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by jgbradley1 ▴ 110

Hi! Thank you for your quick response. Yes. PCA reduces dimensionality while keeping the variability, right? I am reading more about it. By reducing the dimensionality, would I remove gene dimensions, or would I create new parameters that are not gene loci (e.g. z1, z2, ...)? Sorry if I am confused.

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Avro ▴ 160
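
On the question of what PCA produces: each principal component is a weighted sum of all the (centered) gene measurements, so PCA creates new derived variables such as z1 and z2 rather than discarding individual genes. The sketch below illustrates this for the first component only, using plain power iteration in standard-library Python. The toy matrix and the function name are hypothetical, and a real analysis would normally use an established PCA implementation instead.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Power iteration for the leading principal component.

    rows: list of samples, each a list of gene expression values.
    Returns (loadings, scores): one weight per gene, one z1 value per sample.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[g] for r in rows) / n for g in range(p)]
    centered = [[r[g] - means[g] for g in range(p)] for r in rows]

    w = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting direction
    for _ in range(n_iter):
        # Apply X^T X to w without forming the p x p matrix explicitly.
        proj = [sum(x * wi for x, wi in zip(row, w)) for row in centered]
        new_w = [sum(centered[i][g] * proj[i] for i in range(n)) for g in range(p)]
        norm = math.sqrt(sum(v * v for v in new_w))
        w = [v / norm for v in new_w]

    # z1 for each sample: a weighted sum over ALL genes.
    scores = [sum(x * wi for x, wi in zip(row, w)) for row in centered]
    return w, scores


# Toy data (made up): genes 1 and 2 vary together, gene 3 is nearly constant.
rows = [
    [2.0, 1.9, 5.0],
    [4.0, 4.2, 5.1],
    [6.0, 5.8, 4.9],
    [8.0, 8.1, 5.0],
    [10.0, 9.9, 5.1],
    [12.0, 12.1, 4.9],
]

loadings, scores = first_principal_component(rows)
print("loadings (one per gene):", loadings)
print("z1 (one per sample):", scores)
```

In this toy case the loadings on the two co-varying genes are large and the loading on the nearly constant gene is close to zero, but all three genes still contribute to z1. A classifier trained on z1, z2, ... therefore works on components, not on a subset of gene loci.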