I have gene-level expression data for 35 cell lines, with about 3 runs per cell line. Each cell line also has a binary label associated with a biological phenomenon.
I am interested in finding the genes most strongly associated with this binary label. I have tested several approaches, but I wanted to ask whether anyone has insight into this kind of problem.
So far I have:
- Computed a univariate AUC score for each gene, which measures how well that gene's expression separates the two groups.
- Trained an array of binary classifiers and used variable-importance analysis on them to produce a ranked list of genes.
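The per-gene AUC step could look roughly like the sketch below. It assumes the data is a runs-by-genes NumPy array with a parallel array of cell-line IDs, and it averages the ~3 runs per cell line before scoring. Because runs from the same cell line are not independent, treating them as separate samples would tend to overstate how well a gene separates the groups. The function names `collapse_replicates` and `per_gene_auc`, the toy data, and the `labels` dict are hypothetical and for illustration only.

```python
import numpy as np

def collapse_replicates(expr, cell_ids):
    """Average runs so each cell line contributes exactly one row.

    expr: (n_runs, n_genes) array; cell_ids: length-n_runs array of IDs.
    Returns (sorted unique IDs, (n_cell_lines, n_genes) matrix of means).
    """
    unique_ids, inverse = np.unique(cell_ids, return_inverse=True)
    sums = np.zeros((len(unique_ids), expr.shape[1]))
    np.add.at(sums, inverse, expr)  # accumulate each run into its cell line's row
    counts = np.bincount(inverse, minlength=len(unique_ids))
    return unique_ids, sums / counts[:, None]

def per_gene_auc(X, y):
    """ROC AUC of each gene (column of X) for separating y == 1 from y == 0.

    Uses the Mann-Whitney form: P(x_pos > x_neg) + 0.5 * P(x_pos == x_neg),
    averaged over every positive/negative pair of cell lines.
    """
    y = np.asarray(y).astype(bool)
    pos, neg = X[y], X[~y]
    diff = pos[:, None, :] - neg[None, :, :]  # shape (n_pos, n_neg, n_genes)
    return ((diff > 0) + 0.5 * (diff == 0)).mean(axis=(0, 1))

# Toy data (made up): 4 cell lines x 2 runs, 3 genes.
cell_ids = np.array(["A", "A", "B", "B", "C", "C", "D", "D"])
expr = np.array([
    [5.0, 1.0, 2.0], [5.5, 1.0, 2.0],  # A
    [4.5, 2.0, 3.0], [5.0, 2.0, 3.0],  # B
    [1.0, 3.0, 2.0], [1.5, 3.0, 2.0],  # C
    [0.5, 4.0, 3.0], [1.0, 4.0, 3.0],  # D
])
labels = {"A": 1, "B": 1, "C": 0, "D": 0}  # hypothetical binary labels

ids, X = collapse_replicates(expr, cell_ids)
y = np.array([labels[c] for c in ids])
auc = per_gene_auc(X, y)
# Rank by distance from 0.5 so genes that are lower in the positive class
# (AUC near 0) count as strongly associated too.
ranking = np.argsort(-np.abs(auc - 0.5))
```

In the toy data, gene 0 is higher in every positive cell line (AUC 1.0), gene 1 is lower in every positive cell line (AUC 0.0), and gene 2 does not separate the groups (AUC 0.5). Ranking on `|AUC - 0.5|` treats the first two as equally informative. The same per-gene value should also come out of scikit-learn's `roc_auc_score(y, X[:, j])` if that library is already in use.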
Am I missing an obvious method? Do my approaches so far make sense?