Identifying genes associated with a binary classification for many cell lines
5.7 years ago
bimlay2 ▴ 30

I have gene-wise expression data for 35 cell lines with ~3 runs per cell line. I also have a binary classification for each cell line associated with a biological phenomenon.

I am interested in finding the genes that are most associated with the binary classification. I have tested several approaches, but I wanted to ask if anyone had insight into these sorts of problems.

So far I have:

  1. Generated univariate AUC scores for each gene, which essentially gives a measure of how separated the binary groups are for each gene.
  2. Used an array of binary classifiers and subsequent variable importance analysis to generate ranked gene importance.
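For reference, the univariate AUC score in step 1 can be sketched roughly as below. This is a minimal illustration, not the poster's actual code: the function names (`gene_auc`, `rank_genes_by_auc`), the dict-of-lists layout, and the toy values are all assumptions made for the example. The AUC here is the fraction of (positive, negative) sample pairs in which the positive sample has the higher expression, with ties counted as one half.

```python
from itertools import product


def gene_auc(values, labels):
    """AUC for one gene: P(positive sample > negative sample), ties count 0.5."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def rank_genes_by_auc(expr, labels):
    """Rank genes by distance of their AUC from 0.5 (either direction).

    expr: dict mapping gene name -> list of expression values, one per sample
    (hypothetical layout chosen for this sketch).
    """
    scores = {gene: gene_auc(vals, labels) for gene, vals in expr.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)


# Toy data (made up): three samples per class.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [9.1, 8.7, 9.4, 5.2, 5.9, 6.1],  # higher in class 1
    "geneB": [5.0, 7.0, 6.0, 6.5, 5.5, 7.5],  # overlapping groups
    "geneC": [2.0, 2.5, 1.8, 7.7, 8.1, 6.9],  # lower in class 1
}
ranking = rank_genes_by_auc(expr, labels)
```

Ranking by distance from 0.5 treats genes that are higher in either class as equally interesting; an AUC near 0 is as well separated as one near 1.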

Am I missing an obvious method? Do my approaches so far make sense?

RNA-Seq R • 1.0k views

You say you are interested in finding the genes most associated with the binary classification (as opposed to building a predictor of the binary class?). If this is a gene selection question, one alternative would be a differential expression approach: e.g. limma or equivalent, with your binary classes as the contrast, and then rank the genes by the largest and/or most significant differences between the two classes.

Thanks for your comment. I actually used DESeq2 to generate DE results. The mean-dispersion trend looked weird, and I got super, super low p-values. I wasn't sure if any DE method was suited for 35 cell lines lumped into two groups.

Ah, OK. If 'biological.phenomenon' is a binary label on each cell line (not an experimental factor that you modulated), then I suppose it is heavily confounded with cell.line effects. If the cell.line effects are large (probably) but there are still 'biological.phenomenon' main effects large enough to observe against that background, then a rank-based approach, such as a per-gene univariate Mann-Whitney test comparing the two levels of 'biological.phenomenon', might be worth a try.
