Question: Identifying genes associated with a binary classification for many cell lines
7 days ago, bimlay2 wrote:

I have gene-wise expression data for 35 cell lines with ~3 runs per cell line. I also have a binary classification for each cell line associated with a biological phenomenon.

I am interested in finding the genes that are most associated with the binary classification. I have tested several approaches, but I wanted to ask if anyone had insight into these sorts of problems.

So far I have:

  1. Generated univariate AUC scores for each gene, which essentially gives a measure of how separated the binary groups are for each gene.
  2. Used an array of binary classifiers and subsequent variable importance analysis to generate ranked gene importance.

Am I missing an obvious method? Do my approaches so far make sense?
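
For concreteness, the univariate AUC in item 1 can be sketched as below. This is only an illustration, not the code actually used for the analysis: the function names, the toy gene names, and the expression values are all made up, and ranking by distance from 0.5 is an assumption about how "most separated" would be scored.

```python
def gene_auc(values, labels):
    """AUC for one gene: the fraction of (class 1, class 0) sample pairs in
    which the class-1 sample has the higher value; ties count as one half."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_genes_by_auc(expr, labels):
    """Sort genes by how far their AUC is from 0.5 (no separation)."""
    scores = {gene: gene_auc(vals, labels) for gene, vals in expr.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)


# Hypothetical toy data: six samples, three per class.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # higher in class 1
    "geneB": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],  # little separation
    "geneC": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # higher in class 0
}
ranked = rank_genes_by_auc(expr, labels)
for gene, auc in ranked:
    print(gene, round(auc, 3))
```

An AUC near 1 or near 0 both indicate strong separation (in opposite directions), which is why the ranking uses the distance from 0.5 rather than the raw value.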

Tags: rna-seq, R

You say you are interested in finding the genes most associated with the binary classification (as opposed to building a predictor of your binary class?). If this is a gene selection question, one alternative would be a differential expression approach, i.e. limma or equivalent with your binary classes as the contrast, and then ranking the genes by the largest and/or most significant differences between the two classes.

written 7 days ago by Ahill

Thanks for your comment. I actually used DESeq2 to generate DE results. The mean-dispersion trend looked weird, and I got super, super low p-values. I wasn't sure if any DE method was suited for 35 cell lines lumped into two groups.

modified 7 days ago • written 7 days ago by bimlay2

Ah, OK. If 'biological.phenomenon' is a binary label on each cell line (not an experimental factor that you modulated), then I suppose it is heavily confounded with cell.line effects. If the cell.line effects are large (probably) but there are still 'biological.phenomenon' main effects large enough to observe against that background, then a rank-based approach, such as a per-gene univariate Mann-Whitney test comparing the two levels of 'biological.phenomenon', would be worth a try.

modified 5 days ago • written 5 days ago by Ahill
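
A minimal sketch of the per-gene Mann-Whitney idea from the comment above, using only the Python standard library. Two things here are assumptions and not something stated in the thread: the ~3 runs per cell line are averaged first so that each cell line contributes a single value, and the p-value uses the normal approximation with a tie correction and no continuity correction. All names and numbers are hypothetical. In practice an existing implementation such as R's `wilcox.test` or SciPy's `scipy.stats.mannwhitneyu` would be the more usual choice.

```python
import math
from collections import defaultdict


def collapse_runs(values, cell_lines):
    """Average replicate runs so that each cell line contributes one value."""
    grouped = defaultdict(list)
    for value, line in zip(values, cell_lines):
        grouped[line].append(value)
    return {line: sum(vs) / len(vs) for line, vs in grouped.items()}


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i                     # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical toy data for one gene: six cell lines, two runs each.
cell_lines = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"]
values = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 6.0, 8.0, 7.0, 9.0, 8.0, 10.0]
phenotype = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}

collapsed = collapse_runs(values, cell_lines)
group1 = [v for line, v in collapsed.items() if phenotype[line] == 1]
group0 = [v for line, v in collapsed.items() if phenotype[line] == 0]
u, p = mann_whitney(group1, group0)
print(u, p)
```

Collapsing the runs keeps the test at the level of cell lines, which is the level at which the binary label is defined. The normal approximation is rough for small groups, so an exact test may be preferable when only a few cell lines fall in one class.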