Question: Identifying genes associated with a binary classification for many cell lines
bimlay2 wrote (8 months ago):

I have gene-wise expression data for 35 cell lines with ~3 runs per cell line. I also have a binary classification for each cell line associated with a biological phenomenon.

I am interested in finding the genes that are most associated with the binary classification. I have tested several approaches, but I wanted to ask if anyone had insight into these sorts of problems.

So far I have:

  1. Computed a univariate AUC score for each gene, which measures how well that gene's expression separates the two binary groups.
  2. Trained an array of binary classifiers and used the resulting variable-importance measures to rank the genes.
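A minimal sketch of step 1, assuming expression has already been reduced to one value per cell line per gene and that the binary class is coded 0/1. The function names `gene_auc` and `rank_genes_by_auc` and the toy genes are hypothetical. The AUC is computed as the probability that a randomly chosen class-1 cell line has higher expression than a randomly chosen class-0 cell line (ties count one half). Genes are ranked by distance from 0.5, so strong separation in either direction scores highly.

```python
from typing import Dict, List, Sequence, Tuple


def gene_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC for one gene: P(class-1 value > class-0 value), ties counted as 0.5."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_genes_by_auc(
    expr: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Rank genes by |AUC - 0.5|; 0.5 means no separation between the classes."""
    scores = [(gene, gene_auc(vals, labels)) for gene, vals in expr.items()]
    return sorted(scores, key=lambda gs: abs(gs[1] - 0.5), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: one value per cell line, six cell lines.
    labels = [0, 0, 0, 1, 1, 1]
    expr = {
        "geneA": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # higher in class 1
        "geneB": [2.0, 1.0, 3.0, 2.5, 1.5, 2.0],  # no clear separation
        "geneC": [5.0, 4.8, 5.2, 1.0, 1.1, 0.9],  # lower in class 1
    }
    for gene, auc in rank_genes_by_auc(expr, labels):
        print(gene, round(auc, 3))
    # prints: geneA 1.0, then geneC 0.0, then geneB 0.5
```

An AUC of 0.0 is as informative as 1.0 here; it just means the gene is lower in class 1.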

Am I missing an obvious method? Do my approaches so far make sense?

Tags: rna-seq, R

You describe being interested in finding the genes most associated with the binary classification (as opposed to building a predictor of your binary class?). If this is a gene selection question, one alternative would be a differential expression approach: use limma or an equivalent with your binary classes as the contrast, and rank the genes with the largest and/or most significant differences between the two classes.

(Comment by Ahill, 8 months ago)
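limma itself is an R/Bioconductor package. As a rough, hedged stand-in for "rank genes by the size and significance of the difference", here is a per-gene Welch t-statistic on log-scale expression, assuming one averaged value per cell line and 0/1 class labels. The names `welch_t` and `rank_genes_by_t` are hypothetical. Unlike limma's moderated t-statistic, this does not borrow variance information across genes (limma's empirical Bayes step), so it is only an illustration of the ranking idea, not a substitute for limma.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (mean of class 1 minus mean of class 0, Welch t-statistic) for one gene."""
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    if len(g1) < 2 or len(g0) < 2:
        raise ValueError("need at least two cell lines per class")
    diff = mean(g1) - mean(g0)
    se = math.sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
    if se == 0:
        # No within-class spread: the statistic is 0 or unbounded.
        t = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    else:
        t = diff / se
    return diff, t


def rank_genes_by_t(
    expr: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[Tuple[str, float, float]]:
    """Rank genes by |t|, largest first; rows are (gene, mean difference, t)."""
    rows = [(gene, *welch_t(vals, labels)) for gene, vals in expr.items()]
    return sorted(rows, key=lambda r: abs(r[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical log2 expression, one value per cell line.
    labels = [0, 0, 0, 1, 1, 1]
    expr = {
        "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "geneB": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    }
    for gene, diff, t in rank_genes_by_t(expr, labels):
        print(gene, round(diff, 3), round(t, 3))
```

On log2 values, the mean difference reads as a log2 fold change between the two classes.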

Thanks for your comment. I actually used DESeq2 to generate DE results, but the mean-dispersion trend looked odd and I got extremely low p-values. I wasn't sure whether any DE method is suited to 35 cell lines lumped into two groups.

(Reply by bimlay2, 8 months ago)

Ah, OK. If 'biological.phenomenom' is a binary label on each cell line (rather than an experimental factor that you manipulated), then I suppose it is heavily confounded with cell.line effects. If cell.line effects are large (probably) but 'biological.phenomenom' still has main effects large enough to observe against that background, then a rank-based approach such as a per-gene univariate Mann-Whitney test comparing the two levels of 'biological.phenomenom' might be worth a try.

(Reply by Ahill, 8 months ago)
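A minimal sketch of the per-gene Mann-Whitney idea, assuming each gene's data arrive as (cell line, value) pairs for every run and a dict maps each cell line to its 0/1 class. All function names are hypothetical. The ~3 runs per cell line are averaged first so that each cell line counts once and replicate runs are not treated as independent samples. The p-value uses the tie-corrected normal approximation, which is crude for small groups; in practice `scipy.stats.mannwhitneyu` or R's `wilcox.test` would be the usual tools. Benjamini-Hochberg adjustment is added because thousands of genes are tested. Note that U divided by n1·n2 equals the per-gene AUC from step 1 of the question, so this test puts a p-value on the same separation measure.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def collapse_replicates(runs: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Average the replicate runs of each cell line into a single value."""
    by_line: Dict[str, List[float]] = defaultdict(list)
    for cell_line, value in runs:
        by_line[cell_line].append(value)
    return {line: mean(vals) for line, vals in by_line.items()}


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U for x vs y and a two-sided p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_counts: Dict[float, int] = defaultdict(int)
    for v in pooled:
        tie_counts[v] += 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def rank_genes_mann_whitney(
    runs_by_gene: Dict[str, Sequence[Tuple[str, float]]], labels: Dict[str, int]
) -> List[Tuple[str, float, float, float]]:
    """Rows (gene, AUC, p, q) sorted by p; class 1 is compared against class 0."""
    results = []
    for gene, runs in runs_by_gene.items():
        per_line = collapse_replicates(runs)
        x = [v for line, v in per_line.items() if labels[line] == 1]
        y = [v for line, v in per_line.items() if labels[line] == 0]
        u, p = mann_whitney(x, y)
        results.append((gene, u / (len(x) * len(y)), p))
    qvals = benjamini_hochberg([p for _, _, p in results])
    rows = [(g, auc, p, q) for (g, auc, p), q in zip(results, qvals)]
    return sorted(rows, key=lambda r: r[2])


if __name__ == "__main__":
    # Hypothetical toy data: four cell lines, some with replicate runs.
    labels = {"L1": 0, "L2": 0, "L3": 1, "L4": 1}
    runs_by_gene = {
        "geneA": [("L1", 1.0), ("L1", 1.2), ("L2", 0.9), ("L3", 3.0), ("L4", 3.2)],
        "geneB": [("L1", 2.0), ("L2", 1.0), ("L3", 1.0), ("L4", 2.0)],
    }
    for gene, auc, p, q in rank_genes_mann_whitney(runs_by_gene, labels):
        print(gene, auc, round(p, 3), round(q, 3))
```

Because the test only uses ranks of the per-cell-line values, it is insensitive to the scale and outliers that may have distorted the dispersion fit, although it still cannot remove confounding between the label and cell-line identity.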
Powered by Biostar version 2.3.0