Question: Tools for Biomarker Signature Generation

JJ wrote:

Hi all,

I have RNA-seq samples from two groups (responders and non-responders), and I would like to generate a predictive gene signature that separates the two groups.

I am now looking for R packages that could help with this task. Could you recommend any?

I've used stepwise regression before, but it is not feasible here with so many variables.

I found a similar thread, "Resources for gene signature creation", where two approaches were suggested: using the DEGs in a lasso-penalized regression, or testing them independently with Cox proportional hazards regression and then picking the top X genes.
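As a rough sketch of the "test each gene independently and keep the top X" idea: because the outcome here is binary (responder vs. non-responder) rather than time-to-event, this example swaps the Cox model for a per-gene Welch t-statistic. It is written in Python/NumPy for illustration rather than R, and the function name `top_k_genes` and the simulated data are hypothetical. If a screening step like this feeds a downstream model, it generally has to be repeated inside each cross-validation fold; otherwise the performance estimate will be optimistic.

```python
import numpy as np

def top_k_genes(X, y, k):
    """Hypothetical helper: rank genes (columns of X) by the absolute Welch
    t-statistic between classes y == 1 and y == 0, return the top k indices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 1], X[y == 0]
    # Per-gene difference in group means
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Welch standard error from the unbiased per-group variances
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = mean_diff / se
    # Sorting on -|t| puts the most strongly separating genes first
    return np.argsort(-np.abs(t))[:k]

# Simulated data: 20 samples x 100 genes; genes 0-2 are shifted in responders
rng = np.random.default_rng(0)
y = np.array([1] * 10 + [0] * 10)
X = rng.normal(size=(20, 100))
X[y == 1, :3] += 3.0
print(top_k_genes(X, y, 5))
```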

- Could someone point me to a paper, R package or workflow that describes lasso-penalized regression for this kind of scenario?
- I like the idea of testing the DEGs independently with Cox proportional hazards regression and picking the top X genes, which I would then feed into stepwise regression. Does this make sense?
- Do you have an alternative suggestion? Classifiers such as support vector machines (SVMs) are an option, but this is not my area of expertise.
- I was wondering what sample size each of these approaches needs. I'd appreciate any input here.

Thank you so much!

Elastic net regression is very commonly used for this type of analysis; it combines the strengths of ridge regression and the LASSO. Both the CCLE and GDSC papers used the elastic net (though I don't think they shared any code or implementations): https://www-ncbi-nlm-nih-gov.gate.lib.buffalo.edu/pmc/articles/PMC3320027 http://europepmc.org/articles/PMC3349233
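For reference, the elastic-net penalized logistic objective, written in the parameterization used by the glmnet documentation (mixing parameter $\alpha$, overall penalty strength $\lambda$, responses $y_i \in \{0, 1\}$, expression vectors $x_i$), is:

$$
\min_{\beta_0,\,\beta}\; -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\,(\beta_0 + x_i^\top\beta) - \log\!\big(1 + e^{\beta_0 + x_i^\top\beta}\big)\Big] + \lambda\Big[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
$$

Setting $\alpha = 1$ gives the lasso and $\alpha = 0$ gives ridge regression. Values in between keep the lasso's sparsity, while the ridge part tends to spread weight across correlated genes instead of arbitrarily picking one of them.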

As for implementations, I use glmnet from CRAN.
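To illustrate what an elastic-net logistic fit does, here is a minimal NumPy sketch assuming standardized expression values and a 0/1 response. It is not glmnet: glmnet uses cyclical coordinate descent along a path of λ values, whereas this sketch uses plain proximal gradient descent (ISTA) at a single, hand-picked λ, and the names `fit_enet_logistic`, `lam` and `alpha` are hypothetical. In R, the corresponding call would be along the lines of `cv.glmnet(x, y, family = "binomial", alpha = 0.5)`, which also chooses λ by cross-validation.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z))
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: shrink each entry towards zero by t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_enet_logistic(X, y, lam, alpha=0.5, n_iter=2000):
    """Hypothetical proximal-gradient (ISTA) fit of elastic-net logistic regression:
    mean logistic loss + lam * ((1 - alpha) / 2 * ||b||_2^2 + alpha * ||b||_1).
    The intercept b0 is not penalised. X is assumed to be standardised."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    # Lipschitz bound for the smooth part (logistic loss + ridge term); step = 1 / L
    Xa = np.column_stack([np.ones(n), X])
    L = np.linalg.norm(Xa, 2) ** 2 / (4.0 * n) + lam * (1.0 - alpha)
    step = 1.0 / L
    for _ in range(n_iter):
        r = sigmoid(b0 + X @ b) - y                 # residuals on the probability scale
        g0 = r.mean()                               # gradient w.r.t. the intercept
        g = X.T @ r / n + lam * (1.0 - alpha) * b   # loss + ridge gradient for b
        b0 -= step * g0
        b = soft_threshold(b - step * g, step * lam * alpha)  # lasso part via its prox
    return b0, b

# Simulated data: 200 samples x 30 "genes"; only genes 0 and 1 drive the response
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)

b0_hat, b_hat = fit_enet_logistic(X, y, lam=0.3, alpha=0.5)
print("selected genes:", np.flatnonzero(b_hat))
```

On the simulated data only a handful of genes should receive nonzero coefficients. In a real analysis, λ (and possibly α) would be tuned by cross-validation rather than fixed by hand.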

Thanks for the links! Do you have any sample size recommendations?

You are welcome. I don't have a sample size recommendation, but a power analysis should be relatively straightforward since you already have the data.
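One way to act on this, assuming "power analysis" is taken loosely to mean checking how held-out accuracy changes with training-set size on your own data, is a learning curve. The sketch below is hypothetical: it uses a simple nearest-centroid classifier as a stand-in for the final model, simulated data in place of real expression values, and invented function names. If accuracy is still climbing at the largest training size, more samples would likely help.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Hypothetical stand-in classifier: assign each test sample to the class
    whose training centroid is closest, return the held-out accuracy."""
    c1 = X_tr[y_tr == 1].mean(axis=0)
    c0 = X_tr[y_tr == 0].mean(axis=0)
    d1 = ((X_te - c1) ** 2).sum(axis=1)
    d0 = ((X_te - c0) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y_te).mean())

def learning_curve(X, y, train_sizes, n_repeats=20, seed=0):
    """For each per-class training size m (must be smaller than each class),
    repeatedly train on m samples per class, test on the rest, average accuracy."""
    rng = np.random.default_rng(seed)
    idx1, idx0 = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    curve = []
    for m in train_sizes:
        accs = []
        for _ in range(n_repeats):
            tr = np.concatenate([rng.choice(idx1, m, replace=False),
                                 rng.choice(idx0, m, replace=False)])
            te = np.setdiff1d(np.arange(len(y)), tr)   # held-out samples
            accs.append(nearest_centroid_accuracy(X[tr], y[tr], X[te], y[te]))
        curve.append(float(np.mean(accs)))
    return curve

# Simulated data: 30 responders + 30 non-responders, 200 genes, 10 informative
rng = np.random.default_rng(0)
y = np.array([1] * 30 + [0] * 30)
X = rng.normal(size=(60, 200))
X[y == 1, :10] += 2.0

sizes = [3, 10, 25]
curve = learning_curve(X, y, sizes)
for m, acc in zip(sizes, curve):
    print(f"{m} per class -> mean held-out accuracy {acc:.2f}")
```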

Here is a nice workflow for penalized logistic regression