Regression analysis with high dimensional predictors
5.2 years ago
M K ▴ 530

Dear All,

I am conducting research on the effects of DNA repeats on gene expression. I have 20,000 observations of gene expression and about 1,200 DNA repeats (predictor variables) that may affect gene expression. I need to build a multiple regression model for this study. I have found some techniques for variable selection, for example LASSO regression. My question is: are there other techniques for this, and which method is best? BTW, in my case the number of predictor variables p is less than the number of observations n.

next-gen R • 1.3k views
5.2 years ago

ElasticNet and LASSO are common approaches for penalized regression. You could also consider machine-learning approaches such as random forest and GBM. Unfortunately, I do not think there is a known "best" way to do what you want to do, so you'll probably need to experiment a bit. You'll also want to keep in mind that some of these methods make assumptions about the nature of your data (continuous, discrete, bell-shaped, missing values, etc.).
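
To make the variable-selection idea concrete, here is a minimal sketch of elastic-net regression fitted by coordinate descent, written in plain Python so it runs without extra packages. Everything in it is an assumption for illustration: the function names `soft_threshold` and `elastic_net`, the penalty values, and the simulated 0/1 predictors are hypothetical and not taken from your data. For a real analysis you would more likely use an established implementation (for example `glmnet` in R, with `cv.glmnet` to choose the penalty by cross-validation).

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; anything within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha=1.0, n_iter=100):
    """Coordinate descent for the objective
        (1/2n) * ||y - Xb||^2 + lam * (alpha * |b|_1 + (1 - alpha)/2 * |b|_2^2).
    alpha = 1 gives LASSO; 0 < alpha < 1 gives elastic net.
    X is a list of rows; returns the list of coefficients b."""
    n, p = len(X), len(X[0])
    # Center y and every column so that no intercept term is needed.
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[X[i][j] - col_means[j] for i in range(n)] for j in range(p)]
    resid = [yi - y_mean for yi in y]  # residual for b = 0
    col_sq = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            col = cols[j]
            # Covariance of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


# Simulated toy data (hypothetical): 200 observations, 10 binary predictors,
# of which only the first two actually affect the response.
random.seed(0)
n_obs, n_pred = 200, 10
X = [[random.randint(0, 1) for _ in range(n_pred)] for _ in range(n_obs)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.5) for row in X]

beta = elastic_net(X, y, lam=0.2, alpha=1.0)  # alpha=1.0 means pure LASSO
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected predictor indices:", selected)
```

The soft-thresholding step is what performs the selection: a predictor whose covariance with the residual stays below the penalty gets a coefficient of exactly zero. The penalty `lam` is fixed by hand here, whereas in practice it is usually chosen by cross-validation.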


Thanks, Sean. The dependent variable in my study, gene expression, is continuous and approximately normally distributed, but most of the predictors are counts that take the value 0 or 1. I don't know whether there are any other assumptions for ElasticNet and LASSO that must be satisfied before using them for variable selection.
