Question: Variable selection for multiple regression from large number of predictors
3.3 years ago, cjgunase (United States) wrote:

These are microarray datasets. I have 20 response variables Y=(Y1,…,Y20) and 1600 predictor variables X=(X1,…,X1600). There are 128 observations. I want to know which pairs of X best predict each Y.

So I generated all combinations (Yi, Xj, Xk) and fitted a linear regression for each one to obtain its R-squared. Based on R-squared, I extracted the top 100 combinations to analyse further which pairs of X are the best predictors of Y.
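
For reference, the procedure described above could be sketched roughly as follows. This is only an illustration, not the code actually used: the function names (`pearson`, `r_squared_pair`, `top_pairs`) and the column-list data layout are hypothetical, and it assumes an ordinary least-squares fit with an intercept and no constant columns. It uses the standard closed form for the R-squared of a two-predictor regression in terms of pairwise correlations instead of fitting each model explicitly.

```python
from itertools import combinations
from math import sqrt
import heapq


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def r_squared_pair(y, xj, xk):
    """R-squared of the OLS fit y ~ 1 + xj + xk (hypothetical helper)."""
    r_yj, r_yk, r_jk = pearson(y, xj), pearson(y, xk), pearson(xj, xk)
    denom = 1.0 - r_jk ** 2
    if denom < 1e-12:
        # xj and xk are (almost) perfectly collinear, so the model
        # reduces to a single-predictor regression.
        return max(r_yj ** 2, r_yk ** 2)
    return (r_yj ** 2 + r_yk ** 2 - 2.0 * r_yj * r_yk * r_jk) / denom


def top_pairs(Y, X, top_n=100):
    """Y and X are lists of columns; returns (r2, i, j, k) tuples, best first."""
    scored = (
        (r_squared_pair(y, X[j], X[k]), i, j, k)
        for i, y in enumerate(Y)
        for j, k in combinations(range(len(X)), 2)
    )
    return heapq.nlargest(top_n, scored)
```

With 1600 predictors there are 1,279,200 distinct pairs per response, so about 25.6 million fits for 20 responses. Pure Python like this is meant for small illustrations only; at full scale a vectorised implementation would be needed.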

I haven't considered multicollinearity between any pair of predictors. Should I take it into account?
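
One possible way to quantify multicollinearity within a single pair is sketched below. It is an assumption about what such a check might look like, not part of the procedure described here. For a model with exactly two predictors, the variance inflation factor (VIF) of either predictor equals 1 / (1 - r^2), where r is the correlation between Xj and Xk. The function names `pearson` and `pair_vif` are hypothetical.

```python
from math import sqrt, inf


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def pair_vif(xj, xk):
    """Return (r, VIF) for a two-predictor model with predictors xj and xk."""
    r = pearson(xj, xk)
    denom = 1.0 - r ** 2
    # A perfectly collinear pair has an unbounded VIF.
    vif = inf if denom <= 0.0 else 1.0 / denom
    return r, vif
```

A VIF close to 1 means the two predictors are nearly uncorrelated; large values mean the individual coefficients are poorly determined, even when the R-squared of the pair is high.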

My goal is to find the pairs (Xj, Xk) that best predict each Yi. Can you suggest how to improve this procedure to make it statistically valid?

 

Tags: chip-seq, gene
Modified 2.7 years ago by Biostar • written 3.3 years ago by cjgunase

I think this is a statistics question, not a bioinformatics one. You should try asking here: http://stats.stackexchange.com/

Written 3.3 years ago by mkulecka300