Question: Variable selection for multiple regression from large number of predictors
3.3 years ago by
United States
These are microarray datasets. I have 20 response variables Y=(Y1,…,Y20) and 1600 predictor variables X=(X1,…,X1600). There are 128 observations. I want to know which pairs of X best predict each Y.

So I generated all combinations (Yi, Xj, Xk) and ran a linear regression for each one to get its R-squared. Based on R-squared, I extracted the top 100 combinations to further analyze which pairs of X are the best predictors of Y.
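
For concreteness, the screening step described above could look roughly like the sketch below. It is only an illustration, not the code actually used: the names `pearson`, `pair_r_squared` and `top_pairs` are hypothetical, the data are assumed to be stored as plain lists of columns, and no column is assumed to be constant. It relies on the fact that, for a model with an intercept and exactly two predictors, R-squared depends only on the three pairwise correlations.

```python
import heapq
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def pair_r_squared(y, xj, xk):
    """R-squared of the least-squares fit y ~ 1 + xj + xk.

    Closed form for two predictors, written in terms of correlations.
    """
    r_yj, r_yk, r_jk = pearson(y, xj), pearson(y, xk), pearson(xj, xk)
    denom = 1.0 - r_jk ** 2
    if denom < 1e-12:
        # xj and xk are (almost) perfectly collinear: the pair carries
        # no more information than xj alone.
        return r_yj ** 2
    return (r_yj ** 2 + r_yk ** 2 - 2.0 * r_yj * r_yk * r_jk) / denom


def top_pairs(Y, X, top=100):
    """Return the `top` tuples (r2, i, j, k) with the largest R-squared.

    Y and X are lists of columns, one list of observations per variable;
    i indexes the response, j < k index the two predictors.
    """
    scored = (
        (pair_r_squared(y, X[j], X[k]), i, j, k)
        for i, y in enumerate(Y)
        for j, k in combinations(range(len(X)), 2)
    )
    return heapq.nlargest(top, scored)
```

With 1600 predictors there are 1,279,200 pairs per response, so about 25.6 million fits across 20 responses. The top 100 by R-squared are therefore picked from a very large pool, and their R-squared values are optimistically biased by that selection. This pure-Python version also recomputes correlations on every call, so it is only practical for small inputs.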

I haven't considered multicollinearity between any pair of predictors. Should I consider multicollinearity?
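
One simple way to look at multicollinearity within a single pair is the variance inflation factor (VIF). With exactly two predictors, both have the same VIF, 1 / (1 - r²), where r is the correlation between Xj and Xk. The sketch below assumes plain Python lists and non-constant columns; `pearson` and `pair_vif` are hypothetical helper names, not functions from any library.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def pair_vif(xj, xk):
    """Variance inflation factor shared by both predictors of a pair."""
    r = pearson(xj, xk)
    if abs(r) >= 1.0:
        # Perfectly collinear pair: the coefficients are not identifiable.
        return math.inf
    return 1.0 / (1.0 - r * r)
```

A VIF near 1 means the two predictors are nearly uncorrelated. Cut-offs such as 5 or 10 are commonly quoted rules of thumb rather than formal tests. A high VIF mainly makes the individual coefficients unstable; it does not by itself lower the R-squared of the pair.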

My goal is to find the best pairs Xj, Xk that can predict a given Yi. Can you suggest ways to improve this procedure and make it statistically valid?


modified 2.7 years ago by Biostar ♦♦ 20 • written 3.3 years ago by cjgunase 30

I think this is a statistics question, not a bioinformatics one. You should try asking here:

written 3.3 years ago by mkulecka 300
