These are microarray datasets. I have 20 response variables Y = (Y1, …, Y20) and 1600 predictor variables X = (X1, …, X1600), with 128 observations. I want to know which pairs of X variables best predict each of the Y variables.
So I generated all combinations (Yi, Xj, Xk) and, for each one, fitted a linear regression of Yi on Xj and Xk to get its R-squared. Based on R-squared, I extracted the top 100 combinations for further analysis of which pairs of X are the best predictors of Y.
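A minimal sketch of the exhaustive search described above, assuming the data are NumPy arrays with observations in rows (`X` of shape 128 × 1600, `Y` of shape 128 × 20) and that each model includes an intercept. The function names `pair_r2` and `top_pairs` are hypothetical. For scale, the full search is 20 × C(1600, 2) = 20 × 1,279,200 = 25,584,000 separate fits, so this pure-Python loop illustrates the logic rather than being an efficient implementation.

```python
import heapq
import itertools

import numpy as np


def pair_r2(y, xj, xk):
    """R-squared of an OLS fit of y on an intercept plus xj and xk."""
    design = np.column_stack([np.ones_like(y), xj, xk])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def top_pairs(X, Y, n_top=100):
    """Return the n_top (r2, i, j, k) tuples with the highest R-squared,
    pooled over all responses Y[:, i] and all predictor pairs (j < k)."""
    heap = []  # min-heap holding the best n_top results seen so far
    n_pred = X.shape[1]
    for i in range(Y.shape[1]):
        y = Y[:, i]
        for j, k in itertools.combinations(range(n_pred), 2):
            r2 = pair_r2(y, X[:, j], X[:, k])
            item = (r2, i, j, k)
            if len(heap) < n_top:
                heapq.heappush(heap, item)
            elif r2 > heap[0][0]:
                heapq.heapreplace(heap, item)  # drop the current worst
    return sorted(heap, reverse=True)
```

On synthetic data where `Y[:, 0]` is built from `X[:, 2]` and `X[:, 5]`, the first entry of `top_pairs(X, Y)` should be that pair for response 0.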
I haven't considered multicollinearity between the two predictors in each pair. Should I take multicollinearity into account?
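One way to check collinearity within each selected pair is the variance inflation factor (VIF). With only two predictors, both share the same VIF, 1 / (1 − r²), where r is the Pearson correlation between Xj and Xk. The sketch below assumes `X` is a NumPy array with observations in rows; the helper names and the `max_vif=10.0` cutoff are hypothetical, the cutoff being only a rule of thumb sometimes quoted.

```python
import numpy as np


def pair_vif(xj, xk):
    """VIF for a two-predictor model: 1 / (1 - r**2), shared by both predictors.

    Returns inf (with a NumPy warning) if the two columns are perfectly correlated.
    """
    r = np.corrcoef(xj, xk)[0, 1]
    return 1.0 / (1.0 - r ** 2)


def flag_collinear(X, pairs, max_vif=10.0):
    """Return the (j, k) column pairs whose VIF exceeds the (hypothetical) cutoff."""
    return [(j, k) for j, k in pairs if pair_vif(X[:, j], X[:, k]) > max_vif]
```

Applied to the (j, k) indices from the top-100 list, this shows which high-R² pairs consist of two nearly redundant predictors.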
My goal is to find the best pairs (Xj, Xk) for predicting each Yi. Can you suggest how to improve this procedure so that it is statistically valid?