I am fascinated by the field of data mining and its application to biological problems. Over the past few months I have read many papers in which people try to tackle a range of problems, such as predicting protein-protein interaction sites, post-translational modification sites, and disordered regions in a protein. The more I read about these algorithms, the more I doubt the reliability of their predictions.
Most of these papers gave me the impression that people use algorithms such as SVMs, random forests, ANNs, and many others as black boxes: you feed in some discriminative features as input, then use evaluation measures such as the ROC curve or MCC to show that your algorithm works better than the others. I have also read papers that describe something called a "meta predictor", which combines the results of several other predictors to arrive at its own prediction values.
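To make the terms concrete, here is a minimal sketch of what I understand by a meta predictor and by MCC. It assumes the simplest possible combination rule (averaging the scores of the base predictors and thresholding at 0.5); the papers may well use something more elaborate, such as a second-level classifier. The function names `mcc` and `meta_predict` and all the scores are made up for illustration and are not results from any paper.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def meta_predict(score_lists, threshold=0.5):
    """Average the per-example scores of several base predictors,
    then threshold the average to get a binary prediction."""
    n = len(score_lists)
    averaged = [sum(scores) / n for scores in zip(*score_lists)]
    return [1 if s >= threshold else 0 for s in averaged]


# Hypothetical toy data: true labels and scores from three base predictors.
y_true = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.4, 0.8, 0.3, 0.6, 0.1]
scores_b = [0.7, 0.8, 0.3, 0.2, 0.3, 0.6]
scores_c = [0.6, 0.7, 0.6, 0.6, 0.2, 0.2]

pred_a = [1 if s >= 0.5 else 0 for s in scores_a]
pred_meta = meta_predict([scores_a, scores_b, scores_c])

print("MCC of predictor A alone:", mcc(y_true, pred_a))
print("MCC of the meta predictor:", mcc(y_true, pred_meta))
```

The toy scores are chosen so that the three base predictors make their mistakes on different examples; with other data the comparison could come out differently.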
I would like to know: when you design a data-mining-based algorithm, how much importance do you give to the features and how much to the algorithm? Moreover, how do you decide which discriminative features will give you the best predictive results? And will a "meta predictor" always give you a better result?