Ranking Combinations of SNPs by Means of SVM
11.3 years ago
guido.leoni ▴ 10

I have a reasonable case-control population genotyped for a panel of 20 SNPs related to a pathology (according to OMIM). I would like to identify the best panel of SNPs for discriminating cases from controls. What do you think about achieving this goal by means of Support Vector Machine (SVM) models built on all the different combinations of SNPs? In other words, I'm thinking of:

1. building SVM models for every combination (subset) of the SNPs
2. estimating the accuracy of each model
3. choosing the combination of SNPs that gives the model(s) with the best accuracy

snp classification • 2.0k views
11.3 years ago
Michael 54k

That is most likely a bad idea. There are 2^20 = 1,048,576 possible subsets of 20 elements. To compare them, you would have to run cross-validation on all of them, and then evaluate the classifier built on the winning subset on a third, independent validation set that was not used to select the features or tune the parameters. My concern is that this feature selection process will be prone to overfitting. Also, SVM classifiers weight the features internally, so adding features to an otherwise optimal subset would not diminish accuracy, and might even slightly increase it due to overfitting.
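To make the size of the search concrete, here is a rough back-of-the-envelope sketch. The subset count follows directly from the 20 SNPs in the question. The 10-fold cross-validation and the grid of 5 values for the SVM cost parameter are purely hypothetical settings chosen for illustration.

```python
from math import comb

n_snps = 20

# Number of subsets of each size k = 0..20; they sum to 2**20.
subsets_by_size = [comb(n_snps, k) for k in range(n_snps + 1)]
total_subsets = sum(subsets_by_size)   # 1048576, includes the empty set
non_empty = total_subsets - 1          # the empty set cannot be a model

# Hypothetical settings (assumptions, not from the question):
n_folds = 10   # folds of cross-validation per candidate subset
n_costs = 5    # candidate values of the SVM cost parameter C

# Each (subset, C value, fold) combination needs one SVM fit.
n_fits = non_empty * n_costs * n_folds

print(f"subsets: {total_subsets:,}")
print(f"SVM fits needed: {n_fits:,}")
```

Beyond the compute cost, picking the single best of about a million cross-validated scores means the winner's score is optimistically biased, which is why the independent validation set is needed.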

I would try simpler approaches first, e.g. linear models, or linear or quadratic discriminant analysis. These can also give estimates for the relevance of features.
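As an illustration of how a discriminant analysis can give feature relevance estimates, below is a minimal two-class linear discriminant analysis sketch in NumPy. Everything about the data is an assumption made for the example: the genotypes are simulated as minor-allele counts (0/1/2), the group sizes and allele frequencies are made up, and only SNPs 0 and 1 are given a true difference between cases and controls.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group, n_snps = 500, 20

# Simulated minor-allele frequencies (assumption: only SNPs 0 and 1
# differ between controls and cases).
freq_controls = np.full(n_snps, 0.3)
freq_cases = freq_controls.copy()
freq_cases[[0, 1]] = 0.6

# Genotypes coded as minor-allele counts 0, 1 or 2.
controls = rng.binomial(2, freq_controls, size=(n_per_group, n_snps))
cases = rng.binomial(2, freq_cases, size=(n_per_group, n_snps))

# Two-class LDA: w = pooled_cov^{-1} (mean_cases - mean_controls).
mu0 = controls.mean(axis=0)
mu1 = cases.mean(axis=0)
centered = np.vstack([controls - mu0, cases - mu1])
pooled_cov = centered.T @ centered / (len(centered) - 2)
w = np.linalg.solve(pooled_cov, mu1 - mu0)

# Rank SNPs by absolute discriminant weight (largest first).
ranking = np.argsort(-np.abs(w))
print("SNPs ranked by |weight|:", ranking[:5])
```

The weights are comparable here only because all SNPs share the same 0/1/2 coding. On real data the ranking should be checked on held-out samples, since these weights are estimated in-sample.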
