Standard error of cross-validation estimates in software ADMIXTURE
8.6 years ago

Hello,

I'm currently using the software "ADMIXTURE" to estimate the most probable number of genetic groups (K) in a large panel of 38 landraces (old local populations), using high-density SNP markers distributed genome-wide.

To estimate K, I ran the calculations for K = 1 to 65 and plotted the cross-validation (CV) error of each model. Unfortunately the CV error plot is ambiguous: the curve is rather flat from K = 40 onward, so there is no clear minimum. As I have included 38 landraces, I would have expected K = 38. Instead, the CV error gets slightly lower (in the 3rd or 4th decimal place) with each increase in K. Maybe it is impossible to define a best K for such a diverse panel with so many different subpopulations.
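To put a number on how flat the curve is, the CV errors can be collected from the run logs and the relative improvement per step in K computed. This is only a minimal Python sketch: it assumes the log lines look like `CV error (K=3): 0.52345`, and the function names and the sample values in `example_log` are made up for illustration.

```python
import re

# Assumed log line format, e.g. "CV error (K=3): 0.52345"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in the given text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def relative_improvement(cv_by_k):
    """Fractional drop in CV error between consecutive K values.

    The result is keyed by the larger K of each consecutive pair.
    """
    ks = sorted(cv_by_k)
    return {
        k2: (cv_by_k[k1] - cv_by_k[k2]) / cv_by_k[k1]
        for k1, k2 in zip(ks, ks[1:])
    }


# Illustrative values only, not real results.
example_log = """\
CV error (K=38): 0.50610
CV error (K=39): 0.50580
CV error (K=40): 0.50570
"""

cv = parse_cv_errors(example_log)
for k, gain in relative_improvement(cv).items():
    print(f"K={k}: CV error {cv[k]:.5f}, relative improvement {gain:.4%}")
```

A run of very small relative improvements would quantify the plateau, but by itself it does not say which K is best.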

I would be interested in the standard errors of the cross-validation error estimates. One should be able to calculate the variance and std. error, as it is a 5-fold cross validation. Is there any way to get an output for this CV std. error in ADMIXTURE?

I would then simply choose the most parsimonious model (lowest K) whose CV error is within one standard error of the best model (lowest CV error).
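The selection rule described above could be sketched as follows, assuming several CV error values per K are available, for example from the individual folds or from repeated runs with different random seeds. The function name `one_se_choice` and all numbers are hypothetical. The standard error is computed as the sample standard deviation divided by the square root of the number of values, which treats the values as independent. That is only an approximation for CV folds.

```python
from math import sqrt
from statistics import mean, stdev


def one_se_choice(cv_values_by_k):
    """Apply the one-standard-error rule.

    cv_values_by_k maps K to a list of at least two CV error values.
    Returns (chosen_k, best_k, summary), where summary[K] = (mean, std_error).
    """
    summary = {}
    for k, values in cv_values_by_k.items():
        # Standard error of the mean, assuming independent values.
        std_error = stdev(values) / sqrt(len(values))
        summary[k] = (mean(values), std_error)

    # Model with the lowest mean CV error.
    best_k = min(summary, key=lambda k: summary[k][0])
    threshold = summary[best_k][0] + summary[best_k][1]

    # Smallest K whose mean CV error is within one SE of the best model.
    chosen_k = min(k for k, (m, _) in summary.items() if m <= threshold)
    return chosen_k, best_k, summary


# Made-up values for illustration only.
cv_values = {
    30: [0.5120, 0.5150, 0.5100, 0.5140, 0.5130],
    35: [0.5050, 0.5070, 0.5040, 0.5060, 0.5055],
    40: [0.5040, 0.5062, 0.5035, 0.5050, 0.5048],
    45: [0.5038, 0.5060, 0.5033, 0.5049, 0.5046],
}

chosen_k, best_k, summary = one_se_choice(cv_values)
print(f"lowest mean CV error at K={best_k}; one-SE choice is K={chosen_k}")
```

With a flat curve like the one described, the one-SE choice can sit well below the K with the lowest mean CV error, which is the point of the rule.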

I appreciate any advice,

Manfred

ADMIXTURE • 4.3k views

You could possibly try penalized estimation (manual, page 11), which is recommended for large K values.


Thank you for this hint, Galina. It's definitely worth a try. Do you have any experience with it? I'm not sure which values I should use for lambda and epsilon. Maybe I have to try different values and take the combination with the lowest CV error.
