What is a bad lowest-CV value for Admixture?
6.3 years ago
devenvyas ▴ 680

I am running Admixture analyses. In order to recapitulate some IACs from the literature, I took the intersection of the larger dataset I am working with, which is based on the Affymetrix Human Origins array, with some data from populations of interest, which were based on the Illumina Omni 1M chip. I had about 140,000 SNPs after merging, and then about 111,000 or 120,000 after pruning (--indep-pairwise 200 25 0.4 or --indep-pairwise 50 5 0.5).

When I use only the Affymetrix Human Origins data (about 280,000 SNPs after pruning), I get CV errors that bottom out or plateau at around 0.33 or 0.35-0.36.

With the overlap dataset, my CV errors are much larger. For example, with the 50 5 0.5 pruning method, here are my CV errors:

CV error (K=1): 0.58226
CV error (K=2): 0.54319
CV error (K=3): 0.53868
CV error (K=4): 0.53628
CV error (K=5): 0.53454
CV error (K=6): 0.53349
CV error (K=7): 0.53230
CV error (K=8): 0.53179
CV error (K=9): 0.53115
CV error (K=10): 0.53091
CV error (K=11): 0.53074
CV error (K=12): 0.53059
CV error (K=13): 0.53086
CV error (K=14): 0.53057
CV error (K=15): 0.53094
CV error (K=16): 0.53102
CV error (K=17): 0.53136
CV error (K=18): 0.53161
CV error (K=19): 0.53186
CV error (K=20): 0.53243
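
For reference, the lowest value in a list like this can be picked out with a short script. This is only a minimal sketch: it assumes the log lines follow the `CV error (K=n): value` format shown above, and the variable names are made up for illustration.

```python
import re

# CV lines copied from the run above (50 5 0.5 pruning).
log_text = """
CV error (K=1): 0.58226
CV error (K=2): 0.54319
CV error (K=3): 0.53868
CV error (K=4): 0.53628
CV error (K=5): 0.53454
CV error (K=6): 0.53349
CV error (K=7): 0.53230
CV error (K=8): 0.53179
CV error (K=9): 0.53115
CV error (K=10): 0.53091
CV error (K=11): 0.53074
CV error (K=12): 0.53059
CV error (K=13): 0.53086
CV error (K=14): 0.53057
CV error (K=15): 0.53094
CV error (K=16): 0.53102
CV error (K=17): 0.53136
CV error (K=18): 0.53161
CV error (K=19): 0.53186
CV error (K=20): 0.53243
"""

# Map each K to its CV error.
pattern = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")
cv_by_k = {int(k): float(err) for k, err in pattern.findall(log_text)}

# K with the smallest CV error.
best_k = min(cv_by_k, key=cv_by_k.get)
best_err = cv_by_k[best_k]
print(f"lowest CV error: K={best_k} ({best_err:.5f})")
```

Here the minimum is at K=14, but the values from roughly K=9 to K=15 differ by less than 0.001, so the curve is essentially flat in that range.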

I am wondering why these are so high compared to the original dataset. Are they too high, or is this reasonable?





admixture CV SNP • 3.0k views
6.3 years ago
Vincent Laufer ★ 1.4k

The CV errors are very likely higher because of the additional variability introduced by merging two different datasets. The datasets can differ in many ways: they may have been generated at different times, on arrays from two different companies, by different people, and on different individuals from potentially different populations. All of those things can affect the standard error.

The only one of those I can be sure of is the difference in arrays, because you mention it, but from your description the others seem likely to apply as well.

The values do seem ok to me. 

One way to see whether you can lower them is to generate the admixture estimates, look for SNPs that have very different allele frequencies (AF) between the two groups / chips despite similar ancestry estimates, remove those SNPs, and then regenerate the estimates.
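
The filtering step could look something like the sketch below. It is a rough illustration rather than a tested pipeline: it assumes you have already computed per-SNP allele frequencies separately for each chip (for the same allele, among individuals with similar ancestry estimates), and the function name, the toy frequencies, and the 0.2 cutoff are all hypothetical.

```python
def flag_discordant_snps(freq_a, freq_b, threshold=0.2):
    """Return SNP IDs whose allele frequency differs by more than
    `threshold` between two platforms (hypothetical cutoff)."""
    # Only SNPs present on both platforms can be compared.
    shared = freq_a.keys() & freq_b.keys()
    return sorted(
        snp for snp in shared
        if abs(freq_a[snp] - freq_b[snp]) > threshold
    )


# Toy allele frequencies per SNP, one dict per chip (made-up values).
human_origins = {"rs1": 0.12, "rs2": 0.45, "rs3": 0.80}
omni = {"rs1": 0.14, "rs2": 0.85, "rs3": 0.78, "rs4": 0.50}

flagged = flag_discordant_snps(human_origins, omni)
print(flagged)
```

The flagged IDs could then be written one per line to a text file and dropped from the merged dataset (for example with PLINK's `--exclude`) before re-running Admixture.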

