I am running ADMIXTURE analyses. To recapitulate some IACs from the literature, I took the intersection of the larger dataset I am working with, which is based on the Affymetrix Human Origins array, with data from some populations of interest, which are based on the Illumina Omni 1M chip. I had about 140,000 SNPs after merging, and then about 111,000 or 120,000 after pruning, depending on the parameters (--indep-pairwise 200 25 0.4 or --indep-pairwise 50 5 0.5).
When I use only the Affymetrix Human Origins data (around 280,000 SNPs after pruning), the CV errors reach a minimum or plateau at around 0.33, or 0.35 to 0.36.
With the overlap dataset, my CV errors are much larger. For example, with the 50 5 0.5 pruning parameters, here are my CV errors:
CV error (K=1): 0.58226
CV error (K=2): 0.54319
CV error (K=3): 0.53868
CV error (K=4): 0.53628
CV error (K=5): 0.53454
CV error (K=6): 0.53349
CV error (K=7): 0.53230
CV error (K=8): 0.53179
CV error (K=9): 0.53115
CV error (K=10): 0.53091
CV error (K=11): 0.53074
CV error (K=12): 0.53059
CV error (K=13): 0.53086
CV error (K=14): 0.53057
CV error (K=15): 0.53094
CV error (K=16): 0.53102
CV error (K=17): 0.53136
CV error (K=18): 0.53161
CV error (K=19): 0.53186
CV error (K=20): 0.53243
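One way to read a list like this is to parse it and check where the curve bottoms out and how much each additional K actually buys. Below is a minimal Python sketch, assuming the log lines have exactly the "CV error (K=<k>): <value>" format shown above; the helper names (`parse_cv_errors`, `best_k`, `successive_drops`) are hypothetical and not part of ADMIXTURE.

```python
import re

# CV error values copied from the runs above (K = 1..20, 50 5 0.5 pruning).
LOG = """
CV error (K=1): 0.58226
CV error (K=2): 0.54319
CV error (K=3): 0.53868
CV error (K=4): 0.53628
CV error (K=5): 0.53454
CV error (K=6): 0.53349
CV error (K=7): 0.53230
CV error (K=8): 0.53179
CV error (K=9): 0.53115
CV error (K=10): 0.53091
CV error (K=11): 0.53074
CV error (K=12): 0.53059
CV error (K=13): 0.53086
CV error (K=14): 0.53057
CV error (K=15): 0.53094
CV error (K=16): 0.53102
CV error (K=17): 0.53136
CV error (K=18): 0.53161
CV error (K=19): 0.53186
CV error (K=20): 0.53243
"""

# Assumed line format: "CV error (K=<k>): <value>".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(text):
    """Map each K to its CV error for every matching line in `text`."""
    return {int(k): float(err) for k, err in CV_LINE.findall(text)}


def best_k(cv):
    """Return the K with the lowest CV error."""
    return min(cv, key=cv.get)


def successive_drops(cv):
    """CV error decrease from K-1 to K (a negative value means the error rose)."""
    ks = sorted(cv)
    return {k: cv[prev] - cv[k] for prev, k in zip(ks, ks[1:])}


if __name__ == "__main__":
    cv = parse_cv_errors(LOG)
    k = best_k(cv)
    print(f"lowest CV error: K={k} ({cv[k]:.5f})")
    for k, drop in successive_drops(cv).items():
        print(f"K={k - 1} -> K={k}: {drop:+.5f}")
```

On these values the lowest CV error is at K=14 (0.53057), but K=12 (0.53059) is only 0.00002 higher. After the drop from K=1 to K=2 (0.03907), every further step changes the error by less than 0.005, so the curve is essentially flat from about K=10 onward.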
I am wondering why these are so much higher than with the original dataset. Are they too high, or is this reasonable?