Optimal K for Admixture software with 2 local minima
2.2 years ago

Hello everyone!

I am running ADMIXTURE on 800 samples of domestic animals with about 22,000 SNPs to estimate individual ancestries. In this context, K is expected to be high (20+). First, I reproduced the results with the previously released dataset, and I was able to merge it with the data produced in our lab. The original dataset alone produced a nicely shaped cross-validation error curve, with an optimal K of 25. However, after merging it with our set, which comes from an admixed (non-purebred) population, things get tricky: I get two local minima, at K=24 and K=26, with the global minimum at K=26 (20-fold cross-validation).
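To compare the two minima more concretely, it can help to pull the CV errors out of the logs and look at how far apart they actually are. Below is a minimal sketch, assuming the logs of all runs (one per K, started with `--cv=20`) have been concatenated into one text, and that ADMIXTURE reports each result on a line of the form `CV error (K=25): 0.51234`. The function names are hypothetical helpers for illustration, not part of ADMIXTURE.

```python
import re

# ADMIXTURE runs started with --cv print lines like "CV error (K=25): 0.51234".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} parsed from concatenated ADMIXTURE log output."""
    errors = {}
    for match in CV_LINE.finditer(log_text):
        errors[int(match.group(1))] = float(match.group(2))
    return errors


def local_minima(errors):
    """Return the K values whose CV error is lower than both neighbouring K values.

    The smallest and largest K are compared only with their single neighbour.
    """
    ks = sorted(errors)
    minima = []
    for i, k in enumerate(ks):
        left = errors[ks[i - 1]] if i > 0 else float("inf")
        right = errors[ks[i + 1]] if i < len(ks) - 1 else float("inf")
        if errors[k] < left and errors[k] < right:
            minima.append(k)
    return minima


def summarize(errors):
    """Report the global minimum, all local minima, and each minimum's gap to the best K."""
    best_k = min(errors, key=errors.get)
    minima = local_minima(errors)
    return {
        "global_min": best_k,
        "local_minima": minima,
        "gap_to_best": {k: errors[k] - errors[best_k] for k in minima},
    }
```

If the gap between K=24 and K=26 is small compared with how much the CV error changes when the same K is rerun with different random seeds (see the ADMIXTURE manual for the seed option), the CV curve alone probably cannot separate the two, and the choice has to rest on biological interpretation.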

We discussed this in the lab, and I was advised to leave K at for our analysis, for the sake of simplicity and because our added population should historically be an admixture of other populations already present in the dataset. However, it is also possible that another, more distant subpopulation contributed to our population historically, significantly enough to show up as its own ancestry component.
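One way to probe the "distant subpopulation" possibility is to check, in the K=26 results, whether any cluster carries noticeable ancestry in the newly added samples but almost none in the reference samples. The sketch below assumes the standard ADMIXTURE `.Q` output (one whitespace-separated row of ancestry proportions per individual, in the same order as the input `.fam` file) and a hypothetical `is_ours` list of booleans built from that `.fam` order. The thresholds `min_ours` and `max_ref` are arbitrary placeholders, not established cut-offs.

```python
def read_q(path):
    """Read an ADMIXTURE .Q file: one row per individual, one column per cluster."""
    with open(path) as handle:
        return [[float(x) for x in line.split()] for line in handle if line.strip()]


def cluster_means_by_group(q_rows, is_ours):
    """Mean ancestry proportion per cluster for our samples and for the reference samples.

    q_rows: per-individual proportion lists, in .fam order.
    is_ours: booleans in the same order, True for samples from our (admixed) population.
    """
    if len(q_rows) != len(is_ours):
        raise ValueError("Q rows and group labels must have the same length")
    k = len(q_rows[0])
    ours = [row for row, flag in zip(q_rows, is_ours) if flag]
    ref = [row for row, flag in zip(q_rows, is_ours) if not flag]

    def means(rows):
        # Column-wise average; an empty group yields all zeros.
        if not rows:
            return [0.0] * k
        return [sum(row[j] for row in rows) / len(rows) for j in range(k)]

    return means(ours), means(ref)


def clusters_specific_to_ours(q_rows, is_ours, min_ours=0.05, max_ref=0.01):
    """Return indices of clusters common in our samples but nearly absent from the reference."""
    ours_mean, ref_mean = cluster_means_by_group(q_rows, is_ours)
    return [
        j for j in range(len(ours_mean))
        if ours_mean[j] >= min_ours and ref_mean[j] <= max_ref
    ]
```

Running this on both the K=24 and K=26 `.Q` files shows whether the two extra clusters at K=26 are mostly "ours" or instead split reference breeds. A cluster specific to the added samples is consistent with an extra source, but it is not proof on its own: a cluster can also form around a single well-sampled or strongly drifted group.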

I'm second-guessing this decision and would appreciate it if someone with more experience could look at this situation and give me some advice.

