Question: ADMIXTURE optimal k depends on filters?
Written 13 months ago by RvV0:

I have a variant dataset for 95 plant genomes based on 30x coverage whole-genome resequencing. I use the --indep-pairwise option in PLINK 1.9 to prune variants by LD within a window of 20,000 variants, and then run PCAs. My PCA plots show clear differentiation between groups of individuals, so I want to assess population structure and admixture with ADMIXTURE.
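For reference, here is a rough sketch of the workflow I am describing, written as a small Python helper that only builds the command lines and does not run anything. The file prefix `plants` and the step size of 2,000 variants are placeholders chosen for illustration (neither is stated above); the window of 20,000 variants and the r^2 thresholds are the ones from this post.

```python
from typing import List


def build_pipeline(bfile: str, r2: float, ks: range,
                   window: int = 20000, step: int = 2000) -> List[List[str]]:
    """Build the commands for one LD threshold: prune, extract, ADMIXTURE per k.

    `step` is an assumed value; the post only gives the window size.
    """
    tag = f"{bfile}_r2_{r2}"
    cmds = [
        # 1) List the variants that survive pruning at this r^2 threshold
        #    (PLINK writes <tag>.prune.in and <tag>.prune.out).
        ["plink", "--bfile", bfile,
         "--indep-pairwise", str(window), str(step), str(r2),
         "--out", tag],
        # 2) Write a new binary fileset containing only the retained variants.
        ["plink", "--bfile", bfile,
         "--extract", f"{tag}.prune.in",
         "--make-bed", "--out", f"{tag}_pruned"],
    ]
    # 3) One ADMIXTURE run with cross-validation for each k.
    for k in ks:
        cmds.append(["admixture", "--cv", f"{tag}_pruned.bed", str(k)])
    return cmds


if __name__ == "__main__":
    for r2 in (0.5, 0.3, 0.1):
        for cmd in build_pipeline("plants", r2, range(1, 10)):
            print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` once the placeholder prefix is replaced with a real fileset.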

Running ADMIXTURE 1.3 for various numbers of populations (k = 1:9), the populations recovered match those found in the PCA. However, which k has the lowest cross-validation error depends strongly on the LD threshold used for filtering. Below is a plot showing cross-validation errors for k = 1:9 based on three differently filtered datasets.

[Figure: ADMIXTURE cross-validation errors]

As you can see, with the largest dataset, based on an LD threshold of r^2 = 0.5 (n = 114,953 variants), the optimal k = 1. The smaller dataset, based on r^2 = 0.3 (n = 14,574), has optimal k = 2. The smallest dataset, based on r^2 = 0.1 (n = 2,034), has optimal k = 4. Also, overall cross-validation errors go up as the LD threshold is lowered.
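In case it helps, this is roughly how the lowest-error k can be picked per dataset: a minimal Python sketch that reads the `CV error (K=...)` lines ADMIXTURE prints when run with `--cv`. The function names and the error values in `example_log` are made up for illustration and are not my actual results.

```python
import re
from typing import Dict

# Matches lines such as "CV error (K=3): 0.52345" in ADMIXTURE's output.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def cv_errors(log_text: str) -> Dict[int, float]:
    """Collect the cross-validation error reported for each k."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(errors: Dict[int, float]) -> int:
    """Return the k with the lowest cross-validation error."""
    return min(errors, key=errors.get)


# Illustrative values only, not taken from the plot in this post.
example_log = """\
CV error (K=1): 0.61
CV error (K=2): 0.58
CV error (K=3): 0.60
"""

errors = cv_errors(example_log)
print(best_k(errors))
```

Running this once per filtered dataset (on the concatenated logs for k = 1:9) gives one "optimal" k per LD threshold, which is the comparison shown in the plot.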

I am not a population geneticist but I find this highly surprising. Why would including more variants in LD reduce both the cross-validation errors and the population structure? I would expect that linked variants would boost any population-level signal rather than reduce it. If anyone can explain these patterns to me I would be much obliged.

More practically, how can I determine the optimal filtering and k for my dataset?

Many thanks in advance.


Please see "How to add images to a Biostars post" to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you have used here).

written 13 months ago by Ram32k