24 days ago

Hello Biostars community,

I'm investigating the population structure of honey bees using ANGSD and NGSadmix. My samples are a mix of males, which are haploid, and females, which are diploid. My first pass through these programs revealed strong structuring of males from the same colony, which leads me to suspect a bias associated with the haploid individuals. This strong structure persists even after thinning for linkage disequilibrium. The challenge is that the reference individuals, which I use to identify known lineages, are all diploid. Thus, I'm wondering:

1) Could someone explain why including haploid individuals alongside diploid individuals might bias ancestry inference?

2) Are there flags I can include in my ANGSD and NGSadmix analyses to handle mixed ploidy? Or are there programs that are better suited to my data?

23 days ago

1) I would not expect the haploid individuals to always cluster together, but the model assumptions are violated. First, the diploid genotype likelihoods from ANGSD will assign a non-zero likelihood to the heterozygous state. This can be fixed simply by setting the likelihood of the heterozygous state to zero in the diploid genotype likelihoods from ANGSD. However, when you use these haploid genotype likelihoods as input to NGSadmix, NGSadmix will still model the three possible genotype states assuming HWE proportions (given the individual allele frequency). You are then incorrectly assuming that the probability of a homozygous state is the squared individual allele frequency; because your data are haploid, the probability should be just the individual allele frequency (not squared). This can create a bias.
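Both points can be illustrated with a small Python/NumPy sketch. This is not ANGSD or NGSadmix code. It assumes linear-scale genotype likelihoods stored as rows of three values for 0, 1 and 2 copies of the counted allele (similar to the three-values-per-individual layout of the Beagle GL files ANGSD writes with `-doGlf 2`), and the function names `haploidize_gl`, `diploid_genotype_probs`, `haploid_genotype_probs` and `site_likelihood` are hypothetical.

```python
import numpy as np

def haploidize_gl(gl):
    """Set the heterozygous likelihood to zero for a haploid individual.

    gl: (n_sites, 3) linear-scale likelihoods for genotypes with 0, 1, 2
    copies of the counted allele (assumed layout). Rows are renormalised.
    """
    gl = np.array(gl, dtype=float)  # copy so the caller's array is untouched
    gl[:, 1] = 0.0                  # a haploid cannot be heterozygous
    totals = gl.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0       # leave uninformative all-zero rows as zeros
    return gl / totals

def diploid_genotype_probs(h):
    """HWE proportions given the individual allele frequency h."""
    return np.array([(1 - h) ** 2, 2 * h * (1 - h), h ** 2])

def haploid_genotype_probs(h):
    """Haploid probabilities on the same 0/1/2 scale: no heterozygote,
    and each 'homozygote' is a single allele with probability h or 1 - h."""
    return np.array([1 - h, 0.0, h])

def site_likelihood(gl_row, h, haploid):
    """P(data at one site | h) = sum over genotypes of GL * genotype prior."""
    probs = haploid_genotype_probs(h) if haploid else diploid_genotype_probs(h)
    return float(np.dot(gl_row, probs))

if __name__ == "__main__":
    # A haploid male whose raw diploid GLs still give weight to the heterozygote.
    gl = haploidize_gl([[0.01, 0.30, 0.69]])
    print(site_likelihood(gl[0], 0.3, haploid=False))  # about 0.0957
    print(site_likelihood(gl[0], 0.3, haploid=True))   # about 0.3057
```

With h = 0.3, the diploid model weights the homozygote by h^2 = 0.09 instead of h = 0.3, so the same haploid data get roughly a third of the likelihood at this site. Zeroing the heterozygote alone does not remove that mismatch.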

2) There is no flag or option in NGSadmix to run it on a mix of diploid and haploid genotype likelihoods. Since you want to use NGSadmix, I assume that you have low-depth data and want to perform unsupervised clustering, and I don't know of any software that can do that with mixed ploidy (although the modelling is straightforward).
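To show what "straightforward" could mean here, the sketch below evaluates the log-likelihood of an NGSadmix-style admixture model in which each individual carries its own ploidy. It is a hypothetical illustration, not an existing tool or NGSadmix's implementation. It assumes admixture proportions `Q` (individuals x K), ancestral frequencies `F` (sites x K), individual frequencies `H = Q F^T`, and the same 0/1/2 genotype-likelihood layout as above. It only scores a given `Q` and `F`; fitting them would still need an optimiser, which is left out.

```python
import numpy as np

def genotype_probs(h, ploidy):
    """Genotype probabilities on the 0/1/2 scale for a vector of frequencies h.

    Diploid: Hardy-Weinberg proportions. Haploid: heterozygote impossible and
    each homozygote has probability h or 1 - h (not squared).
    Returns an array of shape (len(h), 3).
    """
    h = np.asarray(h, dtype=float)
    if ploidy == 2:
        return np.stack([(1 - h) ** 2, 2 * h * (1 - h), h ** 2], axis=-1)
    if ploidy == 1:
        return np.stack([1 - h, np.zeros_like(h), h], axis=-1)
    raise ValueError("only ploidy 1 or 2 is handled in this sketch")

def mixed_ploidy_loglik(gl, ploidy, Q, F):
    """Log-likelihood of an admixture model with per-individual ploidy.

    gl:     (n_ind, n_sites, 3) linear-scale genotype likelihoods
    ploidy: length n_ind, each value 1 or 2
    Q:      (n_ind, K) admixture proportions, rows summing to 1
    F:      (n_sites, K) ancestral allele frequencies
    """
    gl = np.asarray(gl, dtype=float)
    H = np.asarray(Q, dtype=float) @ np.asarray(F, dtype=float).T  # (n_ind, n_sites)
    H = np.clip(H, 1e-6, 1 - 1e-6)  # keep frequencies away from 0 and 1
    total = 0.0
    for i in range(gl.shape[0]):
        probs = genotype_probs(H[i], ploidy[i])    # (n_sites, 3)
        site_lik = np.sum(gl[i] * probs, axis=1)  # (n_sites,)
        total += np.sum(np.log(site_lik))
    return float(total)

if __name__ == "__main__":
    # Toy data: one diploid and one haploid individual, one site, K = 1.
    gl = np.array([[[0.0, 0.0, 1.0]], [[0.0, 0.0, 1.0]]])
    Q = np.array([[1.0], [1.0]])
    F = np.array([[0.5]])
    print(mixed_ploidy_loglik(gl, [2, 1], Q, F))  # log(0.25) + log(0.5)
```

The only change from a purely diploid model is the per-individual choice of genotype prior, so the diploid reference samples and the haploid males share the same `F` while each is scored under its own ploidy.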


