Question: Statistical Genetics - What information is required for haplotype phasing in a novel, admixed population?
Let's say there is a population that has never been sequenced before and has experienced varying levels of admixture from multiple sources at different times, both ancient and recent. The population is also highly consanguineous, yet very genetically diverse: it did not originate from a small founder population (unlike, for example, Ashkenazi Jews), so there is far less identity by state between any two unrelated individuals than one would expect in a consanguineous population.

PCA indicates that there is a high degree of stratification in terms of ancestry and non-random mating, and also that there is no directly applicable reference population panel that can be used for haplotype frequency estimation and phasing. I have multiple reasons, both theoretical and results-based, to believe that my current efforts have not worked correctly because of these issues. If you doubt this, please let me know why, because I have not been formally trained in this area.

This population now needs to be sequenced. Let's assume that whole-exome sequencing will be performed on ~300 unrelated samples. Would this be enough for phasing? Or would whole-genome genotyping on SNP arrays need to be performed on the samples, as well as on related individuals?

Any insight would be helpful. Thank you.

