Suppose one wished to control for population substructure using PCA. No problem, lots of programs do that, provided cases and controls are matched for ethnicity.
However, now suppose that one had the following:
200 genomes of healthy controls from the CEU population
800 genomes of healthy controls from the YRI population
1000 genomes of African American patients with a complex disease, who have on average 20% CEU ancestry
The average ancestry proportion (about 20% CEU) is therefore the same in the two groups.
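To make the setup concrete, here is a minimal sketch (Python, standard library only) of the per-individual CEU ancestry fraction in each group. The Beta(2, 8) distribution used for the cases is purely an assumption, chosen so that the mean is 20%. Real admixture proportions would have to be estimated from the genotypes.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Controls: unadmixed individuals, so each CEU fraction is exactly 1.0 or 0.0.
controls = [1.0] * 200 + [0.0] * 800

# Cases: admixed individuals. Beta(2, 8) has mean 2 / (2 + 8) = 0.2.
# The shape parameters are an illustrative assumption, not estimated from data.
cases = [random.betavariate(2, 8) for _ in range(1000)]

control_mean = mean(controls)
case_mean = mean(cases)
control_sd = pstdev(controls)
case_sd = pstdev(cases)

print(f"controls: mean CEU fraction = {control_mean:.2f}, sd = {control_sd:.2f}")
print(f"cases:    mean CEU fraction = {case_mean:.2f}, sd = {case_sd:.2f}")
```

The two group means agree, but every control sits at 0 or 1 while the cases are spread around 0.2, so the groups match on average ancestry without matching individual by individual.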
In this case, would PCA be sufficient to control for population substructure, and if not, why not? I would welcome answers based on either DNA microarray or whole-genome sequencing data.