Question: Theoretical problems using PCA to control for cases and controls with different ethnicity
Suppose one wished to control for population substructure using PCA. No problem, lots of programs do that, provided cases and controls are matched for ethnicity.

However, now suppose that one had the following:

200 genomes of healthy controls from the CEU population

800 genomes of healthy controls from the YRI population

1000 genomes of African American patients (cases) with a complex disease, who have on average 20% CEU ancestry.

The average ancestry is therefore the same in the two groups: the pooled controls and the cases are each about 20% CEU and 80% YRI.

In this case, would PCA be sufficient to control for population substructure, and if not, why not? I would welcome answers based on either DNA microarray data or whole-genome sequencing.
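
To make the setup concrete, here is a minimal simulation sketch in Python (NumPy). Everything in it is an assumption for illustration, not real data: allele frequencies come from a Balding-Nichols model with a hypothetical Fst of 0.15, each case gets a single genome-wide CEU fraction drawn from Beta(2, 8) (mean 0.2), local ancestry tracts and LD are ignored, and the function name `simulate_pc1_ancestry` is made up. It rescales PC1 so that YRI controls sit near 0 and CEU controls near 1, then compares the two groups.

```python
import numpy as np


def simulate_pc1_ancestry(n_ceu=200, n_yri=800, n_cases=1000,
                          n_snps=1000, fst=0.15, seed=0):
    """Return (controls, cases): PC1 rescaled so YRI controls ~0, CEU controls ~1."""
    rng = np.random.default_rng(seed)

    # Assumed Balding-Nichols model: both populations drift away from shared
    # ancestral allele frequencies with the same (hypothetical) Fst.
    p_anc = rng.uniform(0.1, 0.9, size=n_snps)
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    p_ceu = rng.beta(a, b)
    p_yri = rng.beta(a, b)

    # Controls: unadmixed individuals from each source population.
    g_ceu = rng.binomial(2, p_ceu, size=(n_ceu, n_snps))
    g_yri = rng.binomial(2, p_yri, size=(n_yri, n_snps))

    # Cases: one genome-wide CEU fraction per individual, mean 0.2.
    # Beta(2, 8) is an illustrative choice, not something from the question.
    ceu_frac = rng.beta(2, 8, size=n_cases)[:, None]
    p_case = np.clip(ceu_frac * p_ceu + (1 - ceu_frac) * p_yri, 0.0, 1.0)
    g_case = rng.binomial(2, p_case)

    # PCA via SVD of the column-centered genotype matrix (rows = individuals).
    g = np.vstack([g_ceu, g_yri, g_case]).astype(float)
    g -= g.mean(axis=0)
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    pc1 = u[:, 0] * s[0]

    # Rescale PC1 using the two control clusters; this also fixes its sign.
    ceu_mean = pc1[:n_ceu].mean()
    yri_mean = pc1[n_ceu:n_ceu + n_yri].mean()
    scaled = (pc1 - yri_mean) / (ceu_mean - yri_mean)

    n_controls = n_ceu + n_yri
    return scaled[:n_controls], scaled[n_controls:]


if __name__ == "__main__":
    controls, cases = simulate_pc1_ancestry()
    print("mean rescaled PC1  controls:", controls.mean(), " cases:", cases.mean())
    print("sd   rescaled PC1  controls:", controls.std(), " cases:", cases.std())
```

Under these assumptions the two groups should have nearly the same mean on PC1 but very different spreads: the controls form two clusters at the ends of the axis, while the cases sit in between. That mismatch in distribution, despite matched averages, is what I am asking about.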







pca gwas
