Question: principal component plot interpretation
4.0 years ago by
ED10 wrote:


I'm reading a paper, but I'm having difficulty interpreting a principal component plot. I'm not familiar with principal component analysis, so I looked up some information to better understand what it does. In the paper, they say "We performed a WGS-based genome-wide association study (GWAS) using a logistic model with principal component correction to account for any remaining population stratification after restriction to individuals with > 95% European ancestry, though inspection of the principal component plots demonstrates the cohorts are well balanced". So the two colors represent the two cohorts being compared.

I read in another paper that the principal component 1 axis reflects variation between two populations that have different geographical locations. But which variation does the principal component 2 axis reflect? And is it because the red dots and blue dots are equally spread that they conclude the cohorts are balanced? If the red dots were on one side of the principal component 1 axis and the blue dots on the other side, then the differences in allele frequencies could be due to the difference in geographical location of these two cohorts? Am I interpreting this right or not?

[Image: principal component plot from the paper, with the two cohorts shown as red and blue dots]

4.0 years ago by
WouterDeCoster45k wrote:

The mathematics behind PCA are quite complex, but I find this an excellent explanation.

My rough interpretation is that the first two components capture the largest share of the variability in the dataset, so for genotypes they are both mostly explained by geographical/ethnic differences. Such structure can also be present in PC3, PC4, etc., but only the first two components are visualised. And I think your conclusion is correct: the cohorts are equally spread and mixed, so there is no reason to assume population stratification. If the genotypes of the individuals were very different between the blue and red cohorts, you would expect PCA to separate the two cohorts, and you couldn't claim that the samples are from the same population.
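To make the reasoning above concrete, here is a minimal toy sketch (not the method used in the paper). It simulates genotypes from two hypothetical populations with slightly different allele frequencies, runs PCA via an SVD, and measures how far apart the two cohorts sit along PC1. All names (`simulate_genotypes`, `top_pcs`, `cohort_shift`) and all simulation parameters are assumptions made up for illustration.

```python
import numpy as np


def simulate_genotypes(rng, freqs, n):
    """Draw n individuals with 0/1/2 allele counts per SNP (toy model)."""
    return rng.binomial(2, freqs, size=(n, freqs.size))


def top_pcs(genotypes, k=2):
    """Project individuals onto the first k principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def cohort_shift(pcs, labels, axis=0):
    """Gap between cohort means on one PC, in units of that PC's std dev."""
    scores = pcs[:, axis]
    gap = scores[labels == 0].mean() - scores[labels == 1].mean()
    return abs(gap) / scores.std()


rng = np.random.default_rng(0)
n_snps = 500

# Two hypothetical populations (A and B) with diverged allele frequencies.
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)

# Cohort label per individual: first 100 rows are "red", last 100 are "blue".
labels = np.repeat([0, 1], 100)


def mixed_cohort():
    """A cohort made of 50 individuals from A and 50 from B."""
    return np.vstack([
        simulate_genotypes(rng, freq_a, 50),
        simulate_genotypes(rng, freq_b, 50),
    ])


# Balanced design: both cohorts contain the same mix of ancestries.
balanced = np.vstack([mixed_cohort(), mixed_cohort()])

# Stratified design: cohort 0 is entirely from A, cohort 1 entirely from B.
stratified = np.vstack([
    simulate_genotypes(rng, freq_a, 100),
    simulate_genotypes(rng, freq_b, 100),
])

balanced_shift = cohort_shift(top_pcs(balanced), labels)
stratified_shift = cohort_shift(top_pcs(stratified), labels)

print(f"PC1 cohort shift, balanced:   {balanced_shift:.2f}")
print(f"PC1 cohort shift, stratified: {stratified_shift:.2f}")
```

In the balanced design PC1 still picks up the ancestry difference, but both cohorts are spread across it in the same way, so the shift between cohorts is small. In the stratified design the cohorts land on opposite sides of PC1, which is the situation where a difference in allele frequency between cohorts could simply reflect ancestry.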
