How Plink PCA output is used as covariates in PheWAS studies
5.7 years ago
rrbutleriii ▴ 260

I am trying to understand how to use principal components as covariates for PheWAS studies. Plink can be used to generate eigenvectors, eigenvalues, and additionally variant weights. It seems from what I have read that the eigenvectors are used as the covariates, but I am unclear why the eigenvalues are then not included in some form. Are the eigenvalues only needed to reconstruct the genetic relationship matrix? And what are the variant weights? Are they the contribution of each variant to each of the principal components?

SNP • 3.2k views
5.7 years ago

Eigenvectors, i.e., principal components (PCs), are uncorrelated vectors that contain loadings; what they summarise in your data is the covariance among your variables. When you include eigenvectors as covariates in a regression model, the regression coefficients are adjusted for them, and this has been found to help control for population structure in genetic studies. You should decide on the number of eigenvectors (PCs) to include as covariates via manual inspection of pairwise bi-plots.
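As a rough sketch of the bookkeeping behind that inspection, the Python below reads eigenvectors and eigenvalues and groups the per-sample coordinates into one panel per pair of PCs. It assumes PLINK 1.9-style output, where each row of the .eigenvec file holds FID, IID, and then one value per PC, and the .eigenval file holds one eigenvalue per line. The function names and the toy values are made up for illustration, and the eigenvalue shares are relative to the eigenvalues that were read, not to the total variance in the data.

```python
import io
from itertools import combinations


def read_eigenvec(handle):
    """Parse whitespace-delimited rows: FID IID PC1 PC2 ... (assumed layout)."""
    samples = {}
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and a '#'-prefixed header if present
        samples[(fields[0], fields[1])] = [float(v) for v in fields[2:]]
    return samples


def eigenvalue_shares(handle):
    """Each eigenvalue as a fraction of the sum of the eigenvalues read."""
    values = [float(line) for line in handle if line.strip()]
    total = sum(values)
    return [v / total for v in values]


def pairwise_axes(samples, n_pcs):
    """Yield ((i, j), points) for every pair among the first n_pcs PCs."""
    for i, j in combinations(range(n_pcs), 2):
        yield (i + 1, j + 1), [(pcs[i], pcs[j]) for pcs in samples.values()]


# Made-up stand-ins for the contents of an .eigenvec and an .eigenval file.
eigenvec_text = """\
F1 S1 -0.50 0.10 0.02
F2 S2 -0.48 -0.12 -0.01
F3 S3 0.49 0.30 0.03
F4 S4 0.51 -0.28 -0.04
"""
eigenval_text = "6.0\n3.0\n1.0\n"

samples = read_eigenvec(io.StringIO(eigenvec_text))
shares = eigenvalue_shares(io.StringIO(eigenval_text))
panels = dict(pairwise_axes(samples, n_pcs=3))

print("relative eigenvalue shares:", shares)
print("bi-plot panels (PC pairs):", sorted(panels))
```

Each entry of `panels` is a list of (x, y) points that can be passed to whatever plotting library you prefer as one scatter plot, ideally coloured by known or self-reported population.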

For example, in the following figure, eigenvectors 1, 2, and 3 (PCs 1, 2, and 3) are clearly segregating the different populations; so, we would include these eigenvectors as covariates in modelling if we were studying some disease trait / phenotype that spanned all populations.

[Figure: biplot-new]
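To illustrate what including an eigenvector as a covariate does, here is a minimal sketch on made-up data, not a real analysis. The list `pc1` stands in for one column of eigenvector values (one value per sample), the two populations differ in both allele frequency and phenotype, and the small `ols` helper is a hypothetical stand-in for the regression routine of a proper statistics package. The phenotype is constructed to depend only on population, so any genotype effect in the unadjusted model comes purely from population structure.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Toy data: the first four samples are population A, the last four population B.
genotype = [0, 0, 0, 1, 1, 2, 2, 2]      # allele counts, more common in B
pc1 = [-1, -1, -1, -1, 1, 1, 1, 1]       # stand-in for eigenvector 1
phenotype = [2.0 + 1.0 * p for p in pc1]  # depends on population only

# Model 1: phenotype ~ intercept + genotype
unadjusted = ols([[1.0, g] for g in genotype], phenotype)
# Model 2: phenotype ~ intercept + genotype + PC1
adjusted = ols([[1.0, g, p] for g, p in zip(genotype, pc1)], phenotype)

print("genotype effect, no covariate:", round(unadjusted[1], 6))
print("genotype effect, PC1 as covariate:", round(adjusted[1], 6))
```

In this toy setting the genotype coefficient is clearly non-zero without the covariate and drops to essentially zero once `pc1` is in the model, which is the kind of confounding by population structure that the adjustment is meant to remove.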

Kevin
