Question: Eigenstrat: Should We Multiply Eigenvector By Sqrt(Eigenvalue) To Form The Pc Axis?
7.5 years ago by
nnlnn60
United States
nnlnn60 wrote:

As implied by the paper behind the software, Price et al. (2006), one would use the eigenvectors ("ancestries of individuals") from EIGENSTRAT directly as covariates in a subsequent linear or logistic regression. However, these eigenvectors are orthonormal, so each has unit length and they all have the same variance. In other words, the variation along each axis (eigenvector) is identical, which does not seem right: the variation along an axis should be proportional to its associated eigenvalue (lambda). So I think the correct approach is to multiply `eigenvec_k` by the square root of `lambda_k` and feed the result into the regression model as a covariate. Moreover, it can be shown that `eigenvec_k * sqrt(lambda_k)` is just the kth score vector for the individuals if one runs PCA on the n x p genotype matrix rather than on its p x n transpose (n = sample size; p = number of SNPs); the transpose is what the Price paper uses.
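To make the last claim concrete, here is a minimal NumPy sketch on simulated genotypes. It assumes plain column-centering only (not the per-SNP normalization that EIGENSTRAT applies), and the names `G`, `X`, `U`, `lam` and `scores` are made up for illustration. It compares the eigenvectors of the n x n matrix `X X^T`, rescaled by `sqrt(lambda_k)`, with the ordinary PCA scores of the n x p matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 200                                      # toy sizes: n individuals, p SNPs
G = rng.integers(0, 3, size=(n, p)).astype(float)   # simulated genotypes coded 0/1/2
X = G - G.mean(axis=0)                              # center each SNP (column)

# Route A: eigen-decompose the n x n matrix X X^T (one entry per individual).
lam, U = np.linalg.eigh(X @ X.T)
order = np.argsort(lam)[::-1]                       # sort eigenvalues in descending order
lam, U = lam[order], U[:, order]

# Route B: ordinary PCA on the n x p matrix. The loadings are the right
# singular vectors of X, and the scores are the projections X v_k.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T

k = 0
scaled = U[:, k] * np.sqrt(lam[k])                  # eigenvec_k * sqrt(lambda_k)

# Eigenvectors are only defined up to sign, so align signs before comparing.
sign = np.sign(scaled @ scores[:, k])
same = np.allclose(sign * scaled, scores[:, k])
print(same)
```

Under these assumptions the two routes agree up to sign, and the eigenvalues of `X X^T` equal the squared singular values of `X`. With a different normalization of the matrix the relation would hold only up to a constant factor.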

Although the whole point of running EIGENSTRAT is to adjust for structure when testing a SNP's effect, and the significance of a SNP is therefore unaffected by the multiplication by sqrt(lambda) described above, I still think we should use the right PC axes. I would be very grateful for any corrections and comments on this topic.
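The invariance mentioned here can be checked numerically. The following is a rough sketch using ordinary least squares on simulated data, not EIGENSTRAT's own test statistic; the helper `snp_t_stat`, the simulated phenotype, and the eigenvalue `lam` are all hypothetical and exist only for illustration.

```python
import numpy as np

def snp_t_stat(y, snp, covariate):
    """t-statistic of the SNP coefficient in the OLS fit y ~ 1 + snp + covariate."""
    n = len(y)
    D = np.column_stack([np.ones(n), snp, covariate])   # design matrix
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = resid @ resid / (n - D.shape[1])            # residual variance
    cov_beta = sigma2 * np.linalg.inv(D.T @ D)           # covariance of the estimates
    return beta[1] / np.sqrt(cov_beta[1, 1])

rng = np.random.default_rng(1)
n = 200
pc = rng.normal(size=n)
pc = (pc - pc.mean()) / np.linalg.norm(pc - pc.mean())   # unit length, like an eigenvector
snp = rng.binomial(2, 0.3, size=n).astype(float)         # simulated genotype, 0/1/2
y = 0.5 * snp + 3.0 * pc + rng.normal(size=n)            # simulated phenotype

lam = 42.0                                               # hypothetical eigenvalue
t_raw = snp_t_stat(y, snp, pc)
t_scaled = snp_t_stat(y, snp, pc * np.sqrt(lam))
print(np.isclose(t_raw, t_scaled))
```

Rescaling a covariate only rescales its own coefficient; the fitted values, the residuals, and the SNP's t-statistic stay the same. The same argument applies when several PCs are included, since scaling each column leaves the column space of the design matrix unchanged.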

pca • 3.3k views
modified 7.4 years ago by Hypotheses70 • written 7.5 years ago by nnlnn60
7.5 years ago by
Hypotheses70
Bangkok, Thailand
Hypotheses70 wrote:

I'm not sure I quite understand your question, but to calculate the score for each individual `i` you do something along the lines of `eigenvec_k' × GENOTYPE_i`. And it is this individual-specific score that you would use to adjust for population structure, isn't it? Or am I misunderstanding something?

My understanding of `lambda_k` is that it is the variation explained by the kth principal component; that is pretty much what the eigenvalues describe.
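As a rough illustration of both points, the sketch below computes an individual's score as `eigenvec_k' × GENOTYPE_i` and checks that the variance of the scores along axis k matches `lambda_k`. It assumes that `eigenvec_k` here is an eigenvector in SNP space (length p) of the sample covariance matrix of centered genotypes; the data are simulated and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 100                                      # toy sizes: n individuals, p SNPs
G = rng.integers(0, 3, size=(n, p)).astype(float)   # simulated genotypes coded 0/1/2
X = G - G.mean(axis=0)                              # center each SNP (column)

# Eigen-decompose the p x p sample covariance matrix of the SNPs.
C = X.T @ X / (n - 1)
lam, V = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]                       # descending eigenvalues
lam, V = lam[order], V[:, order]

k, i = 0, 5
score_i = V[:, k] @ X[i]                            # eigenvec_k' x GENOTYPE_i for individual i
scores_k = X @ V[:, k]                              # scores of all individuals on axis k

var_k = scores_k.var(ddof=1)                        # sample variance along axis k
explained = lam / lam.sum()                         # share of total variance per component
print(np.isclose(var_k, lam[k]), explained[:3])
```

In this setup the scores already carry the eigenvalue in their variance, whereas unit-length eigenvectors in individual space (length n) do not, which seems to be the distinction the question is about.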