Question: What does the .eigenval file represent in PLINK 1.9?
summeryhx0 wrote:

Hi all,

I got the .eigenval and .eigenvec files by running --pca in PLINK 1.9.

Can anyone tell me what the values in the .eigenval file stand for? Are they variances or standard deviations?

Also, is there an option to output the variance explained by each PC? Thank you.

And is there any rule for choosing which eigenvectors to use as covariates, based on the .eigenval file?

modified 5.7 years ago by Jean-Karim Heriche23k • written 5.7 years ago by summeryhx0
Jean-Karim Heriche23k wrote:

I am not a PLINK user, but a quick look at the documentation told me that the .eigenvec file contains the requested number of principal components (PCs) and the .eigenval file contains the corresponding eigenvalues, one per line. The eigenvalues tell you how much variation is explained by the associated PC. The total variance of the data is the sum of the variances of the individual PCs, i.e. the sum of the elements on the diagonal of the covariance matrix, which is also the sum of its eigenvalues. Therefore, the fraction of variance explained by a PC is the ratio of the eigenvalue associated with this PC to the sum of all eigenvalues.

To select how many PCs to use, you can plot the variance explained by each PC in decreasing order (a scree plot). There is often an elbow separating the most important PCs from the less important ones. A widely used rule in PCA is therefore to use the PCs to the left of the elbow.
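
As a minimal sketch of the ratio described above, the following computes the fraction and cumulative fraction of variance from a complete list of eigenvalues. The function name `variance_explained` and the example values are hypothetical and chosen only for illustration; they do not come from PLINK or from any real data set.

```python
def variance_explained(eigenvalues):
    """Return (fractions, cumulative fractions) for a full list of eigenvalues.

    Assumes the list holds ALL eigenvalues of the covariance matrix,
    so that their sum is the total variance.
    """
    total = sum(eigenvalues)
    fractions = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for fraction in fractions:
        running += fraction
        cumulative.append(running)
    return fractions, cumulative


# Hypothetical eigenvalues, sorted in decreasing order (illustration only).
example_eigenvalues = [5.0, 2.5, 1.5, 0.6, 0.4]
fractions, cumulative = variance_explained(example_eigenvalues)

for i, (f, c) in enumerate(zip(fractions, cumulative), start=1):
    print(f"PC{i}: {f:.3f} of variance, {c:.3f} cumulative")
```

Plotting `fractions` against the PC index gives the scree plot. Note that the ratio is only a true fraction of variance when every eigenvalue is included in the sum.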

Thank you for your explanation, it is very helpful. But one thing still confuses me. When I do PCA in R, the total variance usually equals the number of PCs, as in the following example:

                          Comp.1    Comp.2    Comp.3     Comp.4
Standard deviation     1.5748783 0.9948694 0.5971291 0.41644938
Proportion of Variance 0.6200604 0.2474413 0.0891408 0.04335752
Cumulative Proportion  0.6200604 0.8675017 0.9566425 1.00000000

R reports the standard deviation (sd) rather than the variance, so you have to square each sd to get the variance:
total variance = 1.5748783^2 + 0.9948694^2 + 0.5971291^2 + 0.41644938^2 = 4
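
The arithmetic above can be checked with a short sketch. The standard deviations are copied from the R output; the variable names are only illustrative.

```python
# Standard deviations of the four components, copied from the R output above.
sdev = [1.5748783, 0.9948694, 0.5971291, 0.41644938]

# Variance of each component is the square of its standard deviation.
variances = [s ** 2 for s in sdev]
total_variance = sum(variances)

# Proportion of variance: each component's variance over the total.
proportions = [v / total_variance for v in variances]

print(round(total_variance, 4))
print([round(p, 4) for p in proportions])
```

The proportions computed this way agree with the "Proportion of Variance" row of the R output to the precision shown.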

But my .eigenval file from PLINK looks like the following, and it does not follow the rule above (the total variance is not equal to the number of PCs, 20). Could you help me explain that? Thank you.

20.0134
2.98845
2.32333
1.94295
1.93421
1.91117
1.88628
1.86544
1.85781
1.84763
1.76204
1.5532
1.3277
1.1808
1.14857
1.13482
1.13316
1.12439
1.1194
1.11312

How many PCs did you request, and what is the size of your covariance matrix? My guess is that you have n > 20 and only got the first 20 eigenvalues, corresponding to the first 20 PCs. In PCA, the data are often standardized first. In that case, the sum of all eigenvalues equals the number of variables, since every variable has a variance of 1.
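
To illustrate the point about truncated output, here is a rough sketch using the 20 values posted above. It assumes, as guessed, that these are only the largest 20 of more than 20 eigenvalues. In that case their sum is only a lower bound on the total variance, and dividing each eigenvalue by this partial sum gives an upper bound on its true fraction of variance explained, not the fraction itself. The variable names are illustrative.

```python
# The 20 eigenvalues posted above, copied from the .eigenval file.
top_eigenvalues = [
    20.0134, 2.98845, 2.32333, 1.94295, 1.93421,
    1.91117, 1.88628, 1.86544, 1.85781, 1.84763,
    1.76204, 1.5532, 1.3277, 1.1808, 1.14857,
    1.13482, 1.13316, 1.12439, 1.1194, 1.11312,
]

# Sum of the reported eigenvalues only. If more eigenvalues exist,
# the true total variance is larger than this.
partial_sum = sum(top_eigenvalues)

# Share of each PC among the reported PCs. This overestimates the true
# fraction of variance explained whenever eigenvalues are missing.
upper_bounds = [ev / partial_sum for ev in top_eigenvalues]

print(f"sum of reported eigenvalues: {partial_sum:.5f}")
print(f"PC1 share among reported PCs: {upper_bounds[0]:.4f}")
```

To get the true fraction, the denominator would have to be the sum of all eigenvalues, i.e. the trace of the matrix that was decomposed. As far as I can tell from the PLINK 1.9 documentation, --pca works on the variance-standardized relationship matrix and reports the top 20 PCs by default, but the matrix size for this particular run is not known here.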