Question

How can I get the proportion of variation for PC1 and PC2 for PCA in GCTA?

1

7.0 years ago

zhangdezhi008 ▴ 10

Hi all, I used GCTA to do principal component analysis of my VCF file, and I got two files, "pca.eigenval" and "pca.eigenvec". How can I get the proportion of variation for PC1 and PC2?

Any suggestions will help a lot!

Thank you very much for your attention!

Dez

SNP next-gen sequence • 7.1k views

ADD COMMENT • link updated 7.0 years ago by Santosh Anand 5.7k • written 7.0 years ago by zhangdezhi008 ▴ 10

score 1 · Answer 1 · 2017-05-09

1

7.0 years ago

Santosh Anand 5.7k

Each eigenvalue is the variance explained by the corresponding principal component. Take the top two for PC1 and PC2.

ADD COMMENT • link 7.0 years ago by Santosh Anand 5.7k

0

Thanks for your reply, Santosh. But how do I know the proportion of the variation for PC1 and PC2? I got a PCA result using the command gcta --grm plink_grm --pca 3 --out plink_pca. These are the eigenvalues: 2.66081 1.9079 1.62115 1.4276 1.32104 1.30427 1.17457 1.114 1.11038 1.09591 1.08547 1.07133 1.06702 1.0525 1.0433 0.971419 0.964229 0.933844 0.900301 0.897893 0.841508 0.815369 0.806853 0.800011 0.776619 0.75833 0.713646 0.653838 0.617058 0.58745 0.578381 0.569602 0.569006 0.521133 0.503277 -0.100047

ADD REPLY • link 7.0 years ago by zhangdezhi008 ▴ 10

0

Each eigenvalue describes the (absolute) variance explained by the corresponding eigenvector. Their sum (SUM_TOTAL_EIGENVECS) = total variance. You can see that they are arranged in descending order. So to get the % of variance explained by the first two PCs: (2.66081/SUM_TOTAL_EIGENVECS)*100 and (1.9079/SUM_TOTAL_EIGENVECS)*100.

PS: Usually you should not get negative eigenvalues (like the last one). But since it is very small, you may keep it or just ignore it; it is not going to change the % much.
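For anyone who wants to script this, here is a minimal Python sketch of the calculation above, using the eigenvalues quoted in this thread. The function names read_eigenvalues and percent_variance are made up for illustration, and the file reader simply assumes the .eigenval file contains nothing but whitespace-separated numbers, so check that against your own output before relying on it.

```python
from pathlib import Path


def read_eigenvalues(path):
    """Parse every whitespace-separated number in an .eigenval-style file.

    Assumption: the file holds only numeric eigenvalues, no header row.
    """
    return [float(tok) for tok in Path(path).read_text().split()]


def percent_variance(eigenvalues, drop_negative=False):
    """Return each eigenvalue as a percentage of the summed eigenvalues."""
    if drop_negative:
        vals = [v for v in eigenvalues if v >= 0]
    else:
        vals = list(eigenvalues)
    total = sum(vals)  # SUM_TOTAL_EIGENVECS in the notation above
    return [100.0 * v / total for v in vals]


# Eigenvalues as posted in this thread (36 values, already in descending order).
eigenvalues = [
    2.66081, 1.9079, 1.62115, 1.4276, 1.32104, 1.30427, 1.17457, 1.114,
    1.11038, 1.09591, 1.08547, 1.07133, 1.06702, 1.0525, 1.0433, 0.971419,
    0.964229, 0.933844, 0.900301, 0.897893, 0.841508, 0.815369, 0.806853,
    0.800011, 0.776619, 0.75833, 0.713646, 0.653838, 0.617058, 0.58745,
    0.578381, 0.569602, 0.569006, 0.521133, 0.503277, -0.100047,
]

keep = percent_variance(eigenvalues)                      # negative value kept
drop = percent_variance(eigenvalues, drop_negative=True)  # negative value ignored

print(f"PC1: {keep[0]:.2f}% (negative kept), {drop[0]:.2f}% (negative dropped)")
print(f"PC2: {keep[1]:.2f}% (negative kept), {drop[1]:.2f}% (negative dropped)")
```

With these numbers PC1 and PC2 come out at roughly 7.7% and 5.5%, and keeping or dropping the negative eigenvalue moves each figure by only a few hundredths of a percentage point. To use your own file instead of the pasted list, pass its path to read_eigenvalues and feed the result to percent_variance.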

ADD REPLY • link 7.0 years ago by Santosh Anand 5.7k

0

Does "SUM_TOTAL_EIGENVECS" mean the sum of all eigenvalues (sum=36 in my data)? Then the first two PCs only represent 7% and 5% of the variation. Is that too low to use?

ADD REPLY • link 7.0 years ago by zhangdezhi008 ▴ 10

0

Yes, the calculation is right. Your principal components cannot pick out directions with large variation because your data is essentially spherical (in a multidimensional space). That is what I was worried about after seeing the negative eigenvalue. If you are wondering about the interpretation and how PCA is computed, you may check this excellent article on Stack Exchange: https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues

ADD REPLY • link 6.9 years ago by Santosh Anand 5.7k

0

Santosh, thank you very much! Is there any way to remedy this? I have also run multidimensional scaling (MDS), which is similar to PCA, on the same dataset. I used the following command for MDS: plink --bfile plink --read-genome ibs1.genome --out mds1 --cluster --mds-plot 4. That gave me 4 columns (MDS1, 2, 3, 4). Do I also need to know each column's proportion of variation, as with PCA?

Lots of layman questions, I know. Thank you very much for your patient and detailed replies, they help a lot!

ADD REPLY • link 6.9 years ago by zhangdezhi008 ▴ 10