
PCA with PLINK and smartPCA: same input file but different results in my plot. Why?

3.2 years ago

francesca94.minnai • 0

Hi everyone, and thank you in advance for any kind of help!

I'm trying to perform a Principal Component Analysis with both PLINK and smartPCA. The aim is to use the PCs as covariates in the GWAS I have to run after the Principal Component Analysis step.

For PLINK, I used a pruned binary fileset with 632 individuals and about 36,500 SNPs and ran the --pca command, which returned my .eigenvec and .eigenval files. Then I plotted the .eigenvec file in R.
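For reference, here is a minimal Python sketch of reading the PLINK output. It assumes PLINK 1.9's default .eigenvec layout (whitespace-separated FID, IID, then one column per PC) and skips a header line if one is present. The function name `read_plink_eigenvec` is hypothetical, and the values in the example are made-up toy numbers, not results from this dataset. Keying on the IID rather than on row position makes a later comparison with smartPCA less error-prone.

```python
from io import StringIO
from typing import TextIO


def read_plink_eigenvec(handle: TextIO) -> dict[str, list[float]]:
    """Map each IID to its PC values from a PLINK .eigenvec file.

    Assumes whitespace-separated columns: FID IID PC1 PC2 ...
    A header line ('FID IID PC1 ...' or '#FID IID ...') is skipped.
    """
    pcs: dict[str, list[float]] = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line
        if fields[0].lstrip("#") == "FID":
            continue  # header line
        pcs[fields[1]] = [float(x) for x in fields[2:]]
    return pcs


# Toy input: two samples, two PCs (values are made up).
example = StringIO(
    "FAM1 TD701 0.0123 -0.0456\n"
    "FAM2 14798 -0.0310 0.0201\n"
)
pcs = read_plink_eigenvec(example)
print(pcs["TD701"])  # [0.0123, -0.0456]
```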

For smartPCA, I converted my pruned binary files into PED and MAP format and then, with CONVERTF, into EIGENSTRAT format, adding my population labels in the last column of the .ind file. I ran smartPCA with -k 10 and -m 0, hoping to obtain the same result as with PLINK. Finally, I used R to make the plot.
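A matching sketch for the smartPCA .evec output, assuming the usual layout: a first line starting with `#eigvals:`, then one line per sample containing the sample ID, the PC values, and the population label from the .ind file in the last column. `read_smartpca_evec` is a hypothetical helper and the toy values are illustrative only. Depending on CONVERTF's `familynames` setting, the sample IDs may carry a family-ID prefix (FID:IID), so it is worth checking that they match the PLINK IIDs before comparing the two runs.

```python
from io import StringIO
from typing import TextIO


def read_smartpca_evec(handle: TextIO):
    """Parse a smartPCA .evec file.

    Returns (eigenvalues, pcs, labels): pcs maps sample ID -> PC values,
    labels maps sample ID -> population label (last column).
    """
    eigenvalues: list[float] = []
    pcs: dict[str, list[float]] = {}
    labels: dict[str, str] = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#eigvals:":
            eigenvalues = [float(x) for x in fields[1:]]
            continue
        sample = fields[0]
        pcs[sample] = [float(x) for x in fields[1:-1]]  # PC columns
        labels[sample] = fields[-1]  # population label from the .ind file
    return eigenvalues, pcs, labels


# Toy input with two PCs (values are made up).
example = StringIO(
    "#eigvals: 5.210 3.104\n"
    "  TD701  -0.0123  0.0456  Norwegian\n"
    "  14798   0.0310 -0.0201  Italian\n"
)
eigenvalues, pcs, labels = read_smartpca_evec(example)
print(labels["TD701"], pcs["TD701"])  # Norwegian [-0.0123, 0.0456]
```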

Both plots are based on the same 632 individuals and 36,500 variants, but they don't correspond. Eigenvector values that are positive in the PLINK plot are negative in the smartPCA plot, so the clustering is reversed along the y-axis (PC2).

Also, the labels (and population groups) of the same points don't match.

My question is: why is there this difference?

Is there an error in how I plotted my results? If so, what could it be? Or does it depend on PLINK and smartPCA calculating the eigenvectors differently? Do they perform Principal Component Analysis in different ways?
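On whether the two programs compute PCA differently: PLINK 1.9's --pca works from a variance-standardized relationship matrix, and EIGENSTRAT/smartPCA from a similarly normalized genotype matrix, so the core idea is the same. Below is a simplified NumPy sketch of EIGENSTRAT-style normalization (center each SNP on its mean, divide by sqrt(p(1-p))) followed by an eigendecomposition. It is not the exact code of either program; they can differ in details such as how allele frequencies are estimated, missing-data handling, and smartPCA's outlier removal (disabled here with -m 0), so small numerical differences are plausible. The function name is hypothetical and the genotypes are random toy data.

```python
import numpy as np


def eigenstrat_style_pca(genotypes: np.ndarray, k: int):
    """Top-k eigenvalues/eigenvectors for an (individuals x SNPs) 0/1/2 matrix.

    Simplified EIGENSTRAT-style normalization: each SNP is centered on its
    mean and divided by sqrt(p * (1 - p)), with p the sample allele frequency.
    Missing genotypes are NOT handled.
    """
    g = genotypes.astype(float)
    mean = g.mean(axis=0)
    p = mean / 2.0
    keep = (p > 0) & (p < 1)  # monomorphic SNPs would divide by zero
    x = (g[:, keep] - mean[keep]) / np.sqrt(p[keep] * (1.0 - p[keep]))
    # Individuals x individuals matrix, averaged over the retained SNPs.
    c = x @ x.T / keep.sum()
    eigenvalues, eigenvectors = np.linalg.eigh(c)  # ascending order
    top = np.argsort(eigenvalues)[::-1][:k]
    # Each column is a unit-length eigenvector; its sign is arbitrary.
    return eigenvalues[top], eigenvectors[:, top]


# Random toy genotypes (not real data): 20 individuals, 200 SNPs.
rng = np.random.default_rng(0)
toy = rng.integers(0, 3, size=(20, 200))
vals, vecs = eigenstrat_style_pca(toy, k=2)
print(vals.shape, vecs.shape)  # (2,) (20, 2)
```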

Thank you very much for any help. Fran

PCA PLINK smartPCA EIGENSTRAT plot

Principal component sign is meaningless and effectively random; what matters is how the points are oriented relative to each other.

Reply by chrchang523 • 3.2 years ago
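To make the sign point concrete: if C is the matrix being decomposed and Cv = λv, then C(-v) = λ(-v), so -v is an equally valid eigenvector and each program may return either one. A minimal sketch (the helper `align_signs` is hypothetical) that flips each PC of one run so it correlates positively with the matching PC of the other run before plotting. It assumes both matrices have their rows in the same sample order and their PCs in the same order; the arrays are toy values.

```python
import numpy as np


def align_signs(reference: np.ndarray, other: np.ndarray) -> np.ndarray:
    """Return a copy of `other` with each PC column flipped, if needed,
    so that it correlates positively with the same column of `reference`.

    Both arrays are (individuals x PCs) with rows in the SAME sample order.
    """
    aligned = other.astype(float)  # astype returns a new array
    for j in range(reference.shape[1]):
        r = np.corrcoef(reference[:, j], other[:, j])[0, 1]
        if r < 0:
            aligned[:, j] *= -1.0
    return aligned


# Toy example: the same two PCs, with PC2's sign flipped in the second run.
plink_pcs = np.array([[0.1, 0.3], [-0.2, 0.1], [0.1, -0.4]])
smart_pcs = plink_pcs * np.array([1.0, -1.0])
print(np.allclose(align_signs(plink_pcs, smart_pcs), plink_pcs))  # True
```

After alignment, the two scatter plots should look alike if the runs agree; remaining differences then reflect real discrepancies rather than sign.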

And what about the labels of some individuals/points?

For example, the point that corresponds to sample TD701 (Norwegian population) in the PLINK plot corresponds to sample 14798 (Italian population) in the smartPCA plot. Thank you for your help.

Reply by francesca94.minnai • 3.2 years ago
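The thread does not resolve the label mismatch. One possible cause worth ruling out (an assumption, not something confirmed here) is attaching labels or coordinates by row position, for example if the label list used in R is not in the same order as the .eigenvec or .evec rows. A minimal sketch that pairs the two outputs by sample ID instead; `join_by_sample` is hypothetical, it takes plain dicts however the files were parsed, and the IDs and values are toy examples. IDs that appear in only one input usually point to an ID-format difference rather than to missing samples.

```python
def join_by_sample(plink_pcs, smart_pcs, smart_labels):
    """Pair PLINK and smartPCA coordinates by sample ID, not by row position.

    Returns (rows, only_plink, only_smart): rows are
    (sample, population, plink_values, smart_values) for IDs found in both
    inputs; the two sets hold IDs that appear in only one input.
    """
    shared = sorted(plink_pcs.keys() & smart_pcs.keys())
    rows = [(s, smart_labels[s], plink_pcs[s], smart_pcs[s]) for s in shared]
    only_plink = plink_pcs.keys() - smart_pcs.keys()
    only_smart = smart_pcs.keys() - plink_pcs.keys()
    return rows, only_plink, only_smart


# Toy data (made up): same two samples, opposite signs in the two runs.
plink = {"TD701": [0.0123, -0.0456], "14798": [-0.031, 0.0201]}
smart = {"TD701": [-0.0123, 0.0456], "14798": [0.031, -0.0201]}
labels = {"TD701": "Norwegian", "14798": "Italian"}
rows, only_plink, only_smart = join_by_sample(plink, smart, labels)
for sample, pop, p, s in rows:
    print(sample, pop, p, s)
print(only_plink, only_smart)  # set() set()
```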

That is not expected, but I can't tell you what went wrong there unless I have enough information to replicate your run.

Reply by chrchang523 • 3.2 years ago