Hi everyone, and thank you in advance for any kind of help!

I'm trying to run a **Principal Component Analysis** with both **PLINK** and **smartPCA**.
The aim is to use the PCs as covariates in the GWAS I have to run after the Principal Component Analysis step.

For *PLINK*, I used a pruned binary fileset with **632 individuals and about 36500 SNPs** and ran **--pca**, which returned my .evec and .eval files. I then plotted the .evec file in R.

For *smartPCA*, I converted my pruned binary files to PED and MAP format and then, with CONVERTF, to EIGENSTRAT format, adding my population labels in the last column of the .ind file.
I ran smartPCA with -k 10 and -m 0, hoping to get the same result as in PLINK.
Finally, I used R to build the plot.

Both plots are based on 632 individuals and 36500 variants, but they don't match. Eigenvector (.evec) values that are positive in the PLINK plot are negative in the smartPCA plot, so the clustering is flipped along the y-axis (PC2).

Also, the labels (and population groups) for the same points don't match.

**My question is:** *Why* is there this difference?

Is there an error in how I plotted my results? What could it be? Or does it come from PLINK and smartPCA computing the eigenvectors differently? Do they perform Principal Component Analysis in different ways?

Thank you very much for any help. Fran

Principal component sign is meaningless and effectively random; what matters is how the points are oriented relative to each other.
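To make that concrete, here is a minimal Python sketch (not taken from either tool) that flips the sign of each PC in one result so it points the same way as in the other. The function name `align_signs` and the toy values are made up for illustration, and it assumes both tables list the same samples in the same row order.

```python
def align_signs(reference, other):
    """Return a copy of `other` with each PC column flipped where needed
    so that it points the same way as the matching column in `reference`.

    Both arguments are lists of rows (one row per sample, one value per PC).
    Assumption: rows are in the same sample order in both tables.
    """
    n_pcs = len(reference[0])
    aligned = [row[:] for row in other]
    for j in range(n_pcs):
        # A negative dot product means the two columns point in opposite directions.
        dot = sum(r[j] * o[j] for r, o in zip(reference, other))
        if dot < 0:
            for row in aligned:
                row[j] = -row[j]
    return aligned


# Toy values only: PC2 has the opposite sign in the second table.
plink_pcs = [[0.10, 0.20], [-0.30, 0.05], [0.20, -0.25]]
smartpca_pcs = [[0.10, -0.20], [-0.30, -0.05], [0.20, 0.25]]

aligned = align_signs(plink_pcs, smartpca_pcs)
print(aligned)
```

After this step the two plots should look alike if the only difference was the sign; any remaining differences have another cause.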

And what about the labels for some individuals/points?

For example, the point that is labelled as sample TD701 (Norwegian population) in the PLINK plot is labelled as sample 14798 (Italian population) in the smartPCA plot. Thank you for your help.

That is not expected, but I can't tell you what went wrong there unless I have enough information to replicate your run.
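One thing that can be checked without the full run is whether the two outputs list the samples in the same order, and whether the plots pair labels with points by sample ID rather than by row position. The Python sketch below is only an illustration of that check: the helper names `compare_sample_order` and `join_by_id` are hypothetical, the ID "S3" and all PC values are invented, and reading the IDs out of the actual output files is left out.

```python
def compare_sample_order(ids_a, ids_b):
    """Return (same_set, same_order, first_mismatch_index) for two ID lists."""
    same_set = set(ids_a) == set(ids_b)
    first_mismatch = next(
        (i for i, (a, b) in enumerate(zip(ids_a, ids_b)) if a != b), None
    )
    same_order = first_mismatch is None and len(ids_a) == len(ids_b)
    return same_set, same_order, first_mismatch


def join_by_id(pcs_a, pcs_b):
    """Pair PC rows by sample ID (dict: ID -> list of PCs), ignoring row order."""
    return {sid: (pcs_a[sid], pcs_b[sid]) for sid in pcs_a if sid in pcs_b}


# Toy data: same samples, but the first two rows are swapped in the second list.
ids_plink = ["TD701", "14798", "S3"]
ids_smartpca = ["14798", "TD701", "S3"]
print(compare_sample_order(ids_plink, ids_smartpca))  # (True, False, 0)

pcs_plink = {"TD701": [0.10, 0.20], "14798": [-0.30, 0.05]}
pcs_smartpca = {"14798": [-0.30, -0.05], "TD701": [0.10, -0.20]}
paired = join_by_id(pcs_plink, pcs_smartpca)
print(paired["TD701"])  # ([0.1, 0.2], [0.1, -0.2])
```

If the order differs, attaching labels by row position in the plotting step would produce exactly this kind of swapped sample names, so joining on the ID is the safer choice.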