Hi everyone, and thank you in advance for any kind of help!
I'm trying to run a principal component analysis (PCA) with both PLINK and smartPCA. The goal is to use the PCs as covariates in the GWAS I have to run after the PCA step.
For PLINK, I used a pruned binary fileset with 632 individuals and about 36,500 SNPs and ran the --pca command, which returned the .eigenvec and .eigenval files. I then plotted the .eigenvec file in R.
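For reference, here is a minimal Python sketch of reading the PLINK eigenvector file into a table keyed by individual ID. It assumes the PLINK 1.9 layout (whitespace-separated FID, IID, then one column per PC) and skips any line starting with "#". The function name and the toy values are hypothetical.

```python
from typing import Dict, Iterable, List


def read_plink_eigenvec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each IID to its PC values from a PLINK .eigenvec file.

    Assumed layout (PLINK 1.9): FID IID PC1 PC2 ... (whitespace-separated).
    Blank lines and lines starting with '#' are skipped.
    """
    pcs: Dict[str, List[float]] = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and an optional header
        iid = fields[1]
        pcs[iid] = [float(x) for x in fields[2:]]
    return pcs


# Usage sketch with two made-up samples (not real data):
example = """fam1 ind1 0.051 -0.012
fam2 ind2 -0.043 0.027""".splitlines()
print(read_plink_eigenvec(example))
```

Keying by IID rather than keeping the rows in file order makes it easier to compare the same individual across tools later.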
For smartPCA, I converted my pruned binary files to PED/MAP format and then, with CONVERTF, to EIGENSTRAT format, and added my population labels in the last column of the .ind file. I ran smartPCA with -k 10 and -m 0, hoping to obtain the same result as with PLINK. Finally, I used R to build the plot.
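A comparable sketch for the smartPCA .evec file, assuming its usual layout: an "#eigvals:" header line, then one line per sample with the ID, the PC values and the population label in the last column. It also assumes (unverified here) that the ID may carry a "FID:" prefix depending on how CONVERTF handled family names, so it keeps only the part after the last colon. The function name and toy values are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def read_smartpca_evec(lines: Iterable[str]) -> Dict[str, Tuple[List[float], str]]:
    """Map each sample ID to (PC values, population label) from a smartPCA .evec file.

    Assumed layout: '#eigvals: ...' header, then ID PC1 ... PCk LABEL per line.
    """
    out: Dict[str, Tuple[List[float], str]] = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the '#eigvals:' header
        # Assumption: IDs may look like 'FID:IID'; keep only the IID part.
        iid = fields[0].split(":")[-1]
        out[iid] = ([float(x) for x in fields[1:-1]], fields[-1])
    return out


# Usage sketch with made-up values (not real data):
example = """#eigvals: 12.345 8.765
ind1 -0.051 0.012 PopA
fam2:ind2 0.043 -0.027 PopB""".splitlines()
print(read_smartpca_evec(example))
```

If real IDs legitimately contain colons, the prefix-stripping line would need to be removed.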
Both plots are based on the same 632 individuals and 36,500 variants, but they don't match: eigenvector values that are positive in the PLINK plot are negative in the smartPCA plot, so the clusters are mirrored along the y-axis (PC2).
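One thing worth checking: an eigenvector is only defined up to its sign (if C v = λ v, then C(−v) = λ(−v) as well), so two programs can return the same axis with opposite orientation. The minimal pure-Python sketch below assumes the PC columns from both tools are already in the same sample order. It flips each smartPCA PC whenever it correlates negatively with the matching PLINK PC. Function names and numbers are hypothetical.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def align_signs(reference: List[List[float]], other: List[List[float]]) -> List[List[float]]:
    """Flip each PC in `other` so it correlates positively with the same PC in `reference`.

    Both arguments are lists of PC columns (one list of per-sample values per PC),
    with samples in the same order.
    """
    aligned = []
    for ref_pc, pc in zip(reference, other):
        if pearson(ref_pc, pc) < 0:
            pc = [-v for v in pc]  # same axis, opposite orientation
        aligned.append(pc)
    return aligned


# Toy example (made-up numbers): PC2 of the second tool is the mirror image of the first.
plink_pcs = [[0.1, 0.2, -0.3], [0.5, -0.1, -0.4]]
smartpca_pcs = [[0.1, 0.2, -0.3], [-0.5, 0.1, 0.4]]
print(align_signs(plink_pcs, smartpca_pcs))
```

Printing the absolute correlation per PC alongside this also indicates whether the two axes agree apart from orientation.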
Also, the labels (and population groups) of the same points don't match.
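When the labels of the same points disagree, it is safer to pair the two outputs by individual ID than by row position. A minimal sketch, assuming both results have already been read into dictionaries keyed by individual ID (the function name and toy values are hypothetical), pairs the rows by ID and lists the IDs present in only one of the two inputs:

```python
from typing import Dict, List, Tuple


def pair_by_id(
    plink: Dict[str, List[float]],
    smartpca: Dict[str, List[float]],
) -> Tuple[Dict[str, Tuple[List[float], List[float]]], List[str]]:
    """Pair the PC vectors of the two tools by individual ID, not by row order.

    Returns the paired values and a sorted list of IDs found in only one input.
    """
    shared = plink.keys() & smartpca.keys()
    paired = {iid: (plink[iid], smartpca[iid]) for iid in sorted(shared)}
    unmatched = sorted(plink.keys() ^ smartpca.keys())
    return paired, unmatched


plink = {"ind1": [0.05, -0.01], "ind2": [-0.04, 0.03]}
smartpca = {"ind2": [-0.04, -0.03], "ind3": [0.02, 0.01]}
paired, unmatched = pair_by_id(plink, smartpca)
print(paired)     # {'ind2': ([-0.04, 0.03], [-0.04, -0.03])}
print(unmatched)  # ['ind1', 'ind3']
```

An empty `unmatched` list indicates that both files contain the same individuals under the same IDs.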
My question is: why is there this difference?
Is there an error in how I plotted my results? If so, what could it be? Or does it depend on the different ways PLINK and smartPCA calculate the eigenvectors? Do they perform PCA in different ways?
Many thanks to anyone who tries to help.
Fran