Dear all:

I have done a PCA using R but I am getting extreme values for the first principal component.

The steps that I performed were:

I had a genotype file coded as: 0, 1, 2 or NA.

I replaced the NAs with 1 (heterozygous) because I then recoded the matrix as -1, 0 and 1, so the former NAs become zero.

I created the G matrix (VanRaden method) and applied the following command to the G matrix:

```
mypca = prcomp(G, center=TRUE)
```
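For anyone trying to reproduce this, here is a minimal sketch of the steps described above on simulated data. The code used to build G is not shown in the post, so everything here is an assumption: the matrix sizes, the simulated genotypes, allele frequencies estimated from the data themselves, and the first VanRaden formulation, G = ZZ' / (2 * sum(p * (1 - p))).

```r
set.seed(1)

# Hypothetical sizes and simulated genotypes (not the real data)
n_animals <- 20
n_snps <- 200
geno <- matrix(rbinom(n_animals * n_snps, size = 2, prob = 0.3),
               nrow = n_animals, ncol = n_snps)
geno[sample(length(geno), 50)] <- NA   # sprinkle in some missing calls

# Replace NAs with 1 (heterozygous), then recode 0/1/2 as -1/0/1
geno[is.na(geno)] <- 1
M <- geno - 1

# Allele frequencies estimated from the data (an assumption)
p <- colMeans(geno) / 2

# Centre each SNP column by 2 * (p - 0.5), then scale the cross-product
Z <- sweep(M, 2, 2 * (p - 0.5))
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

# Same call as in the post
mypca <- prcomp(G, center = TRUE)
summary(mypca)$importance[, 1:3]
```

Note that `prcomp(G)` treats the columns of G as variables, so the scores are not necessarily the same as those from an eigendecomposition of G itself.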

When I plotted the first and second principal components I noticed huge values for PC1. When I plotted PC2 against PC3 I saw the pattern I was expecting.
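One way to see what is driving PC1 is to check how much variance it explains and which animals have the extreme scores. The sketch below uses a made-up G-like matrix (`G_demo`) with one deliberately inflated animal, so the names, sizes and the 3 * MAD cut-off are all illustrative assumptions, not values from the real data.

```r
set.seed(3)

# Hypothetical data: 15 animals, 50 markers, animal 15 inflated on purpose
Z_demo <- matrix(rnorm(15 * 50), nrow = 15)
Z_demo[15, ] <- Z_demo[15, ] * 10
G_demo <- tcrossprod(Z_demo) / ncol(Z_demo)

pca_demo <- prcomp(G_demo, center = TRUE)

# Proportion of variance explained by each component
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
round(var_explained[1:3], 3)

# Flag animals whose PC1 score is far from the median (arbitrary cut-off)
pc1 <- pca_demo$x[, 1]
extreme <- which(abs(pc1 - median(pc1)) > 3 * mad(pc1))
extreme
```

If only a handful of animals show up in `extreme`, it may be worth checking their genotypes and their rows of G before deciding anything about scaling.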

Do I need to scale the G matrix? What could be causing those huge values for PC1? Could the NA genotypes that I replaced have this big an effect?

Any help would be very much appreciated. Thanks. Paula.

How much missing data do you have?

Hey Sean, I excluded animals with more than 3% missing genotypes and SNPs with more than 5% missing calls, so I don't have much missing information. I also performed the PCA on the genomic relationship matrix. I don't understand why my first PC has extreme values. When I plot the second and third PCs I get exactly what I was expecting for the first and second. It looks like the first PC is capturing some error or something else that I haven't figured out yet. Thanks.
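For reference, the missing-data filter described above could look roughly like the sketch below. The matrix `raw` is simulated, and the order of the two steps (animals first, then SNPs) is an assumption, since the post does not say which filter was applied first.

```r
set.seed(2)

# Hypothetical raw genotypes: 30 animals x 100 SNPs coded 0/1/2 with NAs
raw <- matrix(rbinom(30 * 100, size = 2, prob = 0.4), nrow = 30)
raw[sample(length(raw), 30)] <- NA
raw[1, 1:10] <- NA   # animal 1 gets at least 10% missing, so it should be dropped

# Step 1: drop animals with more than 3% missing genotypes
animal_miss <- rowMeans(is.na(raw))
keep_animals <- animal_miss <= 0.03
step1 <- raw[keep_animals, , drop = FALSE]

# Step 2: drop SNPs with more than 5% missing calls among the remaining animals
snp_miss <- colMeans(is.na(step1))
keep_snps <- snp_miss <= 0.05
filtered <- step1[, keep_snps, drop = FALSE]

# Overall missing rate that would then be imputed as heterozygous
mean(is.na(filtered))
```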