Extreme values for first PC - genomic data
8.5 years ago

Dear all:

I have done a PCA using R but I am getting extreme values for the first principal component.

The steps that I performed were:

I had a genotype file coded as: 0, 1, 2 or NA.

I replaced the NAs with 1 (heterozygous), because I then recoded the matrix to -1, 0 and 1, so the NAs become zero.

I created the G matrix (VanRaden method) and ran the following command on it:

mypca = prcomp(G, center=TRUE)
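For readers who want to reproduce the steps above, here is a minimal sketch under stated assumptions. The genotype matrix is simulated toy data (20 animals, 50 SNPs), the names `M`, `Z` and `p` are illustrative and not from the post, and the G matrix uses VanRaden's first method, G = ZZ' / (2 Σ p(1 - p)), which may or may not be the exact variant used here.

```r
set.seed(1)

# Toy genotype matrix (assumption): 20 animals x 50 SNPs coded 0/1/2
n_animals <- 20
n_snps <- 50
M <- matrix(sample(0:2, n_animals * n_snps, replace = TRUE),
            nrow = n_animals, ncol = n_snps)

# Knock out a few genotypes to mimic missing calls
M[sample(length(M), 15)] <- NA

# Allele frequencies estimated from the observed genotypes only
p <- colMeans(M, na.rm = TRUE) / 2

# Replace NAs with 1 (heterozygous), as described in the post
M[is.na(M)] <- 1

# Recode to -1/0/1 and subtract 2 * (p - 0.5) from each SNP column
Z <- sweep(M - 1, 2, 2 * (p - 0.5))

# VanRaden (method 1) genomic relationship matrix, animals x animals
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

# The command from the post
mypca <- prcomp(G, center = TRUE)

# Standard deviation and proportion of variance for the first three PCs
summary(mypca)$importance[, 1:3]
```

With real data, `M` would be the animals-by-SNPs genotype matrix after quality control.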

When I plotted the first and second principal components, I noticed huge values for PC1. When I plotted PC2 and PC3, I observed what I was expecting.

Do I need to scale the G matrix? What could be causing those huge values for PC1? Could the NA genotypes that I replaced cause such a big effect?
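One way to look into these questions is to inspect how much variance each component carries and how wide the scores are on each PC. The sketch below is not a diagnosis of this data set; it uses a simulated stand-in for G (an assumption), and the names `pca_g`, `prop_var` and `pcs_eigen` are illustrative. It also shows an eigendecomposition of G itself for comparison, since G is already an animal-by-animal matrix, whereas `prcomp(G)` treats the columns of G as variables.

```r
set.seed(2)

# Toy stand-in for the G matrix (assumption): centred random genotypes
Z <- scale(matrix(sample(0:2, 20 * 50, replace = TRUE), 20, 50),
           center = TRUE, scale = FALSE)
G <- tcrossprod(Z) / 50

# Proportion of variance per component from prcomp on G
pca_g <- prcomp(G, center = TRUE)
prop_var <- pca_g$sdev^2 / sum(pca_g$sdev^2)
round(prop_var[1:3], 3)

# Range of scores on the first three PCs; a few outlying animals
# would show up as a much wider range on one component
apply(pca_g$x[, 1:3], 2, range)

# For comparison: eigendecomposition of G itself
eg <- eigen(G, symmetric = TRUE)

# Scores scaled by the square root of the eigenvalues (clipped at zero)
pcs_eigen <- eg$vectors[, 1:3] %*% diag(sqrt(pmax(eg$values[1:3], 0)))
```

Plotting `pcs_eigen[, 1]` against `pcs_eigen[, 2]` next to the `prcomp` scores can show whether the two approaches agree on a given data set.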

Any help would be very much appreciated. Thanks. Paula.

pca SNP R • 1.6k views

How much missing data do you have?


Hey Sean, I excluded animals with more than 3% missing genotypes and SNPs with more than 5% missing calls, so I don't have much missing information. I also performed the PCA on the genomic relationship matrix. I don't understand why my first PC has extreme values. When I plot the second and third PCs, I get exactly what I was expecting for the first and second. It looks like the first PC is capturing some error or something else that I haven't figured out yet. Thanks.

