Extreme values for first PC - genomic data
8.9 years ago

Dear all:

I have done a PCA using R but I am getting extreme values for the first principal component.

The steps that I performed were:

I had a genotype file coded as: 0, 1, 2 or NA.

I replaced the NAs with 1 (heterozygous) because I then recoded the matrix from 0, 1, 2 to -1, 0, 1, so the NAs become zero.

I created the G matrix (VanRaden method) and applied the following command on G matrix:

mypca = prcomp(G, center=TRUE)
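For reference, the steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: the genotypes are simulated, the names `M`, `Z`, `p` and `pcs` are hypothetical, missing calls are set to zero after centering each SNP by twice its allele frequency, and G follows the first VanRaden formulation, G = ZZ' / (2 Σ p(1 - p)). Note that `prcomp(G, center=TRUE)` treats the rows of G as observations and column-centers G before decomposing it; a common alternative for a relationship matrix is to take the eigen decomposition of G itself, which is what the sketch shows.

```r
set.seed(1)
n_animals <- 50
n_snps <- 200

# Simulated genotypes coded 0/1/2 (hypothetical data, animals in rows)
p_true <- runif(n_snps, 0.1, 0.9)
M <- sapply(p_true, function(p) rbinom(n_animals, 2, p))
M[sample(length(M), 100)] <- NA  # sprinkle in some missing calls

# Allele frequency per SNP, estimated from the observed genotypes
p <- colMeans(M, na.rm = TRUE) / 2

# Center each SNP by 2p; missing genotypes become 0 after centering
Z <- sweep(M, 2, 2 * p)
Z[is.na(Z)] <- 0

# Genomic relationship matrix (VanRaden, first method)
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

# Eigen decomposition of G: the eigenvectors are the sample PCs
eig <- eigen(G, symmetric = TRUE)
pcs <- eig$vectors[, 1:3]
var_explained <- eig$values / sum(eig$values)

plot(pcs[, 1], pcs[, 2], xlab = "PC1", ylab = "PC2")
```

Comparing `var_explained[1]` with the following values gives a quick sense of whether the first component dominates.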

When I plotted the first and second principal components, I noticed huge values for PC1. When I plotted PC2 against PC3, I observed what I was expecting.

Do I need to scale the G matrix? What could be causing those huge values for PC1? Could the NA genotypes that I replaced cause such a large effect?
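To make the scaling question concrete, here is a hedged sketch of how one might compare a scaled and an unscaled run and look for samples with extreme PC1 scores. The matrix `G_toy` is simulated stand-in data, and the cutoff of 5 robust deviations is an arbitrary assumption for illustration. `scale.` is the `prcomp` argument that standardizes each column to unit variance.

```r
set.seed(2)

# Toy symmetric matrix standing in for a relationship matrix (hypothetical)
X <- matrix(rnorm(40 * 40), 40, 40)
G_toy <- tcrossprod(X) / 40

# Same call as in the post, without and with column scaling
pca_unscaled <- prcomp(G_toy, center = TRUE, scale. = FALSE)
pca_scaled <- prcomp(G_toy, center = TRUE, scale. = TRUE)

# Proportion of variance per component (unscaled run)
prop_var <- pca_unscaled$sdev^2 / sum(pca_unscaled$sdev^2)

# Flag samples whose PC1 score is far from the bulk,
# using a robust z-score (median and MAD); the cutoff is arbitrary
pc1 <- pca_unscaled$x[, 1]
z <- (pc1 - median(pc1)) / mad(pc1)
extreme <- which(abs(z) > 5)

print(round(prop_var[1:3], 3))
print(extreme)
```

If a handful of animals account for the extreme PC1 values, inspecting their rows in the genotype file is a reasonable next step.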

Any help would be very much appreciated. Thanks. Paula.

pca SNP R • 1.7k views

How much missing data do you have?


Hey Sean, I excluded animals with more than 3% missing genotypes and SNPs with more than 5% missing genotypes, so I don't have much missing information. I also performed the PCA on the genomic relationship matrix. I don't understand why my first PC has extreme values. When I plot the second and third PCs, I get exactly what I was expecting for the first and second. It looks like the first PC is capturing some error or something else that I haven't figured out yet. Thanks.
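The missing-data filter described here could look something like the sketch below. It is only an illustration: the matrix `geno` is toy data, and filtering animals before SNPs is an assumption, since the post does not say in which order the two thresholds were applied.

```r
set.seed(3)

# Toy genotype matrix: 100 animals (rows) x 200 SNPs (columns), coded 0/1/2
geno <- matrix(sample(0:2, 100 * 200, replace = TRUE), 100, 200)
geno[1, 1:20] <- NA   # animal 1: 10% of its genotypes missing
geno[2:11, 50] <- NA  # SNP 50: missing in 10 animals

# Step 1: drop animals with more than 3% missing genotypes
animal_missing <- rowMeans(is.na(geno))
geno_animals <- geno[animal_missing <= 0.03, , drop = FALSE]

# Step 2: drop SNPs with more than 5% missing genotypes
snp_missing <- colMeans(is.na(geno_animals))
geno_filtered <- geno_animals[, snp_missing <= 0.05, drop = FALSE]

dim(geno_filtered)
```

Because the second filter changes the set of SNPs, the per-animal missing rates can shift slightly afterwards, so it may be worth recomputing them on the filtered matrix.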
