7.8 years ago by
As far as I understand you want to find the genes (
p) that are the sources of the the majority of the variance between your samples (
You will have to look at entries in your loadings vectors.
pca.object <- princomp(data.matrix) # data.matrix is a [n p] matrix
pca.object$loadings # Your loadings are here
Then you can look at which genes of the genes that have the most extreme loadings.
Loadings range from -1 to 1, and the higher the numerical value of a gene's loading is, the more that gene means for the variance of the principal component in question.
On the other hand, the scores of e.g. PC1 will tell you how the samples differ according to the genes that have high loadings on PC1.
Just a side-note: Have you considered scaling your data-matrix?
You want to see which genes that mean the most for the differences between the samples, and therefore your samples should be in the rows and your genes should be in the columns. As far as I see, you should not transpose your data matrix.
And by the way, in R, use the
prcomp function instead of
princomp (for numerical stability).
prcomp also has the input option of centering and scaling, which you would like if the magnitude of the numbers in your matrix are not of a comparable size.
pca.object <- prcomp(data.matrix, center=TRUE, scale=TRUE) # PCA with centering and scaling
pca.object$rotation # The loadings are here