Question: How do you extract data coordinates from PCA in R?
connor.j.rogerson10 wrote (2.2 years ago):

I need to extract the x,y coordinates of a PCA plot (generated in R) so I can plot them in Excel (my boss prefers Excel).

The code to generate the PCA:

```r
library(ggfortify)  # provides autoplot() methods for prcomp objects
pca <- prcomp(data, scale. = TRUE, center = TRUE)
autoplot(pca, label = TRUE)
```
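For reference, here is a minimal sketch of what `prcomp` stores in `x`: with `center = TRUE` and `scale. = TRUE`, the scores are the standardized data multiplied by the rotation (loadings) matrix. The built-in `USArrests` dataset stands in for the poster's `data` here; that substitution is an assumption, and any numeric matrix should behave the same way.

```r
# Sketch: the scores in pca$x are the centered, scaled data times the loadings.
# USArrests (built into R) is an assumed stand-in for the poster's `data`.
data <- USArrests
pca <- prcomp(data, center = TRUE, scale. = TRUE)

# Recompute the scores by hand from the standardized data
manual_scores <- scale(data, center = TRUE, scale = TRUE) %*% pca$rotation

# These are the raw PC scores, with no extra rescaling for plotting
print(head(pca$x[, 1:2]))
stopifnot(isTRUE(all.equal(unname(manual_scores), unname(pca$x))))
```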

Looking at `pca$x`, the first two PC scores for one example point are:

```
29. 3.969599e+01 6.311406e+01
```

So for sample 29, the PC scores are `39.69599` and `63.11406`.

However, in the plot that R produces, that point is not at (`39.69599`, `63.11406`) but at roughly (0.09, 0.2). Some simple algebra could presumably work out how the PC scores are converted into plotted coordinates, but I can't do that by hand for ~80 samples.

Can someone please explain how R computes these coordinates, and point me either to a file containing them or to a simple command that returns the plotted data matrix?

NOTE: `pca$x` does not give me what I want.

modified 2.2 years ago by Jake Warner800 • written 2.2 years ago by connor.j.rogerson10

Is this the actual code you ran in R? If so, which `autoplot` are you using? The `autoplot` from ggplot2 has no method for `prcomp` objects. I suspect that if you use `plot(pca$x[, 1:2])`, the coordinates will match up.
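A minimal sketch of that suggestion: base graphics plots the raw scores, so each point sits exactly at its `pca$x` values. `USArrests` is again an assumed stand-in for the poster's data.

```r
# Sketch: base graphics plots the raw PC scores, so point coordinates equal pca$x.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# First two PCs as an n x 2 matrix; these are the plotted x,y coordinates
xy <- pca$x[, 1:2]
plot(xy, xlab = "PC1", ylab = "PC2", asp = 1)
text(xy, labels = rownames(xy), pos = 3, cex = 0.6)

# The axes span the raw scores; no further rescaling is applied
print(range(xy[, "PC1"]))
```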

I'm using the `autoplot` function from ggfortify, which lets `autoplot` understand PCA objects.

That is the issue, then. ggfortify's `autoplot.prcomp` plots values that have been transformed (see https://github.com/sinhrks/ggfortify/blob/master/R/fortify_stats.R#L140 and https://github.com/sinhrks/ggfortify/blob/master/R/fortify_stats.R#L259, for example). You'll need to apply those transformations yourself if you want the same coordinates as `autoplot`. Note that the ggfortify package has been removed from CRAN.
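For concreteness, here is a sketch of that rescaling, under the assumption that ggfortify's default (`scale = 1`) mirrors `stats:::biplot.prcomp`, which divides each score column by `sdev * sqrt(n)`, where `n` is the number of samples. The helper name `autoplot_coords` is hypothetical; check the linked ggfortify source for your installed version before relying on it.

```r
# Hypothetical helper: rescale PC scores the way stats:::biplot.prcomp does
# with scale = 1. Assumption: ggfortify's autoplot() applies the same transform.
autoplot_coords <- function(pca, choices = 1:2) {
  scores <- pca$x[, choices, drop = FALSE]
  n <- nrow(scores)
  lam <- pca$sdev[choices] * sqrt(n)   # one divisor per component
  sweep(scores, 2, lam, "/")           # divide each column by its divisor
}

pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
coords <- autoplot_coords(pca)
print(head(coords))

# The ratio between raw score and rescaled coordinate is constant per axis,
# i.e. the "simple algebra" mentioned in the question
print(pca$x[1:3, 1:2] / coords[1:3, ])
```

If the assumption holds, `autoplot_coords(pca)` gives a table of plotted positions for all samples at once, which can be exported like any other matrix.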

Jake Warner800 wrote (2.2 years ago):

From the comments, it sounds like your `autoplot` is scaling the data. Plotting the PC1 and PC2 scores directly with ggplot should give you the plot you want:

```r
library(ggplot2)
# pca is the prcomp object from the question
scores <- as.data.frame(pca$x)
p <- ggplot(data = scores, aes(x = PC1, y = PC2)) +
  geom_point(size = 2) +
  scale_fill_hue(l = 40) +
  coord_fixed(ratio = 1, xlim = range(scores$PC1), ylim = range(scores$PC2))
p
```

Then just write out `scores` and pass it to your mentor to use in Excel.
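A minimal sketch of the export step, assuming a CSV file is acceptable (Excel opens CSV files directly). The output path is illustrative; replace it with your own.

```r
# Sketch: export PC scores to a CSV file that Excel can open.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)  # stand-in for the poster's PCA
scores <- as.data.frame(pca$x)

# Keep sample names as a proper column so they survive the round trip
scores <- cbind(sample = rownames(scores), scores)
out_file <- file.path(tempdir(), "pca_scores.csv")  # illustrative path
write.csv(scores, out_file, row.names = FALSE)

# Read the file back to check what Excel will see
check <- read.csv(out_file)
print(head(check[, c("sample", "PC1", "PC2")]))
```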

apeltzer140 (Tuebingen, Germany) wrote (2.2 years ago):

Did you check the `scale.` parameter? According to the manual, the variables are scaled to unit variance when it is set to `TRUE`, which could explain why your values appear to be rescaled before plotting. You could also supply your own scaling factors and see whether that resolves the mismatch between the visualization and your matrix.

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/prcomp.html
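To see what the `scale.` argument changes, here is a minimal sketch (again using `USArrests` as an assumed stand-in): it standardizes each input variable before the PCA, so it changes the scores themselves rather than how they are drawn.

```r
# Sketch: scale. = TRUE standardizes the input columns before the PCA.
pca_scaled   <- prcomp(USArrests, center = TRUE, scale. = TRUE)
pca_unscaled <- prcomp(USArrests, center = TRUE, scale. = FALSE)
pca_manual   <- prcomp(scale(USArrests), center = FALSE, scale. = FALSE)

# Standardizing by hand gives the same component standard deviations as scale. = TRUE
print(rbind(scaled = pca_scaled$sdev, manual = pca_manual$sdev))

# Without scaling, the high-variance column (Assault) dominates PC1
print(round(pca_unscaled$rotation[, "PC1"], 3))
```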

Just an idea/hint, can't check right now.

I tried `prcomp` without scaling and still get discrepancies between the visualisation and the PC scores.