Question: How do you extract data coordinates from PCA in R?
2.2 years ago, connor.j.rogerson wrote:

I need to extract the x,y coordinates of a PCA plot (generated in R) so that I can plot them in Excel (my boss prefers Excel).

The code to generate the PCA:

pca <- prcomp(data, scale=T, center=T)
autoplot(pca, label=T)

Looking at pca$x, the first two PC scores for an example point are:

29. 3.969599e+01 6.311406e+01

So for sample 29, the PC scores are 39.69599 and 63.11406.

However, if you look at the output plot in R, the coordinates are not 39.69599 and 63.11406 but roughly 0.09 and 0.2. Obviously some simple algebra can estimate how the PC scores are converted into the plotted coordinates, but I can't do this by hand for ~80 samples.

Can someone please shed some light on how R gets these coordinates, and maybe point me to wherever the plotted coordinates are stored or to a simple command that generates the plotted data matrix?

NOTE: pca$x does not give me what I want


Is this the actual code you typed into R? If so, what autoplot are you using, as the autoplot from ggplot2 does not have a method for prcomp objects? To be more specific, I suspect that using plot(pca$x[,1:2]) the coordinates will match up.

written 2.2 years ago by Sean Davis

I'm using the autoplot function from ggfortify, which allows autoplot to understand PCA objects.

written 2.2 years ago by connor.j.rogerson

That is the issue, then. ggfortify autoplot.prcomp plots values that have been transformed (see https://github.com/sinhrks/ggfortify/blob/master/R/fortify_stats.R#L140 and https://github.com/sinhrks/ggfortify/blob/master/R/fortify_stats.R#L259, for example). You'll need to apply those transformations if you want the same coordinates as autoplot. Note that the ggfortify package has been removed from CRAN....
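
A minimal sketch of what that transformation probably looks like, using only base R. It assumes autoplot is called with its default settings and that ggfortify applies the same biplot-style scaling as stats::biplot.prcomp (each PC's scores divided by its standard deviation times the square root of the number of samples); please check this against the linked source before relying on it. The built-in USArrests table is only a stand-in for the original `data`, which is not shown in the question.

```r
# Stand-in data (assumption): the question's `data` object is not available.
data <- USArrests
pca <- prcomp(data, scale. = TRUE, center = TRUE)

# Biplot-style scaling (assumed to match the plotted coordinates):
# divide the scores of each PC by sdev * sqrt(number of samples).
n <- nrow(pca$x)
lam <- pca$sdev[1:2] * sqrt(n)
plotted <- sweep(pca$x[, 1:2], 2, lam, "/")

# One row per sample: approximate x,y positions as drawn on the plot.
head(plotted)
```

If the assumption holds, `plotted` is the matrix to export instead of pca$x. The magnitudes fit the question: raw scores in the tens shrink to values around 0.1 once divided by a standard deviation of a few tens and by sqrt(80).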

written 2.2 years ago by Sean Davis
2.2 years ago, Jake Warner wrote:

From the comments it sounds like your autoplot is rescaling the scores. Using ggplot to plot the PC1 and PC2 scores directly should give you the plot you want:

library(ggplot2)
# PC scores from the prcomp object in the question, one row per sample
scores <- as.data.frame(pca$x)
p <- ggplot(data = scores, aes(x = PC1, y = PC2)) +
    geom_point(size = 2) +
    scale_fill_hue(l = 40) +
    coord_fixed(ratio = 1, xlim = range(scores$PC1), ylim = range(scores$PC2))
p

Then just write out scores to a file and pass it to your mentor to use in Excel.
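
Writing the scores out could look like the sketch below. The file name pca_scores.csv is a made-up example, and USArrests again stands in for the original `data`; write.csv produces a file that Excel opens directly.

```r
# Stand-in data (assumption): replace with the real input table.
data <- USArrests
pca <- prcomp(data, scale. = TRUE, center = TRUE)
scores <- as.data.frame(pca$x)

# Hypothetical output file name; sample names are kept as the first column.
out_file <- "pca_scores.csv"
write.csv(scores[, c("PC1", "PC2")], file = out_file, row.names = TRUE)
```

Keeping row.names = TRUE means each coordinate pair stays labelled with its sample name, which makes it easier to match points in Excel.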

2.2 years ago, apeltzer (Tuebingen, Germany) wrote:

Did you check the scale parameter? According to the manual, the values are scaled when this is set to TRUE, which could explain why your values appear to be scaled automatically before plotting. You could also set the scaling factor yourself and see if that resolves the mismatch between the visualization and your matrix.

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/prcomp.html

Just an idea/hint, can't check right now.


I tried prcomp without scale and still get discrepancies between the visualisation and the PC scores.

written 2.2 years ago by connor.j.rogerson

"scale" in prcomp changes all the input values to give you an SD of 1 for each sample. That affects how the pca$x values come out, but not how they are plotted.

written 2.2 years ago by swbarnes2
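
To see what the scale option does on its own, here is a small base R check. It is only a sketch on the built-in USArrests data (a stand-in, not the poster's data): standardising the columns by hand with scale() and then running prcomp without scaling should give the same scores as prcomp with scaling switched on, which shows that the option changes the input columns and nothing about plotting.

```r
# Stand-in data (assumption): any numeric matrix with samples in rows works.
X <- as.matrix(USArrests)

# PCA with built-in centring and scaling of each column.
pca_scaled <- prcomp(X, center = TRUE, scale. = TRUE)

# Same thing done by hand: standardise the columns first, then plain PCA.
X_std <- scale(X, center = TRUE, scale = TRUE)
pca_manual <- prcomp(X_std, center = FALSE, scale. = FALSE)

# Every column of the standardised matrix has SD 1.
apply(X_std, 2, sd)

# The two sets of scores should agree.
all.equal(pca_scaled$x, pca_manual$x)
```

Note that the argument is formally named scale. (with a trailing dot); scale=T in the question works through R's partial argument matching.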