Question: Adding Percentage in PCA
krushnach80480 wrote:

This is my code for PCA using SVD. I get a neat plot, and I want to add the percentage of variance to the axes, as I used to get in the DESeq2 plot. I'm not sure how DESeq2 adds it.

library(ISLR)
ncidat = (NON_CODING[,-1])
dim(ncidat)
ncidat[1:5,1:16]
X = t(scale(t(ncidat),center=TRUE,scale=FALSE))
View(X)
################
sv = svd(t(X))
U = sv$u
V = sv$v
D = sv$d
###############

## the rank of a matrix can be calculated in R with
qr(t(X))$rank
cols = as.numeric(as.factor(colnames(ncidat)))
plot(U[,1],U[,2],type="n",xlab="PC1",ylab="PC2")
text(U[,1],U[,2],colnames(X),col=cols)

par(mfrow=c(1,2))
Z = t(X)%*%V

# plot PC1 vs PC2
plot(Z[,1], Z[,2], type ="n", xlab="PC1", ylab="PC2")
text(Z[,1], Z[,2], colnames(X), col=cols)

pc_dat<- data.frame(type = rownames(Z), PC1 = Z[,1], PC2= Z[,2])
library(ggplot2)
p<-ggplot(pc_dat,aes(x=PC1, y=PC2, col=type)) + geom_point() + 
  geom_text(aes(label = type), hjust=0, vjust=0)

p<-p + theme(text = element_text(size = 25))
p

Any suggestion or help on how to add the percentage to the PC1 and PC2 axes would be highly appreciated.

R • 1.2k views
ADD COMMENTlink written 16 months ago by krushnach80480

Did you try the code under "3.4 The percentage" at http://huboqiang.cn/2016/03/03/RscatterPlotPCA ?

ADD REPLYlink written 16 months ago by Sej Modha4.1k

I have seen that code, but I'm not sure how to pass my principal components to it. I mean, which object is storing the principal components?

Can you have a look at my code and let me know?

ADD REPLYlink written 16 months ago by krushnach80480

The sv$d values (D above) are related to the PC standard deviations. Use a formula to convert the sv$d values to PC standard deviations. If you are interested in displaying the PC percentages, it is better to run PCA instead of SVD.
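
A minimal sketch of that conversion, assuming the matrix passed to svd() has samples in rows and is centred per gene, as t(X) is in the question. The toy matrix and the names n_samples, pc_sdev and pct_var are made up for illustration. The standard deviations are the singular values divided by sqrt(n - 1), where n is the number of samples.

```r
set.seed(1)
# Toy stand-in for the expression matrix: 50 genes (rows) x 6 samples (columns)
ncidat <- matrix(rnorm(50 * 6), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("sample", 1:6)))

# Centre each gene, as in the question
X <- t(scale(t(ncidat), center = TRUE, scale = FALSE))

sv <- svd(t(X))              # samples in rows
n_samples <- nrow(t(X))

# Singular values -> PC standard deviations -> percent variance explained
pc_sdev <- sv$d / sqrt(n_samples - 1)
pct_var <- pc_sdev^2 / sum(pc_sdev^2) * 100

# PC scores, plotted with the percentages in the axis labels
Z <- t(X) %*% sv$v
xlab <- sprintf("PC1 (%.1f%%)", pct_var[1])
ylab <- sprintf("PC2 (%.1f%%)", pct_var[2])
plot(Z[, 1], Z[, 2], type = "n", xlab = xlab, ylab = ylab)
text(Z[, 1], Z[, 2], colnames(X))
```

Because the sqrt(n - 1) factor cancels in the ratio, sv$d^2 / sum(sv$d^2) * 100 gives the same percentages directly.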

ADD REPLYlink modified 16 months ago • written 16 months ago by cpad011211k

okay...so i will give it a try

ADD REPLYlink written 16 months ago by krushnach80480
Kevin Blighe41k wrote:

Take a leaf out of my own PCA code in this answer: PCA plot from read count matrix from RNA-Seq

The formula for converting standard deviations to percent explained variation is:

((project.pca$sdev^2) / (sum(project.pca$sdev^2)))*100

i.e., square each PC's standard deviation, divide by the sum of all squared PC standard deviations, and multiply by 100
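
A sketch of how that formula could feed the ggplot2 axis labels from the question. The toy matrix expr and the name pct_var are assumptions for illustration; project.pca is taken to be the result of prcomp() run with samples in rows.

```r
library(ggplot2)

set.seed(1)
# Toy matrix standing in for real data: 6 samples (rows) x 50 genes (columns)
expr <- matrix(rnorm(6 * 50), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:50)))

project.pca <- prcomp(expr)   # centres each gene by default

# Percent of variance explained by each PC
pct_var <- (project.pca$sdev^2) / sum(project.pca$sdev^2) * 100

pc_dat <- data.frame(type = rownames(project.pca$x),
                     PC1 = project.pca$x[, 1],
                     PC2 = project.pca$x[, 2])

# Same plot as in the question, with the percentages added to the axis titles
p <- ggplot(pc_dat, aes(x = PC1, y = PC2, col = type)) +
  geom_point() +
  geom_text(aes(label = type), hjust = 0, vjust = 0) +
  labs(x = sprintf("PC1 (%.1f%% variance)", pct_var[1]),
       y = sprintf("PC2 (%.1f%% variance)", pct_var[2]))
p
```

summary(project.pca) reports the same quantities, as fractions, in its "Proportion of Variance" row.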

ADD COMMENTlink written 16 months ago by Kevin Blighe41k

I'm going to use your code. Would you suggest SVD or PCA?

ADD REPLYlink written 16 months ago by krushnach80480

A little-known tip: PCA and SVD are more or less the 'same' thing. There are various 'quirks' like this in statistics, where different methods end up producing the same results. If you look at the code for the base R function prcomp, which performs PCA, you will see that it in fact calls the base svd function.

It is possible to perform PCA using non-SVD methods though.
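
A small check of that equivalence on made-up data (the matrix m is purely illustrative): prcomp() and svd() on the centred matrix should agree on the standard deviations, and on the loadings up to the sign of each component.

```r
set.seed(1)
m <- matrix(rnorm(20 * 5), nrow = 20)   # toy data: 20 samples x 5 variables

pca <- prcomp(m)                        # centres columns by default
sv  <- svd(scale(m, center = TRUE, scale = FALSE))

# Standard deviations agree once the singular values are rescaled
sdev_match <- all.equal(pca$sdev, sv$d / sqrt(nrow(m) - 1))

# Loadings agree up to the sign of each component
rotation_match <- all.equal(abs(unname(pca$rotation)), abs(sv$v))

print(c(sdev = isTRUE(sdev_match), rotation = isTRUE(rotation_match)))
```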

ADD REPLYlink modified 16 months ago • written 16 months ago by Kevin Blighe41k

I was using this reference because I was looking for a way to find the genes that drive PC1 and PC2, let's say the most variable genes across the samples. I would like to extract those genes from the PCA and use them for downstream analysis. Is there a simpler way to do this?

ADD REPLYlink written 16 months ago by krushnach80480

If you used my code, take a look at:

project.pca$rotation

This should contain the rotated component loadings of each gene on each PC. The PCs (columns) are ordered by the amount of variation they explain; the genes (rows) stay in the same order as in the input.

ADD REPLYlink modified 6 weeks ago • written 16 months ago by Kevin Blighe41k

Great help, I will try it and let you know...

ADD REPLYlink written 16 months ago by krushnach80480

@Kevin I tried your code and checked project.pca$rotation. It contains all the genes I used: as far as I can see, the dimensions of my input and of project.pca$rotation are the same, except for the non-numeric column. But is it in order, I mean the same order as my input list?

ADD REPLYlink written 16 months ago by krushnach80480

Each gene should have a value, which is unitless but measures the strength of the gene's relationship with the eigenvector (principal component). Large absolute values should indicate genes that contribute more to the separation of samples along the eigenvector in question.
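
A quick sketch, on toy data, of two properties of the loadings matrix that may help when reading it. It assumes project.pca comes from prcomp() with samples in rows; expr, same_order and loading_norms are illustrative names.

```r
set.seed(1)
# Toy matrix: 6 samples (rows) x 50 genes (columns)
expr <- matrix(rnorm(6 * 50), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:50)))
project.pca <- prcomp(expr)

# Rows of the loadings matrix follow the order of the input genes
same_order <- identical(rownames(project.pca$rotation), colnames(expr))

# Each PC's loadings form a unit-length vector, so the squared loadings
# of a PC sum to 1 and can be read as relative contributions
loading_norms <- colSums(project.pca$rotation^2)

print(same_order)
print(loading_norms)
```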

ADD REPLYlink modified 5 months ago • written 16 months ago by Kevin Blighe41k

And how do I extract that, or get the genes from those principal components? Is it done manually, or is there a method I can use in R?

ADD REPLYlink written 16 months ago by krushnach80480

The genes should be the rownames of the matrix project.pca$rotation.

ADD REPLYlink written 16 months ago by Kevin Blighe41k

Yes, they are, like this:

pca$rotation
                                       PC1           PC2           PC3           PC4
5S_rRNA                      -1.090574e-02 -2.412665e-03  1.637689e-02 -3.603865e-02
AB019441.29                  -1.928250e-02  1.821083e-03 -9.713724e-03 -1.978737e-03

But my question is: how do I find which genes came with PC1 and PC2, i.e. which genes are contributing to the first two principal components?
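
One possible approach, sketched on toy data: rank the genes by the absolute value of their loading on each PC and keep the top few. The helper top_genes and the cutoff n = 10 are hypothetical choices for illustration, not part of any package.

```r
set.seed(1)
# Toy matrix: 6 samples (rows) x 50 genes (columns)
expr <- matrix(rnorm(6 * 50), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:50)))
project.pca <- prcomp(expr)

# Hypothetical helper: names of the n genes with the largest
# absolute loading on a given PC
top_genes <- function(pca, pc, n = 10) {
  loadings <- pca$rotation[, pc]
  names(sort(abs(loadings), decreasing = TRUE))[seq_len(n)]
}

top_pc1 <- top_genes(project.pca, "PC1")
top_pc2 <- top_genes(project.pca, "PC2")

# Genes driving either of the first two components, with signed loadings
selected <- union(top_pc1, top_pc2)
print(project.pca$rotation[selected, c("PC1", "PC2")])
```

The vector selected can then be used to subset the original matrix, for example expr[, selected], for downstream analysis.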

I have created a new question, with the paper link cited in it:

Extracting features or gene from PCA after calculating PCA for downstream analysis

Please have a look and let me know.

ADD REPLYlink modified 16 months ago • written 16 months ago by krushnach80480