2.7 years ago, krushnach80810 wrote:

This is my code for PCA using SVD. I get a neat plot, but I want to add the percentage to each axis, like I used to get in the DESeq2 plot. I'm not sure how DESeq2 adds it.

library(ISLR)
ncidat = (NON_CODING[,-1])
dim(ncidat)
ncidat[1:5,1:16]
X = t(scale(t(ncidat),center=TRUE,scale=FALSE))
View(X)
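################
# SVD of the samples x genes matrix
sv = svd(t(X))
U = sv$u   # left singular vectors (one row per sample)
V = sv$v   # right singular vectors (one row per gene)
D = sv$d   # singular values
###############

## in R, the rank of a matrix can be calculated with
qr(t(X))$rank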
cols = as.numeric(as.factor(colnames(ncidat)))
plot(U[,1],U[,2],type="n",xlab="PC1",ylab="PC2")
text(U[,1],U[,2],colnames(X),col=cols)

par(mfrow=c(1,2))
Z = t(X)%*%V

# plot PC1 vs PC2
plot(Z[,1], Z[,2], type ="n", xlab="PC1", ylab="PC2")
text(Z[,1], Z[,2], colnames(X), col=cols)

pc_dat<- data.frame(type = rownames(Z), PC1 = Z[,1], PC2= Z[,2])
library(ggplot2)
p<-ggplot(pc_dat,aes(x=PC1, y=PC2, col=type)) + geom_point() +
geom_text(aes(label = type), hjust=0, vjust=0)

p<-p + theme(text = element_text(size = 25))
p

Any suggestions on how to add the percentage to the PC1 and PC2 axes would be highly appreciated.


Did you try the percentage code (3.4) from http://huboqiang.cn/2016/03/03/RscatterPlotPCA ?

I have seen that code, but I'm not sure how to pass in my principal components. I mean, which object in my code is storing the principal components?

Can you have a look at my code and let me know?


The sv$d values (D above) are related to the PC standard deviations. Use the formula to convert the sv$d values to PC standard deviations. If you are interested in displaying the PC percentages, it is better to run PCA instead of SVD.
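As a rough sketch of that conversion: if the data are centered per gene (as in the question's code) and there are n samples, the PC standard deviations are sv$d / sqrt(n - 1), which is what prcomp reports as sdev. The toy matrix tX below is a made-up stand-in for t(X), and the names pc_sdev and pct_var are just for illustration.

```r
# Toy stand-in for t(X): 6 samples (rows) x 20 genes (columns). Hypothetical data.
set.seed(1)
tX <- matrix(rnorm(6 * 20), nrow = 6,
             dimnames = list(paste0("sample", 1:6), paste0("gene", 1:20)))
tX <- scale(tX, center = TRUE, scale = FALSE)  # center each gene across samples

sv <- svd(tX)

# Singular values -> PC standard deviations (n = number of samples)
pc_sdev <- sv$d / sqrt(nrow(tX) - 1)

# Percentage of variance explained by each PC; the (n - 1) factor cancels out
pct_var <- 100 * sv$d^2 / sum(sv$d^2)

print(round(pct_var, 1))
```

The percentages can be computed from sv$d directly, so the SVD route works too; prcomp just saves the bookkeeping.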

Okay, so I will give it a try.

2.7 years ago, Kevin Blighe (63k) wrote:

Take a leaf out of my own PCA code: A: PCA plot from read count matrix from RNA-Seq

The formula for converting standard deviations to percent explained variation is:

((project.pca$sdev^2) / (sum(project.pca$sdev^2)))*100

i.e., for each PC's standard deviation, square it, divide by the sum of all squared PC standard deviations, and multiply by 100
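A minimal sketch of putting those percentages on the axes, assuming a genes x samples matrix (the random counts matrix here is made up) and a prcomp object named project.pca as in the formula above:

```r
# Hypothetical genes x samples matrix standing in for real expression data
set.seed(42)
counts <- matrix(rnorm(50 * 8, mean = 10), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("sample", 1:8)))

# prcomp expects samples in rows, so transpose
project.pca <- prcomp(t(counts))

# Percent explained variation per PC
pct <- (project.pca$sdev^2 / sum(project.pca$sdev^2)) * 100

# Axis labels carrying the percentages
xlab <- paste0("PC1 (", round(pct[1], 1), "%)")
ylab <- paste0("PC2 (", round(pct[2], 1), "%)")

plot(project.pca$x[, 1], project.pca$x[, 2], xlab = xlab, ylab = ylab)
text(project.pca$x[, 1], project.pca$x[, 2],
     labels = rownames(project.pca$x), pos = 3)
```

With ggplot2, the same two strings can be passed as labs(x = xlab, y = ylab).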

I'm going to use your code. Would you suggest SVD or PCA?


A little-known tip: PCA and SVD are more or less the 'same' thing. There are various 'quirks' like this in statistics where different statistical methods end up producing the same results. If you look at the code for the base R function prcomp, which performs PCA, it is in fact calling the base svd function.

It is possible to perform PCA using non-SVD methods though.
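One non-SVD route, as a small sketch on made-up data, is an eigendecomposition of the covariance matrix. The PC variances should agree with those from prcomp up to floating-point error:

```r
# Hypothetical data: 10 samples (rows) x 4 variables (columns)
set.seed(7)
m <- matrix(rnorm(10 * 4), nrow = 10)

# PC variances via prcomp (SVD under the hood)
var_svd <- prcomp(m)$sdev^2

# PC variances via eigendecomposition of the covariance matrix
var_eigen <- eigen(cov(m), symmetric = TRUE)$values

print(cbind(var_svd, var_eigen))
print(all.equal(var_svd, var_eigen))
```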

I was using this reference because I was looking for a way to find the genes that drive PC1 and PC2, let's say the most variable genes across the samples. I would like to extract those genes from the PCA and use them for downstream analysis. Is there a simpler way to do this?

If you used my code, take a look at:

project.pca$rotation

This should contain the component loadings of each gene on each PC. The PCs (columns) are ordered by the variation they explain; the genes (rows) stay in the order of the input.


Great help, I will try it and let you know.

@Kevin I tried your code and checked project.pca$rotation. It contains all the genes I used: as far as I can see, the dimensions of my input and of project.pca$rotation are the same, except for the non-numeric column. But is it in order? I mean, in the same order as my input list?

Each gene should have a value, which is unitless but is a measure of the strength of the gene's relationship with the eigenvector (principal component). Large absolute values should indicate greater covariation between the samples being segregated by the eigenvector in question.

And how do I extract that, or get the genes from those principal components? Is it just done manually, or is there a method I can use in R?


The genes should be the row names of the matrix project.pca$rotation?

Yes, they are, as shown here:

pca$rotation
                                       PC1           PC2           PC3           PC4
5S_rRNA                      -1.090574e-02 -2.412665e-03  1.637689e-02 -3.603865e-02
AB019441.29                  -1.928250e-02  1.821083e-03 -9.713724e-03 -1.978737e-03

But my question is: how do I find which genes came with PC1 and PC2, i.e., which genes contribute to the first two principal components?
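One common heuristic (not something prescribed in this thread) is to rank the genes by the absolute value of their loading on each PC and keep the top few. A sketch on made-up data, where top_genes is a hypothetical helper and the cutoff n = 5 is arbitrary:

```r
# Hypothetical genes x samples matrix
set.seed(3)
expr <- matrix(rnorm(30 * 6), nrow = 30,
               dimnames = list(paste0("gene", 1:30), paste0("sample", 1:6)))
pca <- prcomp(t(expr))

# Return the n genes with the largest absolute loading on one PC
top_genes <- function(rotation, pc, n = 5) {
  loadings <- rotation[, pc]                      # named vector, one value per gene
  ord <- order(abs(loadings), decreasing = TRUE)  # strongest loadings first
  head(loadings[ord], n)
}

top_pc1 <- top_genes(pca$rotation, "PC1")
top_pc2 <- top_genes(pca$rotation, "PC2")
print(top_pc1)
print(top_pc2)
```

names(top_pc1) then gives the gene identifiers to carry into downstream analysis. How many genes to keep is a judgment call.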

I have created a new question, and the paper link is also cited in it:

Extracting features or gene from PCA after calculating PCA for downstream analysis

Have a look and let me know.