*NB - this is now a Bioconductor R package: https://github.com/kevinblighe/PCAtools*

## -------------------------

You should normalise your data prior to performing PCA. In the code below, you'll have to add plot legends yourself, and also colour vectors (passed to the '*col*' parameter).

Then, assuming that you have transcripts as rows and samples as columns:

*NB - in this code, the plots I've shown don't necessarily match the exact code, but the plot type is the same*

**[Edit: also take a look at my definition of PCA here: PCA in a RNA seq analysis]**
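As a rough illustration of the normalisation step mentioned above, here is a minimal sketch that simulates a small count matrix (transcripts as rows, samples as columns), scales it to counts per million, and log-transforms it. The simulated data and the name `MyRawCounts` are assumptions for illustration only, and CPM plus log2 is just one simple option among many; use whichever normalisation suits your data type.

```r
# Toy example only: simulated counts, NOT real data
set.seed(1)
n.transcripts <- 200
n.samples <- 12
MyRawCounts <- matrix(
  rnbinom(n.transcripts * n.samples, mu = 100, size = 1),
  nrow = n.transcripts,
  dimnames = list(
    paste0("transcript", 1:n.transcripts),
    paste0("sample", 1:n.samples)))

# Scale each sample (column) to counts per million to adjust for library size
cpm <- t(t(MyRawCounts) / colSums(MyRawCounts)) * 1e6

# Log-transform so that a few highly expressed transcripts do not dominate the PCA
MyReadCountMatrix <- log2(cpm + 1)

# Transcripts as rows, samples as columns
dim(MyReadCountMatrix)
```

The resulting `MyReadCountMatrix` can then be passed to the code below as-is.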

# Perform PCA / singular value decomposition

```
project.pca <- prcomp(t(MyReadCountMatrix))
summary(project.pca)
#Determine the proportion of variance of each component
#Proportion of variance equals (PC stdev^2) / (sum all PCs stdev^2)
project.pca.proportionvariances <- ((project.pca$sdev^2) / (sum(project.pca$sdev^2)))*100
```
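As a quick sanity check on the formula above, the proportions should sum to 100 and should agree with the 'Proportion of Variance' row reported by `summary()`. The sketch below uses a small random matrix as stand-in data; the names `toy`, `toy.pca`, and `toy.cumulative` are illustrative only.

```r
set.seed(1)
# Stand-in data: 50 transcripts (rows) x 8 samples (columns)
toy <- matrix(rnorm(50 * 8), nrow = 50)
toy.pca <- prcomp(t(toy))

# Same formula as above: (PC stdev^2) / (sum of all PCs stdev^2), as a percentage
toy.proportionvariances <- (toy.pca$sdev^2 / sum(toy.pca$sdev^2)) * 100

# The proportions should sum to 100
sum(toy.proportionvariances)

# They should also agree with summary(), which rounds to 5 decimal places
all.equal(
  toy.proportionvariances,
  unname(summary(toy.pca)$importance["Proportion of Variance", ]) * 100,
  tolerance = 1e-3)

# Cumulative proportion: index of the first PC at which at least 80% is explained
toy.cumulative <- cumsum(toy.proportionvariances)
which(toy.cumulative >= 80)[1]
```

The cumulative vector is a handy companion to the scree plot when deciding how many PCs to look at.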

# Scree plot

```
barplot(project.pca.proportionvariances, cex.names=1, xlab=paste("Principal component (PC), 1-", length(project.pca$sdev)), ylab="Proportion of variation (%)", main="Scree plot", ylim=c(0,100))
```

# Pairs plots

```
par(cex=1.0, cex.axis=0.8, cex.main=0.8)
pairs(project.pca$x[,1:5], col="black", main="Principal components analysis bi-plot\nPCs 1-5", pch=16)
pairs(project.pca$x[,6:10], col="black", main="Principal components analysis bi-plot\nPCs 6-10", pch=16)
```

# Bi-plots

```
par(mar=c(4,4,4,4), mfrow=c(1,3), cex=1.0, cex.main=0.8, cex.axis=0.8)
#Plots scatter plot for PC 1 and 2
plot(project.pca$x, type="n", main="Principal components analysis bi-plot", xlab=paste("PC1, ", round(project.pca.proportionvariances[1], 2), "%"), ylab=paste("PC2, ", round(project.pca.proportionvariances[2], 2), "%"))
points(project.pca$x, col="black", pch=16, cex=1)
#Plots scatter plot for PC 1 and 3
plot(project.pca$x[,1], project.pca$x[,3], type="n", main="Principal components analysis bi-plot", xlab=paste("PC1, ", round(project.pca.proportionvariances[1], 2), "%"), ylab=paste("PC3, ", round(project.pca.proportionvariances[3], 2), "%"))
points(project.pca$x[,1], project.pca$x[,3], col="black", pch=16, cex=1)
#Plots scatter plot for PC 2 and 3
plot(project.pca$x[,2], project.pca$x[,3], type="n", main="Principal components analysis bi-plot", xlab=paste("PC2, ", round(project.pca.proportionvariances[2], 2), "%"), ylab=paste("PC3, ", round(project.pca.proportionvariances[3], 2), "%"))
points(project.pca$x[,2], project.pca$x[,3], col="black", pch=16, cex=1)
```

# Tri-plot

```
require(scatterplot3d)
par(mar=c(4,4,4,4), cex=1.0, cex.main=0.8, cex.axis=0.8)
scatterplot3d(project.pca$x[,1:3], angle=-40, main="", color="black", pch=17, xlab=paste("PC1, ", round(project.pca.proportionvariances[1], 2), "%"), ylab=paste("PC2, ", round(project.pca.proportionvariances[2], 2), "%"), zlab=paste("PC3, ", round(project.pca.proportionvariances[3], 2), "%"), grid=FALSE, box=FALSE)
source('http://www.sthda.com/sthda/RDoc/functions/addgrids3d.r')
addgrids3d(project.pca$x[,1:3], grid = c("xy", "xz", "yz"))
```

Hi Kevin, thanks! Can you please tell me how I can color two groups, one in green and the other in red?

Hey, this is now a Bioconductor package. So, you should be able to colour the groups easily by using that: https://github.com/kevinblighe/PCAtools

Hi Kevin, thanks for your reply. I can't install PCAtools in R: it downloads, yet when I load the package it says not found. I have 46 columns with leaf and fruit samples, and the rows are genes. I don't know how to plot the PCA and color leaf and fruit. Is there a simple example that I can follow? Sorry, I am a beginner and trying to learn. Thanks

Can you show the exact commands that you used to try to run PCAtools, and the exact output error messages? If you also paste a sample of your data and metadata, then I can walk you through it.

Hi Kevin, thanks so much for your kind help! I installed it and typed the following:

My data file is as follows, where column 1 is the tissue type and row 1 is the genes. I have 46 tissue samples and 1000 genes. I want to draw a PCA plot showing all leaf samples in green and all fruit samples in red!

| Tissue | Solyc00g005000 | Solyc00g005040 | Solyc00g005050 | Solyc00g005060 | Solyc00g005080 | Solyc00g005090 |
|---|---|---|---|---|---|---|
| Leaf1 | 0.00000000 | 2.97000000 | 18.58000 | 0 | 0.00000000 | 0 |
| Leaf2 | 0.00000000 | 0.00000000 | 20.83000 | 0 | 0.00000000 | 0 |
| Leaf3 | 0.00000000 | 0.25000000 | 12.20000 | 0 | 0.00000000 | 0 |
| Leaf4 | 0.00000000 | 0.00000000 | 7.62000 | 0 | 0.00000000 | 0 |
| Leaf5 | 0.00000000 | 0.00000000 | 18.00000 | 0 | 0.00000000 | 0 |
| Leaf6 | 0.00000000 | 1.33000000 | 21.58000 | 0 | 0.00000000 | 0 |
| Fruit1 | 0.00000000 | 0.00000000 | 13.12667 | 0 | 0.00000000 | 0 |
| Fruit2 | 0.00000000 | 0.24666667 | 8.64000 | 0 | 0.00000000 | 0 |
| Fruit3 | 0.00000000 | 0.06666667 | 10.58000 | 0 | 0.00000000 | 0 |
| Fruit4 | 0.00000000 | 0.70666667 | 37.30000 | 0 | 0.00000000 | 0 |
| Fruit5 | 0.00000000 | 0.02666667 | 45.96667 | 0 | 0.00000000 | 0 |
| Fruit6 | 0.00000000 | 0.08333333 | 24.34333 | 0 | 0.00000000 | 0 |
| Fruit7 | 0.00000000 | 0.28000000 | 20.16333 | 0 | 0.00000000 | 0 |
| Fruit8 | 0.00000000 | 0.34333333 | 27.76667 | 0 | 0.00000000 | 0 |
| Fruit9 | 0.00000000 | 0.15333333 | 15.42667 | 0 | 0.03666667 | 0 |
| Fruit10 | 0.00000000 | 0.10666667 | 12.86667 | 0 | 0.00000000 | 0 |
| Fruit11 | 0.00000000 | 0.25666667 | 16.36000 | 0 | 0.03666667 | 0 |
| Fruit12 | 0.00000000 | 1.03333333 | 40.36000 | 0 | 0.04333333 | 0 |
| Fruit13 | 0.00000000 | 0.58000000 | 57.46000 | 0 | 0.00000000 | 0 |
| Fruit14 | 0.05333333 | 0.16333333 | 29.14333 | 0 | 0.00000000 | 0 |
| Fruit15 | 0.02666667 | 0.06000000 | 26.37667 | 0 | 0.00000000 | 0 |
| Fruit16 | 0.06000000 | 0.18000000 | 49.53000 | 0 | 0.00000000 | 0 |

Ah, I see, you will require R 3.6 for PCAtools - sorry. For now, you can still use base R functions. Here is an example with the data that you have given:

## View data

## Create colour vector and shape vector for leaf / fruit

## Perform PCA and plot PC1 vs. PC2
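A minimal sketch of these three steps in base R, using the rows posted in this thread. It assumes that samples are rows and genes are columns (as in the posted data, so no `t()` is needed) and that sample names start with 'Leaf' or 'Fruit'. The object names (`mydata`, `tissue`, `cols`, `shapes`, `pca`, `propvar`) are illustrative, and with only 6 genes the resulting plot is for demonstration only.

```r
# View data (samples as rows, genes as columns, as posted)
mydata <- read.table(header = TRUE, text = "Solyc00g005000 Solyc00g005040 Solyc00g005050 Solyc00g005060 Solyc00g005080 Solyc00g005090
Leaf1 0.00000000 2.97000000 18.58000 0 0.00000000 0
Leaf2 0.00000000 0.00000000 20.83000 0 0.00000000 0
Leaf3 0.00000000 0.25000000 12.20000 0 0.00000000 0
Leaf4 0.00000000 0.00000000 7.62000 0 0.00000000 0
Leaf5 0.00000000 0.00000000 18.00000 0 0.00000000 0
Leaf6 0.00000000 1.33000000 21.58000 0 0.00000000 0
Fruit1 0.00000000 0.00000000 13.12667 0 0.00000000 0
Fruit2 0.00000000 0.24666667 8.64000 0 0.00000000 0
Fruit3 0.00000000 0.06666667 10.58000 0 0.00000000 0
Fruit4 0.00000000 0.70666667 37.30000 0 0.00000000 0
Fruit5 0.00000000 0.02666667 45.96667 0 0.00000000 0
Fruit6 0.00000000 0.08333333 24.34333 0 0.00000000 0
Fruit7 0.00000000 0.28000000 20.16333 0 0.00000000 0
Fruit8 0.00000000 0.34333333 27.76667 0 0.00000000 0
Fruit9 0.00000000 0.15333333 15.42667 0 0.03666667 0
Fruit10 0.00000000 0.10666667 12.86667 0 0.00000000 0
Fruit11 0.00000000 0.25666667 16.36000 0 0.03666667 0
Fruit12 0.00000000 1.03333333 40.36000 0 0.04333333 0
Fruit13 0.00000000 0.58000000 57.46000 0 0.00000000 0
Fruit14 0.05333333 0.16333333 29.14333 0 0.00000000 0
Fruit15 0.02666667 0.06000000 26.37667 0 0.00000000 0
Fruit16 0.06000000 0.18000000 49.53000 0 0.00000000 0")
head(mydata)

# Create colour vector and shape vector for leaf / fruit,
# based on the sample names (assumed to start with 'Leaf' or 'Fruit')
tissue <- ifelse(grepl("^Leaf", rownames(mydata)), "Leaf", "Fruit")
cols <- ifelse(tissue == "Leaf", "green", "red")
shapes <- ifelse(tissue == "Leaf", 16, 17)

# Perform PCA and plot PC1 vs. PC2
# Samples are already rows here, so the matrix is not transposed
pca <- prcomp(mydata)
propvar <- (pca$sdev^2 / sum(pca$sdev^2)) * 100
plot(pca$x[, 1], pca$x[, 2], col = cols, pch = shapes,
  xlab = paste0("PC1, ", round(propvar[1], 2), "%"),
  ylab = paste0("PC2, ", round(propvar[2], 2), "%"),
  main = "PC1 vs. PC2")
legend("topright", legend = c("Leaf", "Fruit"),
  col = c("green", "red"), pch = c(16, 17), bty = "n")
```

For the full 46-sample data, the same `cols` and `shapes` vectors can be passed to any of the plotting calls in the answer above via '*col*' and '*pch*'. If the values have not already been normalised, transforming them first (for example with `log2(mydata + 1)`) may be sensible.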

Hi Kevin, thanks so much! I don't have words to thank you!
