I have genes in rows and sample names in columns: 76 samples and 376 genes. I obtained these genes from differential gene expression analysis under different biotic and abiotic stress conditions. I want to do a PCA in R and draw a biplot of my data. Can anyone help?
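Perhaps this one also provides a nice visualization, based on ggplot2:

```r
library(ggbiplot)  # originally distributed on GitHub (vqv/ggbiplot)
p <- prcomp(x)     # x: numeric matrix with samples in rows
g <- ggbiplot(p, scale = 1, obs.scale = 1, varname.abbrev = FALSE,
              var.axes = FALSE, pc.biplot = TRUE, circle = TRUE)
print(g)
```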
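To colour the points by experimental condition, ggbiplot() also takes a `groups` argument, and `ellipse = TRUE` draws a normal-probability ellipse per group. The sketch below uses the built-in iris data with `Species` standing in for the stress conditions (an assumption for illustration), and only runs the plot if ggbiplot is installed.

```r
# PCA on the four numeric iris columns, with variables scaled to unit variance
p <- prcomp(iris[, 1:4], scale. = TRUE)

if (requireNamespace("ggbiplot", quietly = TRUE)) {
  library(ggbiplot)  # attaching also makes ggplot2 available
  # groups: one label per observation (here Species stands in for a condition)
  g <- ggbiplot(p, obs.scale = 1, var.scale = 1,
                groups = iris$Species, ellipse = TRUE, circle = TRUE)
  print(g)
}
```

With your own data, `groups` would be a factor with one entry per sample (row of the matrix passed to prcomp()).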
In R you can use the function prcomp() (available by default) on your matrix. Then you can call biplot() on the result to obtain a biplot (read the documentation with ?biplot, as there are different kinds of plots that are known as biplots). Another alternative is to install the pcaMethods Bioconductor package. A small example with the built-in iris data set:
```r
x <- data.matrix(iris[, -5])  # prcomp() requires a numeric matrix.
p <- prcomp(x)
p
## Standard deviations:
## 2.0562689 0.4926162 0.2796596 0.1543862
##
## Rotation:
##                      PC1         PC2         PC3        PC4
## Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
## Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
## Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
## Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574
biplot(p)
```
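prcomp() treats rows as observations and columns as variables. With genes in rows and samples in columns, as described in the question, the matrix has to be transposed with t() if the samples are to be the points in the plot. The sketch below uses a simulated 376 x 76 matrix (random numbers standing in for real expression values) and a hypothetical two-level `condition` factor; both are assumptions to be replaced with your own data.

```r
set.seed(1)
# Simulated stand-in for the real data: 376 genes (rows) x 76 samples (columns)
expr <- matrix(rnorm(376 * 76), nrow = 376,
               dimnames = list(paste0("gene", 1:376), paste0("sample", 1:76)))
# Hypothetical sample annotation: two conditions with 38 samples each
condition <- factor(rep(c("biotic", "abiotic"), each = 38))

# Transpose so that samples are rows (observations) and genes are columns (variables)
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each PC
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(head(var_explained), 3)

# Sample scores on PC1 and PC2, coloured by condition
plot(pca$x[, 1:2], col = as.integer(condition), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = levels(condition), col = 1:2, pch = 19)

# Biplot: samples as points, genes as arrows (376 arrows will be crowded)
biplot(pca, cex = 0.5)
```

Whether to use `scale. = TRUE` is a judgement call: it gives every gene equal weight, whereas the default (`scale. = FALSE`) lets highly variable genes dominate the first PCs.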
First, be clear about what you want to see. A PCA on all samples shows whether, based on gene-expression variability, the samples cluster into distinct groups (for example, two groups). Such a separation reflects the difference between the conditions or groups you study. If you have already identified your DEGs, it is advisable to show them in a volcano plot or an MA plot in R to capture their differences, or even in a heatmap made with one of the many good R packages.
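A minimal sketch of both displays using only base R. It assumes `deg_expr` is a genes x samples matrix of DEG expression values and `res` is a DE results table with `logFC` and `pvalue` columns; both are simulated here, and the cut-offs (|logFC| > 1, p < 0.05) are illustrative choices, not recommendations. Packages such as pheatmap or ComplexHeatmap give more control over the heatmap but are not used here.

```r
set.seed(2)
# Simulated stand-in: 50 DEGs (rows) x 12 samples (columns)
deg_expr <- matrix(rnorm(50 * 12), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), paste0("s", 1:12)))

# Heatmap of gene-wise z-scores (scale = "row"), rows and columns clustered
hm <- heatmap(deg_expr, scale = "row", col = hcl.colors(50, "Blue-Red 3"))

# Volcano plot from a hypothetical DE results table (values simulated)
res <- data.frame(logFC = rnorm(500, sd = 1.5), pvalue = runif(500))
sig <- abs(res$logFC) > 1 & res$pvalue < 0.05
plot(res$logFC, -log10(res$pvalue), pch = 20,
     col = ifelse(sig, "red", "grey50"),
     xlab = "log2 fold change", ylab = "-log10(p-value)")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)
```

In practice `res` would come from your DE tool (for example limma's topTable() output, whose column names differ from the hypothetical ones used here).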
It is not really advisable to run PCA on the DEGs; a heatmap suits them better. But if you are set on doing a PCA, then plotMDS() from limma, prcomp() or princomp() will suffice.

Ideally, what you want to convey is this: based on the variability of gene expression between the two conditions, you have found the most variable genes, and they separate the samples into two clusters that correspond to different phenotypes. The workflow is fairly simple. Take all samples and perform PCA on all samples versus all genes; you see that the samples form two clusters and show variability; downstream of that, you perform DE analysis to find the genes responsible. These genes can then be shown in an MA plot, a volcano plot or a heatmap. PCA on such a small number of samples and genes is not very informative. I would bet that in this case your PCA should be on genes rather than samples, so the points projected onto the PCs should be the genes, separated according to the two conditions of your samples.
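A sketch of PCA with genes as the points, using the question's layout (genes in rows) without transposing. The data and the two-level `condition` factor are simulated and hypothetical; colouring each gene by the condition in which its mean is higher is one illustrative way to show genes "separated by the two conditions". It also shows a practical constraint: princomp() requires more observations (rows) than variables (columns), so it works with genes as rows but fails when the 76 samples are the rows.

```r
set.seed(3)
# Simulated stand-in: 376 genes (rows) x 76 samples (columns)
expr <- matrix(rnorm(376 * 76), nrow = 376,
               dimnames = list(paste0("gene", 1:376), paste0("sample", 1:76)))
# Hypothetical two-condition annotation of the samples
condition <- factor(rep(c("A", "B"), each = 38))

# Genes as observations: no transpose, so each point in the score plot is a gene
pca_genes <- prcomp(expr, center = TRUE, scale. = TRUE)

# Colour each gene by the condition with the higher mean expression
higher_in_A <- rowMeans(expr[, condition == "A"]) > rowMeans(expr[, condition == "B"])
plot(pca_genes$x[, 1:2], pch = 20, col = ifelse(higher_in_A, "tomato", "steelblue"),
     xlab = "PC1", ylab = "PC2", main = "Genes projected onto the first two PCs")
legend("topright", legend = c("higher in A", "higher in B"),
       col = c("tomato", "steelblue"), pch = 20)

# princomp() needs more rows than columns:
# it works with genes as rows (376 > 76) but errors on t(expr) (76 < 376)
pc_genes <- princomp(expr)
samples_ok <- tryCatch({ princomp(t(expr)); TRUE }, error = function(e) FALSE)
samples_ok
```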
Take a look at this link, which covers interactive 3D PCA plots using the rgl library in R.
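A minimal sketch of a 3D PCA score plot with rgl, using the built-in iris data; it is not necessarily the approach of the linked page. It plots the first three PC scores with rgl::plot3d() and only runs if rgl is installed. Outside an interactive session it switches rgl to its NULL device, so the scene is built without opening a window (rgl::rglwidget() can then embed it in HTML).

```r
# Without a display (e.g. a script on a server), use rgl's NULL device
if (!interactive()) options(rgl.useNULL = TRUE)

# PCA on the four numeric iris columns; keep the first three PC scores
pca <- prcomp(iris[, 1:4])
scores <- pca$x[, 1:3]

if (requireNamespace("rgl", quietly = TRUE)) {
  # One point per observation, coloured by Species; rotate with the mouse
  rgl::plot3d(scores, col = as.integer(iris$Species), size = 5,
              xlab = "PC1", ylab = "PC2", zlab = "PC3")
}
```

For the question's data, `scores` would be `prcomp(t(expr))$x[, 1:3]` with colours taken from the sample conditions.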