PCA analysis for differential gene expression data of microarray samples of similar condition
4.5 years ago

I have the genes in rows and the sample names in columns; there are 76 samples and 376 genes. I got these genes from differential gene expression analysis of different biotic and abiotic stress conditions. I want to do a PCA analysis in R and draw a biplot for my data. Can anyone help?

RNA-Seq next-gen R sequencing genome • 9.2k views

Dear rajasekargutha, Hi.

There is a script for performing PCA with PtR at the bottom of this page and in this post.

~ Best


Personally, I don't recommend a 3D PCA plot like the one in the paper you referred to. It can sometimes be confusing to interpret; see an example here (start at 15:30). Plotting PC1 vs PC2 and PC2 vs PC3 would show the pattern much more clearly.
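
A minimal sketch of the pairwise 2D plots suggested here, using the built-in iris data as a stand-in for an expression matrix (the colouring by `iris$Species` is only an illustrative assumption about known groups):

```r
# Pairwise 2D scatter plots of principal components instead of one 3D plot.
x <- data.matrix(iris[, -5])
p <- prcomp(x)
scores <- p$x                     # one row per observation, one column per PC
grp <- as.integer(iris$Species)   # colour points by known group

op <- par(mfrow = c(1, 2))        # two panels side by side
plot(scores[, 1], scores[, 2], col = grp, pch = 19, xlab = "PC1", ylab = "PC2")
plot(scores[, 2], scores[, 3], col = grp, pch = 19, xlab = "PC2", ylab = "PC3")
par(op)                           # restore the previous graphics settings
```

`pairs(scores[, 1:3], col = grp)` would give all pairwise panels in one call.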


Completely agree! If the plot is interactive and you can rotate the axes, then 3D plots can be somewhat useful for understanding the structure of your data (although it is still not so easy, and I guess it depends on the data). But 2D snapshots of a 3D plot can be very misleading. Thanks for the very useful link to R. Irizarry's talk.


I would recommend making a multidimensional scaling (MDS) plot instead of a PCA; see cmdscale in the R help.
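
For illustration, a small sketch of classical MDS with cmdscale() on the iris data (a stand-in, not the poster's data). Note that with Euclidean distances, classical MDS reproduces the PCA scores up to sign, so the two plots differ mainly when another distance is used:

```r
# Classical multidimensional scaling on Euclidean distances between rows.
x <- data.matrix(iris[, -5])
d <- dist(x)                  # Euclidean distance between observations
mds <- cmdscale(d, k = 2)     # coordinates in 2 dimensions

plot(mds[, 1], mds[, 2], col = as.integer(iris$Species), pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2")
```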

4.5 years ago
ddiez ★ 1.9k

In R you can use the function prcomp() (available by default) on your matrix. Then you can use biplot() on the result to obtain a biplot (read the documentation about biplot with ?biplot as there are different kinds of plots that are known as biplot). Another alternative is to install the pcaMethods Bioconductor package. A small example with prcomp():

x <- data.matrix(iris[,-5]) # prcomp() requires a numeric matrix.
p <- prcomp(x)
p
Standard deviations:
[1] 2.0562689 0.4926162 0.2796596 0.1543862

Rotation:
                     PC1         PC2         PC3        PC4
Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

biplot(p)
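
For a layout like the one in the question (genes in rows, samples in columns), a sketch might look like this. The matrix `expr` is simulated and purely hypothetical; scaling the genes with `scale. = TRUE` is an assumption, not something the question specifies:

```r
set.seed(1)
# Hypothetical stand-in for the real data: 376 genes (rows) x 76 samples (columns).
expr <- matrix(rnorm(376 * 76), nrow = 376, ncol = 76,
               dimnames = list(paste0("gene", 1:376), paste0("sample", 1:76)))

# prcomp() treats rows as observations, so transpose to get one point per sample.
p <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- p$sdev^2 / sum(p$sdev^2)
round(var_explained[1:3], 3)

# Samples are drawn as points, genes as arrows.
biplot(p, cex = 0.6)
```

Without the transpose, each point in the plot would be a gene rather than a sample.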

Dear ddiez, Hi

it seems that the images in the example @rajasekargutha has provided above are in 3D.

Can biplot() produce such colorful 3D plots?

Thanks


Not that function, AFAIK. However, take a look at the answer to this question on SO. A quick search also points to an R package called pca3d, which uses rgl for interactive 3D PCA plots.

4.5 years ago
Ron ★ 1.0k

Perhaps this one, which also provides a nice visualization based on ggplot:

https://github.com/vqv/ggbiplot

library(ggbiplot) # provides ggbiplot(); see the repository above
p <- prcomp(x)
g <- ggbiplot(p, scale = 1, obs.scale = 1, varname.abbrev = FALSE, var.axes = FALSE, pc.biplot = TRUE, circle = TRUE)
print(g) # draw the plot

Ooh, a ggplot version. This is great!

4.5 years ago
ivivek_ngs ★ 5.1k

First, you have to be clear about what you want to see. PCA on all samples means that, based on gene variability, you see them clustered into 2 different groups. This marks the difference between the conditions or groups you study. If you have already found the genes that are DEGs, it is advisable to show them in a volcano plot or MA plot in R to capture their difference, or in a heatmap using one of the R packages for that.

It is not really advisable to run PCA on the DEGs; it is better to make a heatmap of them. But if you are set on doing a PCA, then plotMDS from limma, prcomp or princomp will also suffice. Ideally, what you want to convey is that, based on the variability of gene expression between 2 conditions, you have come up with the most variable genes, which separate the samples into 2 clusters and so reflect different phenotypes. This is fairly simple: you take all samples, perform PCA on all samples vs all genes, and see that they form 2 clusters and that the samples show variability; downstream of that, you perform DE analysis to find those genes. The result can be shown in an MA plot, a volcano plot or a heatmap. PCA on such a small number of samples and genes is not recommended. I would bet that in this case your PCA should be on genes rather than samples, so the points you project onto the PCs should be the genes, separated by the 2 conditions of your samples.
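
As a rough sketch of both suggestions (a heatmap of the DEGs, and a PCA where the points are genes), using simulated data. The matrix `expr`, the two conditions "A" and "B" and the effect size are all hypothetical:

```r
set.seed(1)
# Hypothetical data: 50 DEGs x 10 samples, two conditions of 5 samples each.
cond <- rep(c("A", "B"), each = 5)
expr <- matrix(rnorm(50 * 10), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0(cond, 1:5)))
expr[1:25, cond == "B"] <- expr[1:25, cond == "B"] + 3   # first 25 genes up in B

# Heatmap with each gene scaled to z-scores; the colour bar marks the condition.
heatmap(expr, scale = "row",
        ColSideColors = ifelse(cond == "A", "grey40", "orange"))

# PCA with genes as the points: rows of expr are genes, so no transpose is needed.
pg <- prcomp(expr)
plot(pg$x[, 1], pg$x[, 2], pch = 19, xlab = "PC1", ylab = "PC2")
```

Here base R's heatmap() is used only to keep the sketch dependency-free; a dedicated heatmap package could be swapped in.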

Take a look at this link


Dear vchris_ngs, Hi

I agree with you; I think it is clear that the DEGs of separate conditions would show separate clusters in a PCA!


thanks, that was super helpful! :)

4.5 years ago
Ron ★ 1.0k

This is for interactive 3D PCA plots using the "rgl" library in R:

http://planspace.org/2013/02/03/pca-3d-visualization-and-clustering-in-r/
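
A minimal sketch of the idea, not taken from the linked post: it plots the first three PCs of the iris data with rgl::plot3d(), and only runs the 3D part if the rgl package happens to be installed (an assumption about your setup):

```r
x <- data.matrix(iris[, -5])
p <- prcomp(x)
scores <- p$x[, 1:3]   # first three principal components

# rgl is an add-on package; skip the plot if it is not available.
if (requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(scores[, 1], scores[, 2], scores[, 3],
              col = as.integer(iris$Species), type = "s", size = 1,
              xlab = "PC1", ylab = "PC2", zlab = "PC3")
}
```

The resulting window can be rotated with the mouse, which is what makes the 3D view interpretable.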
