I'm trying to run a PCA on a time-course protein expression study with three time points. I want to find the proteins that best represent each time point. In other words, I want a list of the proteins that fall into the different "sectors" of time, as in this example plot/link.
I used the following R script to run the PCA and generate the biplot:
    rm(list=ls())
    my.df <- read.table("expression_log2ratio.txt",row.names=1,header=TRUE,sep="\t",check.names=FALSE)
    prot.pca <- prcomp(na.omit(my.df), scale=FALSE)
    summary(prot.pca)
    biplot(prot.pca,col=c("blue","red"),cex=c(0.5,0.5))
My input data has the following format, with around 4000 proteins. The numerical values are log2 ratios.
          T1            T2            T3
    p1    -0.071396303  0.006385917   0.088535769
    p2    -0.115839104  0.043409057   -0.035812972
    p3    -0.01593602   -0.02627361   0.014833213
    .....
My Biplot looks as follows:
Can anybody suggest how to extract the proteins in the different 'sectors' using R? Any help is highly appreciated.
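To make the question concrete, here is a rough sketch of one thing that could be tried. It is not an established method. It assumes a "sector" means the time point whose loading arrow is closest in angle to a protein in the PC1/PC2 plane, and it keeps only the proteins far from the origin. The toy data, the 90% cutoff, and the names `unit`, `cosines`, `sector`, `keep` and `sector.list` are all made up for illustration. Note also that `biplot()` rescales its axes by default, so the sectors computed from the raw scores may not match the picture exactly.

```r
# Toy stand-in for the real table (hypothetical values; the real data
# would come from read.table() as in the script above).
set.seed(1)
my.df <- data.frame(
  T1 = rnorm(200, sd = 0.1),
  T2 = rnorm(200, sd = 0.1),
  T3 = rnorm(200, sd = 0.1),
  row.names = paste0("p", 1:200)
)

prot.pca <- prcomp(na.omit(my.df), scale. = FALSE)

scores   <- prot.pca$x[, 1:2]         # protein coordinates on PC1/PC2
loadings <- prot.pca$rotation[, 1:2]  # time-point directions on PC1/PC2

# Scale every row to unit length, so that the matrix product below
# gives the cosine of the angle between a protein and a time-point arrow.
unit <- function(m) m / sqrt(rowSums(m^2))
cosines <- unit(scores) %*% t(unit(loadings))   # proteins x time points

# Assumed definition of "sector": the time point with the largest cosine,
# i.e. the arrow closest in angle to the protein.
sector <- colnames(cosines)[max.col(cosines, ties.method = "first")]

# Keep only proteins far from the origin (arbitrary cutoff: outer 10%).
dist0 <- sqrt(rowSums(scores^2))
keep  <- dist0 >= quantile(dist0, 0.90)

# One vector of protein names per time point.
sector.list <- split(rownames(scores)[keep], sector[keep])
str(sector.list)
```

With only three time points there are at most three principal components, so the PC1/PC2 plane may already capture most of the variance; `summary(prot.pca)` shows how much.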