I'm doing PCA (Principal Component Analysis) on a set of 1000 genes in 4 different samples to see whether there is any split in the data. My data looks like this:
My code is very simple:
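```r
library(ggplot2)

# Read the expression table; column 1 (gene names) becomes the row names,
# so all remaining columns are the 4 samples
data <- read.csv("exp.csv", row.names = 1)
expr <- data.matrix(data)

# PCA with genes as observations (rows) and samples as variables (columns)
pca <- prcomp(expr, scale. = TRUE)

# create data frame with scores (one row per gene)
scores <- as.data.frame(pca$x)
scores$gene <- rownames(scores)

# plot of observations, labelled by gene name
ggplot(data = scores, aes(x = PC1, y = PC2, label = gene)) +
  geom_hline(yintercept = 0, colour = "gray65") +
  geom_vline(xintercept = 0, colour = "gray65") +
  geom_text(colour = "tomato", alpha = 0.8, size = 4) +
  ggtitle("PCA plot")
```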
When I plot PC1 against PC2 I clearly see a separation: the genes fall into 2 big groups. But how can I see which genes make up each of these 2 clusters? Many genes overlap each other in the plot, so it's difficult to read the gene names from it. How can I extract the genes in each group from the PCA results and save them as a text file?
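A minimal sketch of one way to do this, assuming the gene names are the row names of the expression matrix (e.g. via `read.csv(..., row.names = 1)`): it assigns each gene to one of two groups with k-means on the PC1/PC2 scores and writes the result with `write.table()`. K-means is a swapped-in choice here, not part of PCA itself; if the split runs cleanly along one axis, a simple threshold such as `scores$PC1 > 0` would do the same job. The helper name `pca_gene_groups`, the output file name and the simulated matrix are hypothetical, for illustration only.

```r
# Hypothetical helper: assign genes to k groups in PC1/PC2 space and
# optionally save the result as a tab-delimited text file.
pca_gene_groups <- function(expr, k = 2, file = NULL) {
  pca <- prcomp(expr, scale. = TRUE)             # genes are rows, samples columns
  scores <- as.data.frame(pca$x[, 1:2])          # keep PC1 and PC2 only
  km <- kmeans(scores, centers = k, nstart = 25) # k-means on the PC scores
  res <- data.frame(gene = rownames(expr),
                    PC1 = scores$PC1,
                    PC2 = scores$PC2,
                    cluster = km$cluster,
                    stringsAsFactors = FALSE)
  if (!is.null(file)) {
    write.table(res, file = file, sep = "\t", quote = FALSE, row.names = FALSE)
  }
  res
}

# Simulated stand-in for exp.csv (hypothetical): 1000 genes x 4 samples,
# with the first 500 genes shifted upwards to create two groups
set.seed(1)
expr <- rbind(matrix(rnorm(2000, mean = 5), ncol = 4),
              matrix(rnorm(2000, mean = 0), ncol = 4))
rownames(expr) <- paste0("gene", 1:1000)
colnames(expr) <- paste0("sample", 1:4)

# Use e.g. "pca_gene_groups.txt" instead to save in the working directory
out_file <- file.path(tempdir(), "pca_gene_groups.txt")
res <- pca_gene_groups(expr, k = 2, file = out_file)

# One character vector of gene names per group
gene_lists <- split(res$gene, res$cluster)
lengths(gene_lists)
```

The cluster numbers returned by `kmeans()` are arbitrary, so it is worth checking which number corresponds to which group in the plot before interpreting the lists.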
EDIT: For the code above, how can I colour the points in the plot according to the sample? I tried changing the colour parameter in ggplot, but it's not working.
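One likely reason this doesn't work: every row passed to `prcomp()` is a gene, so each point in the plot is a gene, and there is no single sample to colour it by. In addition, `geom_text(colour = "tomato")` sets a fixed colour, which overrides any `colour` mapping in `aes()`. If the goal is to see how the samples relate to each other, one option, sketched here under the assumption that samples are the columns of the expression matrix, is to transpose the matrix so that each sample becomes one point, then map `colour = sample` inside `aes()`. The simulated matrix is hypothetical; with real data, `prcomp(..., scale. = TRUE)` stops with an error on genes that have zero variance across samples, so those would need to be removed first.

```r
library(ggplot2)

# Simulated stand-in for exp.csv (hypothetical): 1000 genes x 4 samples
set.seed(2)
expr <- matrix(rnorm(4000), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000),
                               paste0("sample", 1:4)))

# Transpose so that samples are the rows (observations) and genes the variables
sample_pca <- prcomp(t(expr), scale. = TRUE)
sample_scores <- as.data.frame(sample_pca$x)
sample_scores$sample <- rownames(sample_scores)

# Map colour to the sample column inside aes(); no fixed colour in the geoms
p <- ggplot(sample_scores, aes(x = PC1, y = PC2, colour = sample, label = sample)) +
  geom_hline(yintercept = 0, colour = "gray65") +
  geom_vline(xintercept = 0, colour = "gray65") +
  geom_point(size = 3) +
  geom_text(vjust = -1, show.legend = FALSE) +
  ggtitle("PCA of samples")
print(p)
```

The same `aes(colour = ...)` pattern applies to the gene-level plot: add a grouping column (for example a cluster assignment) to the scores data frame, map `colour` to it, and drop the fixed `colour = "tomato"` from `geom_text()`.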