I am computing PCA with
prcomp and I wanted to extract the loadings/rotations for PC1.
My dataset (the dataset has been converted into Z scores):
# A tibble: 5 x 38 gene P4_MU_Sepithelia P16_MU_Sepithel~ <chr> <dbl> <dbl> 1 ADAM10 1.45 0.973 2 ADAM12 -1.43 -1.62 3 ADAM17 -0.0191 -0.0544 4 APH1A 0.866 1.04 5 ATOH1 0.162 -0.545
my script for the PCA:
SE_notch3 <- SE_notch %>% drop_na() #remove rows with NA rnames <- SE_notch3$gene #select character variable (aka gene names) SE_notch3 <- SE_notch3[-c(1)] # remove gene symbol SE_notch.matrix <-(as.matrix(SE_notch3)) #convert into matrix rownames(SE_notch.matrix) <- rnames # assign row names all <- t(SE_notch.matrix) #transpose #pca res.pca2 <- prcomp(all, center = T, scale=F)
then I tried two different approaches to get the loadings (which in prcomp resides into the rotation parameter):
loading.scores <- res.pca2$rotation[,1] #PC1 genes.scores <- abs(loading.scores) #convert in absolute values tes <- sort(genes.scores, decreasing=T) #decreasing orders top <- tes[1:10] #show the first 10 genes top
the 10 top gene responsible to drive my PC1 according to the previous code are:
HES7 HES1 HEY2 HES6 GATA3 0.99819716 0.02058775 0.01975547 0.01965922 0.01623756 ADAM12 HEYL DLL4 NOTCH1 ATOH1 0.01535992 0.01517099 0.01410263 0.01406065 0.01274545
but if I compute the PCA on those 10 genes, instead of seeing the sample to cluster far apart they cluster ALL together so I thought I wort the wrong code and I tried this method:
library(broom) library(tidyr) top_genes <- res.pca2 %>% # extract variable (gene) loadings tidy(matrix = "variables") %>% filter(PC %in% c(1)) %>% # retain only PC1 group_by(PC) %>% arrange(desc(abs(value))) %>% # sort descending value slice(1:10) %>% # take top 10 pull(column) %>% # extract the column (gene name) from the table unique() # retain unique gene names only top_genes
and the top genes are:
 "PSEN1" "JAG2" "CREBBP" "DTX2" "MAML1"  "LNX1" "NCSTN" "ADAM12" "YY1" "ITCH"
What is the right method? Why do I get different output if the PCA is computed in the same way? Where is the error?
I have also tried to compute the PCA with
FactoMineR package that doesn`t show me the loadings directly but it gives me the plot of the contribution when I set which Pc I want:
library("FactoMineR") # for computing PCA library("factoextra") # for graph #PCA res.pca <- PCA(all, graph = FALSE, scale.unit = FALSE) #contribution variables top10<- fviz_contrib(res.pca, choice = "var", axes = 1, top = 10) top10
and the genes with most contributions to PC1 are those that I found with the "first" code :
HES7 HES1 HEY2 HES6 GATA3 ADAM12 HEYL DLL4 NOTCH1 ATOH1
Thank you for your help!