0
0
Entering edit mode
21 days ago
camillab. ▴ 40

Hi,

I want to find the variable contribution (loadings) after I compute my PCA but I noticed that depending on which package I used they change. How come? which one should I use? Is something wrong with my code?

#tidy dataset
test1bis <- na.omit(test1)
test1bis <-  test1bis[,-(9:12)] #leave only FPKMs
rnames <- test1bis$Associated.Gene.Name #generate a matrix with rownames test1bis <- test1bis[,-(1:3)] #remove column name data2 <- as.matrix(test1bis) #generate a matrix with only samples rownames(data2) <- rnames #associate rownames to the matrix data2 <- log(data2 + 1) #log transform test<-(t(data2)) #transpose  then I use FactoMineR : library("FactoMineR") # for computing PCA library("factoextra") library("ggplot2") #compute PCA res.pca <- PCA(test, graph = FALSE, scale.unit = F)  to identify loading, since it's not possible to extract directly from factomineR as mentioned here I did: #loading PC1 PC2 loadings<-sweep(res.pca$var$coord,2,sqrt(res.pca$eig[1:ncol(res.pca$var$coord),1]),FUN="/")


and I obtained:

loading <- as.data.frame(loadings)

Dim.1        Dim.2         Dim.3
TSPAN6   -3.305174e-03 -0.006617528  0.0124388641
DPM1      4.520584e-03 -0.003854473 -0.0082734239
SCYL3     9.579238e-03 -0.004509442 -0.0007637849
C1orf112  8.683544e-03 -0.001069403 -0.0073240322
FGR      -9.173168e-05  0.011989805  0.0032702963
Dim.4
TSPAN6   -0.007109117
DPM1      0.015185883
SCYL3    -0.003513365
C1orf112  0.007451701
FGR      -0.005777467


But if I use prcomp function on the same dataset I get different output for my loadings:

res.pc2a <- prcomp(test, center=T, scale=F)

load <- res.pc2a$rotation head(load,5) PC1 PC2 PC3 TSPAN6 1.029975e-03 0.0025505542 0.0054742962 DPM1 -8.932191e-04 0.0010277374 -0.0022086783 SCYL3 -4.946089e-03 0.0028874002 -0.0005080739 C1orf112 -1.798157e-03 0.0003073623 -0.0020600735 FGR 5.811615e-05 -0.0130233750 0.0035801752 PC4 PC5 TSPAN6 -0.003903774 0.6857941304 DPM1 0.004895724 0.5280178531 SCYL3 -0.002546638 0.4072318978 C1orf112 0.002616394 -0.2555861099 FGR -0.007457418 -0.0009055744  And if use PCAtool : library(tidyr) #does not accept duplicate name test1bis <- na.omit(test1) test1bis <- test1bis[,-(9:12)] #leave only FPKMs rnames <-test1bis$ID
test1bis <- test1bis[,-(1:3)] #remove column name
data2 <- as.matrix(test1bis) #generate a matrix with only samples
rownames(data2) <- rnames #associate rownames to the matrix
data2 <- log(data2 + 1) #log transform
#assign column name for the ID of samples
idx <- colnames(test1bis)

#PCA
library(PCAtools)
p <- pca(data2, scale=F)



(The IDs are the same of gene name shown before ):

                               PC1           PC2
ENSG00000000003  1.037372e-03  0.0025831931
ENSG00000000419 -8.992832e-04  0.0010416784
ENSG00000000457 -4.994887e-03  0.0029404936
ENSG00000000460 -1.814579e-03  0.0003148908
ENSG00000000938  4.281306e-05 -0.0132009120
PC3          PC4
ENSG00000000003  0.0054934570 -0.004052016
ENSG00000000419 -0.0022215836  0.005110726
ENSG00000000457 -0.0005286536 -0.002651556
ENSG00000000460 -0.0020754794  0.002727814
ENSG00000000938  0.0035755992 -0.007823109
PC5
ENSG00000000003  0.7868385062
ENSG00000000419 -0.2839137164
ENSG00000000457 -0.3205780708
ENSG00000000460  0.4334467281
ENSG00000000938 -0.0002801099


Thank you very much!

Camilla

2
Entering edit mode

I checked the parameters of the 3 methods and look like the PCA does not have center=TRUE option. In prcomp you centered it on your command line and in pca from PCAtools this is a default parameter.

That may be a start.

https://www.rdocumentation.org/packages/FactoMineR/versions/2.4/topics/PCA

https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp

https://rdrr.io/bioc/PCAtools/man/pca.html

And I guess the slight differences in digits you have between prcomp and pca loadings is due to the Principal Componant calculation in both packages.

0
Entering edit mode

Thank you, I wrote an email to the developer of FactorMineR long time ago and he said that "There is no way to use the PCA function without centring data" so not sure what could the the problem!

1
Entering edit mode

I did not see it at first, but the Loadings of the first PC are in e-n which make the loading values super small. These genes are not the top loadings of each PCs. And the small differences that one can see might come from the inner calculation of the principal component of each tool.

0
Entering edit mode

I will reorder, also I did tried again without order the loadings and this time (I re-run the same script with FactorMineR but this time I get the same values but opposite sign.

                 Dim.1         Dim.2         Dim.3        Dim.4
TSPAN6   -1.037372e-03 -0.0025831931  0.0054934570 -0.004052016
DPM1      8.992832e-04 -0.0010416784 -0.0022215836  0.005110726
SCYL3     4.994887e-03 -0.0029404936 -0.0005286536 -0.002651556
C1orf112  1.814579e-03 -0.0003148908 -0.0020754794  0.002727814
FGR      -4.281306e-05  0.0132009120  0.0035755992 -0.007823109


Why that?

0
Entering edit mode

That is pretty hard to say without a proper example dataset and command lines. Can you redo your analysis with mtcars or iris please, so everyone can redo it at home and experiment to find some solutions.

0
Entering edit mode

Just for clarity: PCAtools performs PCA via BiocSingular::runPCA() - here is the function call in the code: https://github.com/kevinblighe/PCAtools/blob/master/R/pca.R#L153-L157

I am sure that we checked this previously and found that PCAtools gives the same output as prcomp().

Please do ensure that center and scale are the same

0
Entering edit mode

Apologies, I have missed the comments for some reason. PCAtools gives me the same results as prcomp, I was wondering why factomineR wasn`t matching with the other two packages

0
Entering edit mode

Cool, well, I had hoped that the factomineR developer would have given a better response other than "There is no way to use the PCA function without centring data" It must be doing some other filtering or transformation.