Question

scaling difference between prcomp() and pcatools

2

Entering edit mode

3.5 years ago

le.vla ▴ 20

Hello, I have the following question: I have an RNAseq dataset (DESeq dataset) that I want to do PCA analysis on. This is how I made the dataset:

dds_norm <- estimateSizeFactors(dds)
dds_norm_vst <- vst(dds_norm)
counts_norm_vst <- assay(dds_norm_vst, normalized = TRUE)

I made a PCA bi-plot, screeplot and pairsplot with the package PCAtools, with the following code:

dds_vst_pca_.1 <- pca(counts_norm_vst, metadata = xp_design_pca, removeVar = .1)
    #make scree plot (90% top genes)
scree_all_.1 <- data.frame(percentVar_all_.1)
scree_all_.1[,2] <- c(1:46)
colnames(scree_all_.1) <- c("variance","component_nr")
ggplot(scree_all_.1, mapping=aes(x=component_nr, y=variance)) + 
  geom_bar(stat="identity") +
  ggtitle("screeplot for all timepoints")


#make bi plot (90% top) with PCAtools
biplot(dds_vst_pca_.1,
       colby = "time",
       shape = "treatment",
       pointSize = 5,
       legendPosition = 'right')

#find PC's to keep
elbow <- findElbowPoint(dds_vst_pca_.1$variance)

#make pairs plot (90% top)
pairsplot(dds_vst_pca_.1,
          components = getComponents(dds_vst_pca_.1, c(1:10)),
          colby = "treatment",
          colkey = c("e" = "gray",
                     "m" = "darkgreen",
                     "n" = "lightgreen"))

Then, I tried to check myself and made another biplot with prcomp() according to the following code:

dds_vst_prc <- prcomp(counts_norm_vst, scale = TRUE)
ntop_.1 <- nrow(counts)*.9
ntop_.1 <- as.integer(ntop_.1)
#do prcomp with lower 10% of variable genes removed
select_.1 <- order(rv, decreasing = TRUE)[seq_len(min(ntop_.1, length(rv)))]
mat_.1 <- t( assay(dds_norm_vst)[select_.1, ] )
dds_norm_vst_prc_.1 <-prcomp(mat_.1)
dds_norm_vst_prc_.1_df <- as.data.frame(dds_norm_vst_prc_.1$x)
percentVar_all_.1 <- dds_norm_vst_prc_.1$sdev^2 / sum(dds_norm_vst_prc_.1$sdev^2)
ggplot(dds_norm_vst_prc_.1_df, aes(PC1, PC2, color = time, shape = treatment)) +
  geom_point(size = 4) +
  xlab(paste0("PC1: ",percentVar_all_.1[1]*100,"% variance")) +
  ylab(paste0("PC2: ",percentVar_all_.1[2]*100,"% variance")) + 
  coord_fixed() +
  ggtitle("PCA plot")

Indeed, these two give exactly the same biplot (same percentages of PC's and same position of the samples). However, then I looked into scaling, and I see that although for the prcomp() I put scaling to TRUE, for the pcatools method i did not specify, assuming scaling default would be TRUE. However, I found out the default is FALSE! Indeed, when I put scaling to TRUE, I get a different outcome. How is this possible? What am I overlooking? Thanks very much for any input!

RNA-Seq R • 1.4k views

ADD COMMENT • link updated 3.4 years ago by Biostar 20 • written 3.5 years ago by le.vla ▴ 20

score 1 · Accepted Answer · 2020-10-20

1

Entering edit mode

3.5 years ago

ATpoint 82k

dds_vst_prc <- prcomp(counts_norm_vst, scale = TRUE)

Is counts_norm_vst transposed here? mat_.1 seems to be, is the inconsistency based on missing transposition?

ADD COMMENT • link 3.5 years ago by ATpoint 82k

0

Entering edit mode

No you are right, counts_norm_vst was not transposed... But now I'm even more confused! Because if I transpose it and then run prcomp, it gives me this error message: Error in prcomp.default(counts_norm_vst_t, scale = TRUE) : cannot rescale a constant/zero column to unit variance So now these questions are added: 1. how can I have run a non-transposed object in prcomp? (I didn't get any error messages before) 2. Can this influence on scaling?

ADD REPLY • link 3.5 years ago by le.vla ▴ 20

1

Entering edit mode

The error means that there is a column with all values being the same. Look, your code is a bit...unorganized, why don't you define counts_norm_vst on top, then subset to a common set of genes and then feed this matrix to both methods, turning off any feature selection in PCAtools to have things consistent between methods. Right now it seems as you are feeding incorrect inputs which confounds your results.

ADD REPLY • link 3.5 years ago by ATpoint 82k

0

Entering edit mode

Yes, I will do that. Thanks for your help!

ADD REPLY • link 3.5 years ago by le.vla ▴ 20

0

Entering edit mode

Ah wow..... I see I've been completely blind for mat_.1. That is the prcomp I use and it is not scaled. I don't use dds_vst_prc at all in the downstream code. Problem solved! Still, I would like to know how come I can run a non-transposed item in prcomp with no error, and then when I transpose, it gives the above-mentioned error.

ADD REPLY • link 3.5 years ago by le.vla ▴ 20