scaling difference between prcomp() and pcatools
1
2
Entering edit mode
3.5 years ago
le.vla ▴ 20

Hello, I have the following question: I have an RNAseq dataset (DESeq dataset) that I want to do PCA analysis on. This is how I made the dataset:

dds_norm <- estimateSizeFactors(dds)
dds_norm_vst <- vst(dds_norm)
counts_norm_vst <- assay(dds_norm_vst, normalized = TRUE)

I made a PCA bi-plot, screeplot and pairsplot with the package PCAtools, with the following code:

dds_vst_pca_.1 <- pca(counts_norm_vst, metadata = xp_design_pca, removeVar = .1)
    #make scree plot (90% top genes)
scree_all_.1 <- data.frame(percentVar_all_.1)
scree_all_.1[,2] <- c(1:46)
colnames(scree_all_.1) <- c("variance","component_nr")
ggplot(scree_all_.1, mapping=aes(x=component_nr, y=variance)) + 
  geom_bar(stat="identity") +
  ggtitle("screeplot for all timepoints")


#make bi plot (90% top) with PCAtools
biplot(dds_vst_pca_.1,
       colby = "time",
       shape = "treatment",
       pointSize = 5,
       legendPosition = 'right')

#find PC's to keep
elbow <- findElbowPoint(dds_vst_pca_.1$variance)

#make pairs plot (90% top)
pairsplot(dds_vst_pca_.1,
          components = getComponents(dds_vst_pca_.1, c(1:10)),
          colby = "treatment",
          colkey = c("e" = "gray",
                     "m" = "darkgreen",
                     "n" = "lightgreen"))

Then, I tried to check myself and made another biplot with prcomp() according to the following code:

dds_vst_prc <- prcomp(counts_norm_vst, scale = TRUE)
ntop_.1 <- nrow(counts)*.9
ntop_.1 <- as.integer(ntop_.1)
#do prcomp with lower 10% of variable genes removed
select_.1 <- order(rv, decreasing = TRUE)[seq_len(min(ntop_.1, length(rv)))]
mat_.1 <- t( assay(dds_norm_vst)[select_.1, ] )
dds_norm_vst_prc_.1 <-prcomp(mat_.1)
dds_norm_vst_prc_.1_df <- as.data.frame(dds_norm_vst_prc_.1$x)
percentVar_all_.1 <- dds_norm_vst_prc_.1$sdev^2 / sum(dds_norm_vst_prc_.1$sdev^2)
ggplot(dds_norm_vst_prc_.1_df, aes(PC1, PC2, color = time, shape = treatment)) +
  geom_point(size = 4) +
  xlab(paste0("PC1: ",percentVar_all_.1[1]*100,"% variance")) +
  ylab(paste0("PC2: ",percentVar_all_.1[2]*100,"% variance")) + 
  coord_fixed() +
  ggtitle("PCA plot")

Indeed, these two give exactly the same biplot (same percentages of PC's and same position of the samples). However, then I looked into scaling, and I see that although for the prcomp() I put scaling to TRUE, for the pcatools method i did not specify, assuming scaling default would be TRUE. However, I found out the default is FALSE! Indeed, when I put scaling to TRUE, I get a different outcome. How is this possible? What am I overlooking? Thanks very much for any input!

RNA-Seq R • 1.4k views
ADD COMMENT
1
Entering edit mode
3.5 years ago
ATpoint 82k

dds_vst_prc <- prcomp(counts_norm_vst, scale = TRUE)

Is counts_norm_vst transposed here? mat_.1 seems to be, is the inconsistency based on missing transposition?

ADD COMMENT
0
Entering edit mode

No you are right, counts_norm_vst was not transposed... But now I'm even more confused! Because if I transpose it and then run prcomp, it gives me this error message: Error in prcomp.default(counts_norm_vst_t, scale = TRUE) : cannot rescale a constant/zero column to unit variance So now these questions are added: 1. how can I have run a non-transposed object in prcomp? (I didn't get any error messages before) 2. Can this influence on scaling?

ADD REPLY
1
Entering edit mode

The error means that there is a column with all values being the same. Look, your code is a bit...unorganized, why don't you define counts_norm_vst on top, then subset to a common set of genes and then feed this matrix to both methods, turning off any feature selection in PCAtools to have things consistent between methods. Right now it seems as you are feeding incorrect inputs which confounds your results.

ADD REPLY
0
Entering edit mode

Yes, I will do that. Thanks for your help!

ADD REPLY
0
Entering edit mode

Ah wow..... I see I've been completely blind for mat_.1. That is the prcomp I use and it is not scaled. I don't use dds_vst_prc at all in the downstream code. Problem solved! Still, I would like to know how come I can run a non-transposed item in prcomp with no error, and then when I transpose, it gives the above-mentioned error.

ADD REPLY

Login before adding your answer.

Traffic: 2203 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6