Question: scaling difference between prcomp() and pcatools
1
gravatar for le.vla
3 months ago by
le.vla10
Amsterdam
le.vla10 wrote:

Hello, I have the following question: I have an RNAseq dataset (DESeq dataset) that I want to do PCA analysis on. This is how I made the dataset:

dds_norm <- estimateSizeFactors(dds)
dds_norm_vst <- vst(dds_norm)
counts_norm_vst <- assay(dds_norm_vst, normalized = TRUE)

I made a PCA bi-plot, screeplot and pairsplot with the package PCAtools, with the following code:

dds_vst_pca_.1 <- pca(counts_norm_vst, metadata = xp_design_pca, removeVar = .1)
    #make scree plot (90% top genes)
scree_all_.1 <- data.frame(percentVar_all_.1)
scree_all_.1[,2] <- c(1:46)
colnames(scree_all_.1) <- c("variance","component_nr")
ggplot(scree_all_.1, mapping=aes(x=component_nr, y=variance)) + 
  geom_bar(stat="identity") +
  ggtitle("screeplot for all timepoints")


#make bi plot (90% top) with PCAtools
biplot(dds_vst_pca_.1,
       colby = "time",
       shape = "treatment",
       pointSize = 5,
       legendPosition = 'right')

#find PC's to keep
elbow <- findElbowPoint(dds_vst_pca_.1$variance)

#make pairs plot (90% top)
pairsplot(dds_vst_pca_.1,
          components = getComponents(dds_vst_pca_.1, c(1:10)),
          colby = "treatment",
          colkey = c("e" = "gray",
                     "m" = "darkgreen",
                     "n" = "lightgreen"))

Then, I tried to check myself and made another biplot with prcomp() according to the following code:

dds_vst_prc <- prcomp(counts_norm_vst, scale = TRUE)
ntop_.1 <- nrow(counts)*.9
ntop_.1 <- as.integer(ntop_.1)
#do prcomp with lower 10% of variable genes removed
select_.1 <- order(rv, decreasing = TRUE)[seq_len(min(ntop_.1, length(rv)))]
mat_.1 <- t( assay(dds_norm_vst)[select_.1, ] )
dds_norm_vst_prc_.1 <-prcomp(mat_.1)
dds_norm_vst_prc_.1_df <- as.data.frame(dds_norm_vst_prc_.1$x)
percentVar_all_.1 <- dds_norm_vst_prc_.1$sdev^2 / sum(dds_norm_vst_prc_.1$sdev^2)
ggplot(dds_norm_vst_prc_.1_df, aes(PC1, PC2, color = time, shape = treatment)) +
  geom_point(size = 4) +
  xlab(paste0("PC1: ",percentVar_all_.1[1]*100,"% variance")) +
  ylab(paste0("PC2: ",percentVar_all_.1[2]*100,"% variance")) + 
  coord_fixed() +
  ggtitle("PCA plot")

Indeed, these two give exactly the same biplot (same percentages of PC's and same position of the samples). However, then I looked into scaling, and I see that although for the prcomp() I put scaling to TRUE, for the pcatools method i did not specify, assuming scaling default would be TRUE. However, I found out the default is FALSE! Indeed, when I put scaling to TRUE, I get a different outcome. How is this possible? What am I overlooking? Thanks very much for any input!

rna-seq R • 228 views
ADD COMMENTlink modified 6 weeks ago by Biostar ♦♦ 20 • written 3 months ago by le.vla10
1
gravatar for ATpoint
3 months ago by
ATpoint44k
ATpoint44k wrote:

dds_vst_prc <- prcomp(counts_norm_vst, scale = TRUE)

Is counts_norm_vst transposed here? mat_.1 seems to be, is the inconsistency based on missing transposition?

ADD COMMENTlink written 3 months ago by ATpoint44k

No you are right, counts_norm_vst was not transposed... But now I'm even more confused! Because if I transpose it and then run prcomp, it gives me this error message: Error in prcomp.default(counts_norm_vst_t, scale = TRUE) : cannot rescale a constant/zero column to unit variance So now these questions are added: 1. how can I have run a non-transposed object in prcomp? (I didn't get any error messages before) 2. Can this influence on scaling?

ADD REPLYlink written 3 months ago by le.vla10
1

The error means that there is a column with all values being the same. Look, your code is a bit...unorganized, why don't you define counts_norm_vst on top, then subset to a common set of genes and then feed this matrix to both methods, turning off any feature selection in PCAtools to have things consistent between methods. Right now it seems as you are feeding incorrect inputs which confounds your results.

ADD REPLYlink written 3 months ago by ATpoint44k

Yes, I will do that. Thanks for your help!

ADD REPLYlink written 3 months ago by le.vla10

Ah wow..... I see I've been completely blind for mat_.1. That is the prcomp I use and it is not scaled. I don't use dds_vst_prc at all in the downstream code. Problem solved! Still, I would like to know how come I can run a non-transposed item in prcomp with no error, and then when I transpose, it gives the above-mentioned error.

ADD REPLYlink written 3 months ago by le.vla10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2580 users visited in the last hour
_