Hello, I have the following question: I have an RNAseq dataset (DESeq dataset) that I want to do PCA analysis on. This is how I made the dataset:
dds_norm <- estimateSizeFactors(dds) dds_norm_vst <- vst(dds_norm) counts_norm_vst <- assay(dds_norm_vst, normalized = TRUE)
I made a PCA bi-plot, screeplot and pairsplot with the package PCAtools, with the following code:
dds_vst_pca_.1 <- pca(counts_norm_vst, metadata = xp_design_pca, removeVar = .1) #make scree plot (90% top genes) scree_all_.1 <- data.frame(percentVar_all_.1) scree_all_.1[,2] <- c(1:46) colnames(scree_all_.1) <- c("variance","component_nr") ggplot(scree_all_.1, mapping=aes(x=component_nr, y=variance)) + geom_bar(stat="identity") + ggtitle("screeplot for all timepoints") #make bi plot (90% top) with PCAtools biplot(dds_vst_pca_.1, colby = "time", shape = "treatment", pointSize = 5, legendPosition = 'right') #find PC's to keep elbow <- findElbowPoint(dds_vst_pca_.1$variance) #make pairs plot (90% top) pairsplot(dds_vst_pca_.1, components = getComponents(dds_vst_pca_.1, c(1:10)), colby = "treatment", colkey = c("e" = "gray", "m" = "darkgreen", "n" = "lightgreen"))
Then, I tried to check myself and made another biplot with prcomp() according to the following code:
dds_vst_prc <- prcomp(counts_norm_vst, scale = TRUE) ntop_.1 <- nrow(counts)*.9 ntop_.1 <- as.integer(ntop_.1) #do prcomp with lower 10% of variable genes removed select_.1 <- order(rv, decreasing = TRUE)[seq_len(min(ntop_.1, length(rv)))] mat_.1 <- t( assay(dds_norm_vst)[select_.1, ] ) dds_norm_vst_prc_.1 <-prcomp(mat_.1) dds_norm_vst_prc_.1_df <- as.data.frame(dds_norm_vst_prc_.1$x) percentVar_all_.1 <- dds_norm_vst_prc_.1$sdev^2 / sum(dds_norm_vst_prc_.1$sdev^2) ggplot(dds_norm_vst_prc_.1_df, aes(PC1, PC2, color = time, shape = treatment)) + geom_point(size = 4) + xlab(paste0("PC1: ",percentVar_all_.1*100,"% variance")) + ylab(paste0("PC2: ",percentVar_all_.1*100,"% variance")) + coord_fixed() + ggtitle("PCA plot")
Indeed, these two give exactly the same biplot (same percentages of PC's and same position of the samples). However, then I looked into scaling, and I see that although for the prcomp() I put scaling to TRUE, for the pcatools method i did not specify, assuming scaling default would be TRUE. However, I found out the default is FALSE! Indeed, when I put scaling to TRUE, I get a different outcome. How is this possible? What am I overlooking? Thanks very much for any input!