Why do I get different results for PC1 from plotPCA(DESeq2) and prcomp?
3
2
Entering edit mode
4.3 years ago
maria2019 ▴ 250

I have analyzed RNA-seq data with DESeq2 and am trying to plot a 3D PCA using rgl-plot3d.

I was trying to output PC1, PC2, and PC3 and then plot them. However, I realized that I get different results for PC1 (and PC2) when I try plotPCA (used with DESeq2) and prcomp.

What is the bug on my code?

dds <- DESeqDataSetFromHTSeqCount( sampleTable = sampleTable,
                                       directory = directory,
                                       design= ~group)
rld <- rlog(dds, blind=TRUE)

From DESeq2:

data <- plotPCA(rld, intgroup=c("treatment", "sex"), returnData=TRUE )
data$PC1

[1] -1.9169863 -2.0420236 -1.9979900 -1.8891056 0.9242008 1.0638140

[7] 0.6911183 1.0551864 0.9598643 -1.5947907 -1.5666862 -1.6694684

[13] -1.2523658 -1.0785239 1.3005578 2.2913536 2.5381586 2.4287372

[19] 1.7549495

Using prcomp

mat <- assay(rld)
pca<-prcomp(t(mat))
pca <- as.data.frame(pca$x)
pca$PC1

[1] -1.29133735 -2.96001734 -3.08855648 -3.51855030 -0.68814370 -0.01753268

[7] -2.31119461 -0.10533404 -1.45742308 -1.30239486 -1.36344946 -1.93761580

[13] 6.04484324 4.83113873 0.75050886 -0.14905189 2.70759465 3.43851631

[19] 2.41799979

plotPCA deseq2 prcomp PCA • 5.2k views
ADD COMMENT
2
Entering edit mode

Thanks kevin, ATpoint, and Asaf. I tried the following code based on your comments and get the same result as plotPCA.

ntop <- 500
rv <- rowVars(assay(rld))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
mat <- t( assay(rld)[select, ] )

pca<-prcomp(mat)
pca <- as.data.frame(pca$x)
pca$PC1
ADD REPLY
0
Entering edit mode

try prcomp with scale=TRUE

ADD REPLY
2
Entering edit mode
4.3 years ago

Before calling prcomp() internally (HERE), the DESeq2 function pre-filters your data based on variance. There is a parameter that can be modified to indirectly disable this behaviour..

Kevin

ADD COMMENT
2
Entering edit mode
4.3 years ago
Asaf 10k

You got me intrigued so I looked at plotPCA code:

function (object, intgroup = "condition", ntop = 500, returnData = FALSE) 
{
    rv <- rowVars(assay(object))
    select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, 
        length(rv)))]
    pca <- prcomp(t(assay(object)[select, ]))

So DESeq2 first sort the rows by variance, select the top 500 (by default) and only then call prcomp

ADD COMMENT
2
Entering edit mode
4.3 years ago
ATpoint 82k

plotPCA by default uses the top 500 most variable genes prior to prcomp, and you used all genes. Check the source code of plotPCA for details, e.g. by typing getMethod("plotPCA","DESeqTransform") in R.

Edit: Cool, Kevin Blighe and Asaf gave the same answers basically at the same time, so I guess this increases confidence :)

ADD COMMENT

Login before adding your answer.

Traffic: 2047 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6