Question: Why do I get different results for PC1 from plotPCA(DESeq2) and prcomp?
1
gravatar for maria2019
10 months ago by
maria2019100
maria2019100 wrote:

I have analyzed RNA-seq data with DESeq2 and am trying to plot a 3D PCA using rgl-plot3d.

I was trying to output PC1, PC2, and PC3 and then plot them. However, I realized that I get different results for PC1 (and PC2) when I try plotPCA (used with DESeq2) and prcomp.

What is the bug on my code?

dds <- DESeqDataSetFromHTSeqCount( sampleTable = sampleTable,
                                       directory = directory,
                                       design= ~group)
rld <- rlog(dds, blind=TRUE)

From DESeq2:

data <- plotPCA(rld, intgroup=c("treatment", "sex"), returnData=TRUE )
data$PC1

[1] -1.9169863 -2.0420236 -1.9979900 -1.8891056 0.9242008 1.0638140

[7] 0.6911183 1.0551864 0.9598643 -1.5947907 -1.5666862 -1.6694684

[13] -1.2523658 -1.0785239 1.3005578 2.2913536 2.5381586 2.4287372

[19] 1.7549495

Using prcomp

mat <- assay(rld)
pca<-prcomp(t(mat))
pca <- as.data.frame(pca$x)
pca$PC1

[1] -1.29133735 -2.96001734 -3.08855648 -3.51855030 -0.68814370 -0.01753268

[7] -2.31119461 -0.10533404 -1.45742308 -1.30239486 -1.36344946 -1.93761580

[13] 6.04484324 4.83113873 0.75050886 -0.14905189 2.70759465 3.43851631

[19] 2.41799979

prcomp plotpca pca deseq2 • 1.1k views
ADD COMMENTlink modified 10 months ago • written 10 months ago by maria2019100
1

Thanks kevin, ATpoint, and Asaf. I tried the following code based on your comments and get the same result as plotPCA.

ntop <- 500
rv <- rowVars(assay(rld))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
mat <- t( assay(rld)[select, ] )

pca<-prcomp(mat)
pca <- as.data.frame(pca$x)
pca$PC1
ADD REPLYlink modified 10 months ago • written 10 months ago by maria2019100

try prcomp with scale=TRUE

ADD REPLYlink written 10 months ago by Jeremy Leipzig19k
2
gravatar for Kevin Blighe
10 months ago by
Kevin Blighe67k
Republic of Ireland
Kevin Blighe67k wrote:

Before calling prcomp() internally (HERE), the DESeq2 function pre-filters your data based on variance. There is a parameter that can be modified to indirectly disable this behaviour..

Kevin

ADD COMMENTlink written 10 months ago by Kevin Blighe67k
2
gravatar for Asaf
10 months ago by
Asaf8.5k
Israel
Asaf8.5k wrote:

You got me intrigued so I looked at plotPCA code:

function (object, intgroup = "condition", ntop = 500, returnData = FALSE) 
{
    rv <- rowVars(assay(object))
    select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, 
        length(rv)))]
    pca <- prcomp(t(assay(object)[select, ]))

So DESeq2 first sort the rows by variance, select the top 500 (by default) and only then call prcomp

ADD COMMENTlink modified 10 months ago • written 10 months ago by Asaf8.5k
2
gravatar for ATpoint
10 months ago by
ATpoint42k
Germany
ATpoint42k wrote:

plotPCA by default uses the top 500 most variable genes prior to prcomp, and you used all genes. Check the source code of plotPCA for details, e.g. by typing getMethod("plotPCA","DESeqTransform") in R.

Edit: Cool, Kevin Blighe and Asaf gave the same answers basically at the same time, so I guess this increases confidence :)

ADD COMMENTlink modified 10 months ago • written 10 months ago by ATpoint42k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1161 users visited in the last hour