Question: Why do I get different results for PC1 from plotPCA(DESeq2) and prcomp?
1
gravatar for maria2019
2 days ago by
maria201980
maria201980 wrote:

I have analyzed RNA-seq data with DESeq2 and am trying to plot a 3D PCA using rgl-plot3d.

I was trying to output PC1, PC2, and PC3 and then plot them. However, I realized that I get different results for PC1 (and PC2) when I try plotPCA (used with DESeq2) and prcomp.

What is the bug on my code?

dds <- DESeqDataSetFromHTSeqCount( sampleTable = sampleTable,
                                       directory = directory,
                                       design= ~group)
rld <- rlog(dds, blind=TRUE)

From DESeq2:

data <- plotPCA(rld, intgroup=c("treatment", "sex"), returnData=TRUE )
data$PC1

[1] -1.9169863 -2.0420236 -1.9979900 -1.8891056 0.9242008 1.0638140

[7] 0.6911183 1.0551864 0.9598643 -1.5947907 -1.5666862 -1.6694684

[13] -1.2523658 -1.0785239 1.3005578 2.2913536 2.5381586 2.4287372

[19] 1.7549495

Using prcomp

mat <- assay(rld)
pca<-prcomp(t(mat))
pca <- as.data.frame(pca$x)
pca$PC1

[1] -1.29133735 -2.96001734 -3.08855648 -3.51855030 -0.68814370 -0.01753268

[7] -2.31119461 -0.10533404 -1.45742308 -1.30239486 -1.36344946 -1.93761580

[13] 6.04484324 4.83113873 0.75050886 -0.14905189 2.70759465 3.43851631

[19] 2.41799979

prcomp plotpca pca deseq2 • 122 views
ADD COMMENTlink modified 2 days ago • written 2 days ago by maria201980
1

Thanks kevin, ATpoint, and Asaf. I tried the following code based on your comments and get the same result as plotPCA.

ntop <- 500
rv <- rowVars(assay(rld))
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
mat <- t( assay(rld)[select, ] )

pca<-prcomp(mat)
pca <- as.data.frame(pca$x)
pca$PC1
ADD REPLYlink modified 2 days ago • written 2 days ago by maria201980

try prcomp with scale=TRUE

ADD REPLYlink written 2 days ago by Jeremy Leipzig18k
2
gravatar for Kevin Blighe
2 days ago by
Kevin Blighe53k
Kevin Blighe53k wrote:

Before calling prcomp() internally (HERE), the DESeq2 function pre-filters your data based on variance. There is a parameter that can be modified to indirectly disable this behaviour..

Kevin

ADD COMMENTlink written 2 days ago by Kevin Blighe53k
2
gravatar for Asaf
2 days ago by
Asaf6.6k
Israel
Asaf6.6k wrote:

You got me intrigued so I looked at plotPCA code:

function (object, intgroup = "condition", ntop = 500, returnData = FALSE) 
{
    rv <- rowVars(assay(object))
    select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, 
        length(rv)))]
    pca <- prcomp(t(assay(object)[select, ]))

So DESeq2 first sort the rows by variance, select the top 500 (by default) and only then call prcomp

ADD COMMENTlink modified 2 days ago • written 2 days ago by Asaf6.6k
2
gravatar for ATpoint
2 days ago by
ATpoint28k
Germany
ATpoint28k wrote:

plotPCA by default uses the top 500 most variable genes prior to prcomp, and you used all genes. Check the source code of plotPCA for details, e.g. by typing getMethod("plotPCA","DESeqTransform") in R.

Edit: Cool, Kevin Blighe and Asaf gave the same answers basically at the same time, so I guess this increases confidence :)

ADD COMMENTlink modified 2 days ago • written 2 days ago by ATpoint28k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1540 users visited in the last hour