Question

In DESeq2, using plotPCA how do I go about extracting genes making up PC1 and PC2?

1

Entering edit mode

12 months ago

mgranada3 ▴ 30

I have seen some old posts regarding my question but have found that many have expired links. I am performing a biological experiment and need to know what is the reason behind my clustering of PC1 and PC2. I am fairly new to R, DESeq2, and gene expression analysis. This is a snippet of my code:

library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- DESeq(dds)
vsd <- vst(dds, blind=FALSE)    
plotPCA(vsd, intgroup = "condition", ntop = 500, returnData = FALSE)

How can I have R export the top 500 genes in PC1 and PC2? I tried using this but It is just my gene list vsd normalized.

> pca<- assay(vsd, intgroup = "condition", ntop = 500, returnData = FALSE)
> write.csv(as.data.frame(pca), file="PCA_results.csv")

I have seen other posts saying you can make your own function but I'm confused about how I would even change the code to call for my PC1 and PC2, as well as, how would my data be output? Example Post

pc1 deseq2 plotpca pc2 • 2.4k views

ADD COMMENT • link 12 months ago by mgranada3 ▴ 30

score 5 · Accepted Answer · 2023-04-11

5

Entering edit mode

12 months ago

rpolicastro 13k

plotPCA is just a convenience wrapper for the base R function stats::prcomp. I've hijacked and modified their code below so you can get the loadings for PC1 and 2.

# Run variance stabilizing transformation on the counts.
object <- vst(dds)

# calculate the variance for each gene
rv <- rowVars(assay(object))

# Top n genes by variance to keep.
ntop <- 500

# select the ntop genes by variance
select <- order(rv, decreasing=TRUE)[seq_len(min(ntop, length(rv)))]

# perform a PCA on the data in assay(x) for the selected genes
pca <- prcomp(t(assay(object)[select,]))

# Loadings for the first two PCs.
loadings <- pca$rotation[, seq_len(2)]

ADD COMMENT • link 12 months ago by rpolicastro 13k

0

Entering edit mode

Thank you, this worked perfectly. I now have a gene list with two columns PC1 and PC2. So both PC1 and PC2 are composed of the same gene list. How do I know which one leans more towards one of the other? For example, gene 20 shows a -0.03 value for PC1 and 8.24 for PC2. Does that mean gene 20 weighs more towards PC2 than PC1?

ADD REPLY • link 12 months ago by mgranada3 ▴ 30

0

Entering edit mode

Generally speaking, yes. The higher the loading value for a gene in a dimension the more that gene contributed to the separation of the samples (aka variance explained). It's also important to look at the relative rank of the gene among all genes in that dimension, and to follow up with further analysis if any of the genes become of interest.