Question: DEseq2 Shade PCA plot by gene counts
0
gravatar for paulranum11
5 days ago by
paulranum1140
paulranum1140 wrote:

Hi All,

I am working with DEseq2 to do some RNA-Seq analysis and would like to find a solution for displaying the expression level of individual genes on the PCA plot. This functionality exists built into other packages like Seurat (using "FeaturePlot"). Does anyone know of a way to implement this with DEseq2's plotPCA.

Here is the simple command i am using to generate the PCA plot.

plotPCA(vsd, intgroup=c("Treatment"))

The following is a link to a DEseq2 walkthrough for reference: http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

heatmap deseq2 pca ggplot2 • 104 views
ADD COMMENTlink modified 4 days ago by Kevin Blighe51k • written 5 days ago by paulranum1140
2
gravatar for Kevin Blighe
4 days ago by
Kevin Blighe51k
Kevin Blighe51k wrote:

Hey,

The DESeq2 PCA function will [I believe] convert a continuous variable to categorical, which is likely not what you want. This is fine, as DESeq2 is primarily about normalising RNA-seq data and conducting differential expression.

In any case, you can do this with my own package, PCAtools, which should fit nicely with your DESeq2 variance-stabilised expression values and metadata. Here is a reproducible example, taken from my vignette: PCAtools: everything Principal Component Analysis:

1, load breast cancer data from GEO

library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO('GSE2990', GSEMatrix = TRUE, getGPL = FALSE)
x <- exprs(gset[[1]])

# remove Affymetrix control probes
x <- x[-grep('^AFFX', rownames(x)),]

# extract information of interest from the phenotype data (pdata)
idx <- which(colnames(pData(gset[[1]])) %in%
  c('age:ch1', 'distant rfs:ch1', 'er:ch1',
    'ggi:ch1', 'grade:ch1', 'size:ch1',
    'time rfs:ch1'))

metadata <- data.frame(pData(gset[[1]])[,idx],
  row.names = rownames(pData(gset[[1]])))

# tidy column names
colnames(metadata) <- c('Age', 'Distant.RFS', 'ER', 'GGI', 'Grade',
  'Size', 'Time.RFS')

# prepare certain phenotypes
metadata$Age <- as.numeric(gsub('^KJ', NA, metadata$Age))
metadata$Distant.RFS <- factor(metadata$Distant.RFS, levels=c(0,1))
metadata$ER <- factor(gsub('\\?', NA, metadata$ER), levels=c(0,1))
metadata$ER <- factor(ifelse(metadata$ER == 1, 'ER+', 'ER-'), levels = c('ER-', 'ER+'))
metadata$GGI <- as.numeric(metadata$GGI)
metadata$Grade <- factor(gsub('\\?', NA, metadata$Grade), levels=c(1,2,3))
metadata$Grade <- gsub(1, 'Grade 1', gsub(2, 'Grade 2', gsub(3, 'Grade 3', metadata$Grade)))
metadata$Grade <- factor(metadata$Grade, levels = c('Grade 1', 'Grade 2', 'Grade 3'))
metadata$Size <- as.numeric(metadata$Size)
metadata$Time.RFS <- as.numeric(gsub('^KJX|^KJ', NA, metadata$Time.RFS))

# remove samples from the pdata that have any NA value
discard <- apply(metadata, 1, function(x) anyis.na(x)))
metadata <- metadata[!discard,]

# filter the expression data to match the samples in our pdata
x <- x[,which(colnames(x) %in% rownames(metadata))]

# check that sample names match exactly between pdata and expression data 
all(colnames(x) == rownames(metadata))

2, add ESR1 gene expression to the metadata

metadata$ESR1 <- x["205225_at",]

head(metadata)
         Age Distant.RFS  ER       GGI   Grade Size Time.RFS      ESR1
GSM65752  40           0 ER-  2.480050 Grade 3  1.2     2280  5.893152
GSM65753  46           0 ER+ -0.633592 Grade 1  1.3     2675 11.419733
GSM65755  41           1 ER+  1.043950 Grade 3  3.3      182 10.922990
GSM65757  34           0 ER+  1.059190 Grade 2  1.6     3952  9.629512
GSM65758  46           1 ER+ -1.233060 Grade 2  2.1     1824 11.367321
GSM65760  57           1 ER+  0.679034 Grade 3  2.2      699 10.161146

3, run PCA

library(PCAtools)
p <- pca(x, metadata = metadata, removeVar = 0.1)

Here, your x object would be assay(vsd)

4, generate bi-plot with continuous scale

biplot(p,
    lab = NULL,
    colby = 'ESR1',
    shape = 'ER',
    hline = 0, vline = 0,
    legendPosition = 'right') +

scale_colour_gradient(low = "gold", high = "red2")

aaa

Good validation in its own right, as those with higher ESR1 gene expression are also ER-positive by IHC / histology...

Kevin

ADD COMMENTlink modified 4 days ago • written 4 days ago by Kevin Blighe51k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1232 users visited in the last hour