Question: DESeq2 multiple group comparison
0
gravatar for dazhudou1122
7 months ago by
dazhudou112240
dazhudou112240 wrote:

Dear Biostars community:

I ran into a question when I was doing RNA-seq analysis using DeSEQ2. I have 4 groups, 3 samples per group, and set my groups using the following code:

(condition <- factor(c(rep("ctl", 3), rep("A", 3), rep("B", 3), rep("C", 3))))

Can anyone tell me how the comparison in DeSEQ2 is done? I am not sure what the p value and log2FC mean in the default output. Supposedly it should be outputting ctl vs A in the default output. But the data is different when I have only ctl and A as input (2 groups). Also, is there a way to specify how the program performs the comparison? I read some previous threads but the answers were not very clear. Any help will be highly appreciated!

Best,

Wenhan

Below is the code I used:

countdata <- read.table("B6_B6MHV_B6MHVWY.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
head(countdata)
(condition <- factor(c(rep("ctl", 3), rep("A", 3), rep("B", 3), rep("C", 3))))
library(DESeq2)
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dds <- DESeq(dds)
# Plot
dispersions
png("qc-dispersions.png", 2000, 2000, pointsize=20)
plotDispEsts(dds, main="Dispersion plot")
dev.off()

# Regularized log transformation for clustering/heatmaps, etc
rld <- rlogTransformation(dds)
head(assay(rld))
hist(assay(rld))

# Colors for plots below
## Ugly:
## (mycols <- 1:length(unique(condition)))
## Use RColorBrewer, better
library(RColorBrewer)
(mycols <- brewer.pal(8, "Dark2")[1:length(unique(condition))])

# Sample distance heatmap
sampleDists <- as.matrix(dist(t(assay(rld))))
library(gplots)
png("qc-heatmap-samples.png", w=1500, h=2500, pointsize=1500)
heatmap.2(as.matrix(sampleDists), key=F, trace="none",
          col=colorpanel(100, "black", "white"),
          ColSideColors=mycols[condition], RowSideColors=mycols[condition],
          margin=c(10, 10), main="Sample Distance Matrix")
dev.off()

# Principal components analysis
## Could do with built-in DESeq2 function:
## DESeq2::plotPCA(rld, intgroup="condition")
## I like mine better:
rld_pca <- function (rld, intgroup = "condition", ntop = 500, colors=NULL, legendpos="bottomleft", main="PCA Biplot", textcx=1, ...) {
  require(genefilter)
  require(calibrate)
  require(RColorBrewer)
  rv = rowVars(assay(rld))
  select = order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  pca = prcomp(t(assay(rld)[select, ]))
  fac = factor(apply(as.data.frame(colData(rld)[, intgroup, drop = FALSE]), 1, paste, collapse = " : "))
  if (is.null(colors)) {
    if (nlevels(fac) >= 3) {
      colors = brewer.pal(nlevels(fac), "Paired")
    }   else {
      colors = c("black", "red")
    }
  }
  pc1var <- round(summary(pca)$importance[2,1]*100, digits=1)
  pc2var <- round(summary(pca)$importance[2,2]*100, digits=1)
  pc1lab <- paste0("PC1 (",as.character(pc1var),"%)")
  pc2lab <- paste0("PC1 (",as.character(pc2var),"%)")
  plot(PC2~PC1, data=as.data.frame(pca$x), bg=colors[fac], pch=21, xlab=pc1lab, ylab=pc2lab, main=main, ...)
  with(as.data.frame(pca$x), textxy(PC1, PC2, labs=rownames(as.data.frame(pca$x)), cex=textcx))
  legend(legendpos, legend=levels(fac), col=colors, pch=20)
  #     rldyplot(PC2 ~ PC1, groups = fac, data = as.data.frame(pca$rld),
  #            pch = 16, cerld = 2, aspect = "iso", col = colours, main = draw.key(key = list(rect = list(col = colours),
  #                                                                                         terldt = list(levels(fac)), rep = FALSE)))
}
png("qc-pca.png", 1500, 1500, pointsize=25)
rld_pca(rld, colors=mycols, intgroup="condition", xlim=c(-75, 35))
dev.off()

# Get differential expression results
res <- results(dds)
table(res$padj<0.05)
## Order by adjusted p-value
res <- res[order(res$padj), ]
## Merge with normalized count data
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
names(resdata)[1] <- "Gene"
head(resdata)
## Write results
write.csv(resdata, file="diffexpr-results.csv")
rna-seq deseq2 • 1.6k views
ADD COMMENTlink modified 7 months ago by Kevin Blighe46k • written 7 months ago by dazhudou112240
6
gravatar for Kevin Blighe
7 months ago by
Kevin Blighe46k
Kevin Blighe46k wrote:

Your problem may be in setting the reference level for your condition factor.

Here is a reproducible example for anyone to follow:

# create random data of 12 samples and 4800 genes
x <- round(matrix(rexp(480 * 10, rate=.1), ncol=12), 0)
rownames(x) <- paste("gene", 1:nrow(x))
colnames(x) <- paste("sample", 1:ncol(x))

# fake metadata
coldata <- data.frame(condition = factor(c(rep("ctl", 3), rep("A", 3), rep("B", 3), rep("C", 3))))

# set `ctl` as reference level
coldata$condition <- relevel(coldata$condition, ref = "ctl")

# run DESeq2
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = x, colData = coldata, design= ~ condition)
dds <- DESeq(dds, betaPrior = FALSE)

resultsNames(dds)
[1] "Intercept"          "condition_A_vs_ctl" "condition_B_vs_ctl"
[4] "condition_C_vs_ctl"

# generate results table for A vs ctl
res <- results(dds, name="condition_A_vs_ctl")

# same as above but with lfc shrinkage
res <- lfcShrink(dds, coef="condition_A_vs_ctl", type="apeglm")
res
log2 fold change (MAP): condition A vs ctl 
Wald test p-value: condition A vs ctl 
DataFrame with 400 rows and 5 columns
                 baseMean      log2FoldChange             lfcSE
                <numeric>           <numeric>         <numeric>
gene 1   6.52931993577912   0.072476893785378 0.357142842617753
gene 2   5.76235176928007 -0.0933872559160501 0.363373514145294
gene 3   6.50856065196759 -0.0204496319490765 0.345984423938138
gene 4   8.22529367562065  -0.214564268854677 0.443132280960639
gene 5   5.79513765732215  -0.124262442880464 0.378075429134928
...                   ...                 ...               ...
gene 396 6.31562199001634   0.216432088296947 0.433285297905755
gene 397 7.06387542770804 -0.0514375974511503 0.347860712050382
gene 398 11.7113061577201 0.00284345055606203 0.345680871683025
gene 399 6.49584500337424 -0.0476168389832278 0.345546422494701
gene 400  9.9964224972449  -0.163803486191863 0.398777125778805
                     pvalue              padj
                  <numeric>         <numeric>
gene 1    0.483733657727805 0.868376605216856
gene 2     0.37470748747808 0.833726100572576
gene 3    0.849966094025773 0.980516244268461
gene 4   0.0516046628774167 0.562928133989655
gene 5    0.228734151969893 0.786123281264348
...                     ...               ...
gene 396 0.0892978383641832 0.562928133989655
gene 397  0.649120107927275 0.943542041143732
gene 398  0.983171679048765 0.997357437589209
gene 399  0.684227860534656 0.953646890096339
gene 400  0.139850562852436 0.647215395526392

You can also compare other levels, e.g., A vs C

results(dds, c("condition", "A", "C"))

See the DESeq2 vignette for all other possible functionality.

Kevin

ADD COMMENTlink modified 7 months ago • written 7 months ago by Kevin Blighe46k

Hello Kevin, I am using your example it is working well. I have few question from my data: 1. How can I remove batch effect from my data with DESEQ2. 2. I know the VST function available in DESEQ2 for remove batch effect, but after this step how can we do DE analysis from the adjusted data. Hope you can understand my problem.

ADD REPLYlink written 4 months ago by anshulmbi0

Hey, there is a function called removeBatchEffects (limma), which may be of use to you. There is a practical example given here with DESeq2: https://support.bioconductor.org/p/76099/#93176

It is neither VST nor rlog that are removing batch, technically speaking.

ADD REPLYlink written 4 months ago by Kevin Blighe46k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1541 users visited in the last hour