What result from DESeq2 should be used as an input for volcano plot?
1
1
Entering edit mode
10 months ago
helen ▴ 40

My cutoff for differentially expressed genes (DEGs) is adjusted P < 0.05 and |log2FoldChange| > 1. I used this code in DESeq2 to get those DEGs:

library(DESeq2)

coldata <- read.csv('coldata_Sst.csv', row.names = 1)

dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~ condition + gender)

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds$condition <- relevel(dds$condition, ref = "Sst_Input")

res <- results(dds, alpha = 0.05, lfcThreshold = 1, altHypothesis = "greaterAbs",
contrast = c("condition","Sst_IP","Sst_Input"), name = "condition_Sst_IP_vs_Sst_Input")

summary(res)


The summary of res is

out of 18461 with nonzero total read count
LFC > 1.00 (up)    : 131, 0.71%
LFC < -1.00 (down) : 907, 4.9%
outliers [1]       : 7, 0.038%
low counts [2]     : 1432, 7.8%
(mean count < 2)


Then I used the package EnhancedVolcano to plot a volcano plot from res using the cutoff adjusted P < 0.05 and |log2FoldChange| > 1 :

library(EnhancedVolcano)

FC <- 1
q <- 0.05

keyvals <- rep('grey75', nrow(res))
names(keyvals) <- rep('NS', nrow(res))

keyvals[which(abs(res$log2FoldChange) > FC & res$padj > q)] <- 'grey50'
names(keyvals)[which(abs(res$log2FoldChange) > FC & res$padj > q)] <- 'log2FoldChange'

keyvals[which(abs(res$log2FoldChange) < FC & res$padj < q)] <- 'grey25'
names(keyvals)[which(abs(res$log2FoldChange) < FC & res$padj < q)] <- '-Log10P'

keyvals[which(res$log2FoldChange < -FC & res$padj < q)] <- 'blue'
names(keyvals)[which(res$log2FoldChange < -FC & res$padj < q)] <- 'Signif. down-regulated'

keyvals[which(res$log2FoldChange > FC & res$padj < q)] <- 'red'
names(keyvals)[which(res$log2FoldChange > FC & res$padj < q)] <- 'Signif. up-regulated'

EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
xlab = bquote(~Log[2]~ 'fold change'),
title = 'res',
pCutoff = 0.05,
FCcutoff = 1,
colCustom = keyvals,
colAlpha = 0.5,
drawConnectors = FALSE,
widthConnectors = 0.5,
colConnectors = 'grey50',
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'partial',
borderWidth = 1.5,
borderColour = 'black',
xlim = c(-15,15),
ylim = c(0,5))


However, I find the shape of volcano plot not very typical

If I extracted the result using default in DESeq2 (which is adjusted p-value < 0.1 and |log2FoldChange| > 0)

res1 <- results(dds, contrast = c("condition","Sst_IP","Sst_Input"), name = "condition_Sst_IP_vs_Sst_Input")


the summary of res1 would be

out of 18461 with nonzero total read count
LFC > 0 (up)       : 4817, 26%
LFC < 0 (down)     : 5632, 31%
outliers [1]       : 7, 0.038%
low counts [2]     : 358, 1.9%
(mean count < 1)


Then I again used the cutoff adjusted P < 0.05 and |log2FoldChange| > 1 for volcano plot

FC <- 1
q <- 0.05

keyvals <- rep('grey75', nrow(res1))
names(keyvals) <- rep('NS', nrow(res1))

keyvals[which(abs(res1$log2FoldChange) > FC & res1$padj > q)] <- 'grey50'
names(keyvals)[which(abs(res1$log2FoldChange) > FC & res1$padj > q)] <- 'log2FoldChange'

keyvals[which(abs(res1$log2FoldChange) < FC & res1$padj < q)] <- 'grey25'
names(keyvals)[which(abs(res1$log2FoldChange) < FC & res1$padj < q)] <- '-Log10P'

keyvals[which(res1$log2FoldChange < -FC & res1$padj < q)] <- 'blue'
names(keyvals)[which(res1$log2FoldChange < -FC & res1$padj < q)] <- 'Signif. down-regulated'

keyvals[which(res1$log2FoldChange > FC & res1$padj < q)] <- 'red'
names(keyvals)[which(res1$log2FoldChange > FC & res1$padj < q)] <- 'Signif. up-regulated'

EnhancedVolcano(res1,
lab = rownames(res1),
...
xlim = c(-15,15),
ylim = c(0,10))


The volcano plot looks more typical

However, it shows a lot more DEGs in res1 than in res. Is it because in res, I filtered the DEGs twice in DESeq2 and in the volcano plot?

I was wondering which method is correct. What result from DESeq2 should I use to generate a volcano plot that shows all data points and labels the correct number of DEGs in colors? Any comments would be appreciated.

DESeq2 volcano plot EnhancedVolcano RNA-Seq • 658 views
2
Entering edit mode
9 months ago
MaxF ▴ 80

You're calling the results function in two different ways.

By default, DESeq is answering the question "Which genes are differentially expressed?" The p values generated in this case are only asking if there is a difference. If there's a gene with 100 counts in one condition and 110 counts in another condition, but this difference is super consistent across many replicates, you'll get a very significant p value. Such a gene may not be that interesting (because the L2FC is quite low), but it will be significant.

In your first example, you're testing a different hypothesis by asking "Which genes are differentially expressed with at least a L2FC of [some number]". Here the p values are corresponding to that specific question, so all of the genes that were in the middle of your plot (ie: the ones that we can be pretty sure have a L2FC less than your cutoff) have their pValues set to 1 and are dropped onto the X axis.

I think the best way to do it is by doing the default test for differential expression and then using the cutoffs for color-coding. Alternative hypothesis testing for |L2FC| is great for narrowing down which genes you'd like to focus on for validation/follow-up since it is more stringent.