Question

What result from DESeq2 should be used as an input for volcano plot?

1

Entering edit mode

3.9 years ago

helen ▴ 70

My cutoff for differentially expressed genes (DEGs) is adjusted P < 0.05 and |log2FoldChange| > 1. I used this code in DESeq2 to get those DEGs:

library(DESeq2)

cts <- as.matrix(read.csv("GE_counts_Sst.csv",row.names="gene"))
coldata <- read.csv('coldata_Sst.csv', row.names = 1)

dds <- DESeqDataSetFromMatrix(countData = cts,
                          colData = coldata,
                          design = ~ condition + gender)

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds$condition <- relevel(dds$condition, ref = "Sst_Input")

dds <- DESeq(adds)

res <- results(dds, alpha = 0.05, lfcThreshold = 1, altHypothesis = "greaterAbs", 
contrast = c("condition","Sst_IP","Sst_Input"), name = "condition_Sst_IP_vs_Sst_Input")

summary(res)

The summary of res is

out of 18461 with nonzero total read count
adjusted p-value < 0.05
LFC > 1.00 (up)    : 131, 0.71%
LFC < -1.00 (down) : 907, 4.9%
outliers [1]       : 7, 0.038%
low counts [2]     : 1432, 7.8%
(mean count < 2)

Then I used the package EnhancedVolcano to plot a volcano plot from res using the cutoff adjusted P < 0.05 and |log2FoldChange| > 1 :

library(EnhancedVolcano)

FC <- 1
q <- 0.05

keyvals <- rep('grey75', nrow(res))
names(keyvals) <- rep('NS', nrow(res))

keyvals[which(abs(res$log2FoldChange) > FC & res$padj > q)] <- 'grey50'
names(keyvals)[which(abs(res$log2FoldChange) > FC & res$padj > q)] <- 'log2FoldChange'

keyvals[which(abs(res$log2FoldChange) < FC & res$padj < q)] <- 'grey25'
names(keyvals)[which(abs(res$log2FoldChange) < FC & res$padj < q)] <- '-Log10P'

keyvals[which(res$log2FoldChange < -FC & res$padj < q)] <- 'blue'
names(keyvals)[which(res$log2FoldChange < -FC & res$padj < q)] <- 'Signif. down-regulated'

keyvals[which(res$log2FoldChange > FC & res$padj < q)] <- 'red'
names(keyvals)[which(res$log2FoldChange > FC & res$padj < q)] <- 'Signif. up-regulated'


EnhancedVolcano(res,
                lab = rownames(res),
                x = 'log2FoldChange',
                y = 'padj',
                xlab = bquote(~Log[2]~ 'fold change'),
                ylab = bquote(~-Log[10] ~ adjusted~italic(P)),
                title = 'res',
                pCutoff = 0.05,
                FCcutoff = 1,
                colCustom = keyvals,
                colAlpha = 0.5,
                drawConnectors = FALSE,
                widthConnectors = 0.5,
                colConnectors = 'grey50',
                gridlines.major = TRUE,
                gridlines.minor = FALSE,
                border = 'partial',
                borderWidth = 1.5,
                borderColour = 'black',
                xlim = c(-15,15),
                ylim = c(0,5))

However, I find the shape of volcano plot not very typical
res

if you can't see the image, click here

If I extracted the result using default in DESeq2 (which is adjusted p-value < 0.1 and |log2FoldChange| > 0)

res1 <- results(dds, contrast = c("condition","Sst_IP","Sst_Input"), name = "condition_Sst_IP_vs_Sst_Input")

the summary of res1 would be

out of 18461 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 4817, 26%
LFC < 0 (down)     : 5632, 31%
outliers [1]       : 7, 0.038%
low counts [2]     : 358, 1.9%
(mean count < 1)

Then I again used the cutoff adjusted P < 0.05 and |log2FoldChange| > 1 for volcano plot

FC <- 1
q <- 0.05

keyvals <- rep('grey75', nrow(res1))
names(keyvals) <- rep('NS', nrow(res1))

keyvals[which(abs(res1$log2FoldChange) > FC & res1$padj > q)] <- 'grey50'
names(keyvals)[which(abs(res1$log2FoldChange) > FC & res1$padj > q)] <- 'log2FoldChange'

keyvals[which(abs(res1$log2FoldChange) < FC & res1$padj < q)] <- 'grey25'
names(keyvals)[which(abs(res1$log2FoldChange) < FC & res1$padj < q)] <- '-Log10P'

keyvals[which(res1$log2FoldChange < -FC & res1$padj < q)] <- 'blue'
names(keyvals)[which(res1$log2FoldChange < -FC & res1$padj < q)] <- 'Signif. down-regulated'

keyvals[which(res1$log2FoldChange > FC & res1$padj < q)] <- 'red'
names(keyvals)[which(res1$log2FoldChange > FC & res1$padj < q)] <- 'Signif. up-regulated'


EnhancedVolcano(res1,
                lab = rownames(res1),
                ...
                xlim = c(-15,15),
                ylim = c(0,10))

The volcano plot looks more typical

res1 if you can't see the image, click here

However, it shows a lot more DEGs in res1 than in res. Is it because in res, I filtered the DEGs twice in DESeq2 and in the volcano plot?

I was wondering which method is correct. What result from DESeq2 should I use to generate a volcano plot that shows all data points and labels the correct number of DEGs in colors? Any comments would be appreciated.

DESeq2 volcano plot EnhancedVolcano RNA-Seq • 2.5k views

ADD COMMENT • link updated 3.8 years ago by MaxF ▴ 120 • written 3.9 years ago by helen ▴ 70

score 3 · Answer 1 · 2020-07-09

You're calling the results function in two different ways.

By default, DESeq is answering the question "Which genes are differentially expressed?" The p values generated in this case are only asking if there is a difference. If there's a gene with 100 counts in one condition and 110 counts in another condition, but this difference is super consistent across many replicates, you'll get a very significant p value. Such a gene may not be that interesting (because the L2FC is quite low), but it will be significant.

In your first example, you're testing a different hypothesis by asking "Which genes are differentially expressed with at least a L2FC of [some number]". Here the p values are corresponding to that specific question, so all of the genes that were in the middle of your plot (ie: the ones that we can be pretty sure have a L2FC less than your cutoff) have their pValues set to 1 and are dropped onto the X axis.

I think the best way to do it is by doing the default test for differential expression and then using the cutoffs for color-coding. Alternative hypothesis testing for |L2FC| is great for narrowing down which genes you'd like to focus on for validation/follow-up since it is more stringent.