What result from DESeq2 should be used as an input for volcano plot?
1
1
Entering edit mode
3.9 years ago
helen ▴ 70

My cutoff for differentially expressed genes (DEGs) is adjusted P < 0.05 and |log2FoldChange| > 1. I used this code in DESeq2 to get those DEGs:

library(DESeq2)

cts <- as.matrix(read.csv("GE_counts_Sst.csv",row.names="gene"))
coldata <- read.csv('coldata_Sst.csv', row.names = 1)

dds <- DESeqDataSetFromMatrix(countData = cts,
                          colData = coldata,
                          design = ~ condition + gender)

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds$condition <- relevel(dds$condition, ref = "Sst_Input")

dds <- DESeq(adds)

res <- results(dds, alpha = 0.05, lfcThreshold = 1, altHypothesis = "greaterAbs", 
contrast = c("condition","Sst_IP","Sst_Input"), name = "condition_Sst_IP_vs_Sst_Input")

summary(res)

The summary of res is

out of 18461 with nonzero total read count
adjusted p-value < 0.05
LFC > 1.00 (up)    : 131, 0.71%
LFC < -1.00 (down) : 907, 4.9%
outliers [1]       : 7, 0.038%
low counts [2]     : 1432, 7.8%
(mean count < 2)

Then I used the package EnhancedVolcano to plot a volcano plot from res using the cutoff adjusted P < 0.05 and |log2FoldChange| > 1 :

library(EnhancedVolcano)

FC <- 1
q <- 0.05

keyvals <- rep('grey75', nrow(res))
names(keyvals) <- rep('NS', nrow(res))

keyvals[which(abs(res$log2FoldChange) > FC & res$padj > q)] <- 'grey50'
names(keyvals)[which(abs(res$log2FoldChange) > FC & res$padj > q)] <- 'log2FoldChange'

keyvals[which(abs(res$log2FoldChange) < FC & res$padj < q)] <- 'grey25'
names(keyvals)[which(abs(res$log2FoldChange) < FC & res$padj < q)] <- '-Log10P'

keyvals[which(res$log2FoldChange < -FC & res$padj < q)] <- 'blue'
names(keyvals)[which(res$log2FoldChange < -FC & res$padj < q)] <- 'Signif. down-regulated'

keyvals[which(res$log2FoldChange > FC & res$padj < q)] <- 'red'
names(keyvals)[which(res$log2FoldChange > FC & res$padj < q)] <- 'Signif. up-regulated'


EnhancedVolcano(res,
                lab = rownames(res),
                x = 'log2FoldChange',
                y = 'padj',
                xlab = bquote(~Log[2]~ 'fold change'),
                ylab = bquote(~-Log[10] ~ adjusted~italic(P)),
                title = 'res',
                pCutoff = 0.05,
                FCcutoff = 1,
                colCustom = keyvals,
                colAlpha = 0.5,
                drawConnectors = FALSE,
                widthConnectors = 0.5,
                colConnectors = 'grey50',
                gridlines.major = TRUE,
                gridlines.minor = FALSE,
                border = 'partial',
                borderWidth = 1.5,
                borderColour = 'black',
                xlim = c(-15,15),
                ylim = c(0,5))

However, I find the shape of volcano plot not very typical
res

if you can't see the image, click here


If I extracted the result using default in DESeq2 (which is adjusted p-value < 0.1 and |log2FoldChange| > 0)

res1 <- results(dds, contrast = c("condition","Sst_IP","Sst_Input"), name = "condition_Sst_IP_vs_Sst_Input")

the summary of res1 would be

out of 18461 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 4817, 26%
LFC < 0 (down)     : 5632, 31%
outliers [1]       : 7, 0.038%
low counts [2]     : 358, 1.9%
(mean count < 1)

Then I again used the cutoff adjusted P < 0.05 and |log2FoldChange| > 1 for volcano plot

FC <- 1
q <- 0.05

keyvals <- rep('grey75', nrow(res1))
names(keyvals) <- rep('NS', nrow(res1))

keyvals[which(abs(res1$log2FoldChange) > FC & res1$padj > q)] <- 'grey50'
names(keyvals)[which(abs(res1$log2FoldChange) > FC & res1$padj > q)] <- 'log2FoldChange'

keyvals[which(abs(res1$log2FoldChange) < FC & res1$padj < q)] <- 'grey25'
names(keyvals)[which(abs(res1$log2FoldChange) < FC & res1$padj < q)] <- '-Log10P'

keyvals[which(res1$log2FoldChange < -FC & res1$padj < q)] <- 'blue'
names(keyvals)[which(res1$log2FoldChange < -FC & res1$padj < q)] <- 'Signif. down-regulated'

keyvals[which(res1$log2FoldChange > FC & res1$padj < q)] <- 'red'
names(keyvals)[which(res1$log2FoldChange > FC & res1$padj < q)] <- 'Signif. up-regulated'


EnhancedVolcano(res1,
                lab = rownames(res1),
                ...
                xlim = c(-15,15),
                ylim = c(0,10))

The volcano plot looks more typical

res1 if you can't see the image, click here

However, it shows a lot more DEGs in res1 than in res. Is it because in res, I filtered the DEGs twice in DESeq2 and in the volcano plot?

I was wondering which method is correct. What result from DESeq2 should I use to generate a volcano plot that shows all data points and labels the correct number of DEGs in colors? Any comments would be appreciated.

DESeq2 volcano plot EnhancedVolcano RNA-Seq • 2.5k views
ADD COMMENT
3
Entering edit mode
3.8 years ago
MaxF ▴ 120

You're calling the results function in two different ways.

By default, DESeq is answering the question "Which genes are differentially expressed?" The p values generated in this case are only asking if there is a difference. If there's a gene with 100 counts in one condition and 110 counts in another condition, but this difference is super consistent across many replicates, you'll get a very significant p value. Such a gene may not be that interesting (because the L2FC is quite low), but it will be significant.

In your first example, you're testing a different hypothesis by asking "Which genes are differentially expressed with at least a L2FC of [some number]". Here the p values are corresponding to that specific question, so all of the genes that were in the middle of your plot (ie: the ones that we can be pretty sure have a L2FC less than your cutoff) have their pValues set to 1 and are dropped onto the X axis.

I think the best way to do it is by doing the default test for differential expression and then using the cutoffs for color-coding. Alternative hypothesis testing for |L2FC| is great for narrowing down which genes you'd like to focus on for validation/follow-up since it is more stringent.

ADD COMMENT

Login before adding your answer.

Traffic: 1611 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6