Question: What result from DESeq2 should be used as an input for volcano plot?
1
gravatar for helen
7 months ago by
helen40
helen40 wrote:

My cutoff for differentially expressed genes (DEGs) is adjusted P < 0.05 and |log2FoldChange| > 1. I used this code in DESeq2 to get those DEGs:

library(DESeq2)

cts <- as.matrix(read.csv("GE_counts_Sst.csv",row.names="gene"))
coldata <- read.csv('coldata_Sst.csv', row.names = 1)

dds <- DESeqDataSetFromMatrix(countData = cts,
                          colData = coldata,
                          design = ~ condition + gender)

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds$condition <- relevel(dds$condition, ref = "Sst_Input")

dds <- DESeq(adds)

res <- results(dds, alpha = 0.05, lfcThreshold = 1, altHypothesis = "greaterAbs", 
contrast = c("condition","Sst_IP","Sst_Input"), name = "condition_Sst_IP_vs_Sst_Input")

summary(res)

The summary of res is

out of 18461 with nonzero total read count
adjusted p-value < 0.05
LFC > 1.00 (up)    : 131, 0.71%
LFC < -1.00 (down) : 907, 4.9%
outliers [1]       : 7, 0.038%
low counts [2]     : 1432, 7.8%
(mean count < 2)

Then I used the package EnhancedVolcano to plot a volcano plot from res using the cutoff adjusted P < 0.05 and |log2FoldChange| > 1 :

library(EnhancedVolcano)

FC <- 1
q <- 0.05

keyvals <- rep('grey75', nrow(res))
names(keyvals) <- rep('NS', nrow(res))

keyvals[which(abs(res$log2FoldChange) > FC & res$padj > q)] <- 'grey50'
names(keyvals)[which(abs(res$log2FoldChange) > FC & res$padj > q)] <- 'log2FoldChange'

keyvals[which(abs(res$log2FoldChange) < FC & res$padj < q)] <- 'grey25'
names(keyvals)[which(abs(res$log2FoldChange) < FC & res$padj < q)] <- '-Log10P'

keyvals[which(res$log2FoldChange < -FC & res$padj < q)] <- 'blue'
names(keyvals)[which(res$log2FoldChange < -FC & res$padj < q)] <- 'Signif. down-regulated'

keyvals[which(res$log2FoldChange > FC & res$padj < q)] <- 'red'
names(keyvals)[which(res$log2FoldChange > FC & res$padj < q)] <- 'Signif. up-regulated'


EnhancedVolcano(res,
                lab = rownames(res),
                x = 'log2FoldChange',
                y = 'padj',
                xlab = bquote(~Log[2]~ 'fold change'),
                ylab = bquote(~-Log[10] ~ adjusted~italic(P)),
                title = 'res',
                pCutoff = 0.05,
                FCcutoff = 1,
                colCustom = keyvals,
                colAlpha = 0.5,
                drawConnectors = FALSE,
                widthConnectors = 0.5,
                colConnectors = 'grey50',
                gridlines.major = TRUE,
                gridlines.minor = FALSE,
                border = 'partial',
                borderWidth = 1.5,
                borderColour = 'black',
                xlim = c(-15,15),
                ylim = c(0,5))

However, I find the shape of volcano plot not very typical
res

if you can't see the image, click here


If I extracted the result using default in DESeq2 (which is adjusted p-value < 0.1 and |log2FoldChange| > 0)

res1 <- results(dds, contrast = c("condition","Sst_IP","Sst_Input"), name = "condition_Sst_IP_vs_Sst_Input")

the summary of res1 would be

out of 18461 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 4817, 26%
LFC < 0 (down)     : 5632, 31%
outliers [1]       : 7, 0.038%
low counts [2]     : 358, 1.9%
(mean count < 1)

Then I again used the cutoff adjusted P < 0.05 and |log2FoldChange| > 1 for volcano plot

FC <- 1
q <- 0.05

keyvals <- rep('grey75', nrow(res1))
names(keyvals) <- rep('NS', nrow(res1))

keyvals[which(abs(res1$log2FoldChange) > FC & res1$padj > q)] <- 'grey50'
names(keyvals)[which(abs(res1$log2FoldChange) > FC & res1$padj > q)] <- 'log2FoldChange'

keyvals[which(abs(res1$log2FoldChange) < FC & res1$padj < q)] <- 'grey25'
names(keyvals)[which(abs(res1$log2FoldChange) < FC & res1$padj < q)] <- '-Log10P'

keyvals[which(res1$log2FoldChange < -FC & res1$padj < q)] <- 'blue'
names(keyvals)[which(res1$log2FoldChange < -FC & res1$padj < q)] <- 'Signif. down-regulated'

keyvals[which(res1$log2FoldChange > FC & res1$padj < q)] <- 'red'
names(keyvals)[which(res1$log2FoldChange > FC & res1$padj < q)] <- 'Signif. up-regulated'


EnhancedVolcano(res1,
                lab = rownames(res1),
                ...
                xlim = c(-15,15),
                ylim = c(0,10))

The volcano plot looks more typical

res1 if you can't see the image, click here

However, it shows a lot more DEGs in res1 than in res. Is it because in res, I filtered the DEGs twice in DESeq2 and in the volcano plot?

I was wondering which method is correct. What result from DESeq2 should I use to generate a volcano plot that shows all data points and labels the correct number of DEGs in colors? Any comments would be appreciated.

ADD COMMENTlink modified 6 months ago by MaxF80 • written 7 months ago by helen40
2
gravatar for MaxF
6 months ago by
MaxF80
MaxF80 wrote:

You're calling the results function in two different ways.

By default, DESeq is answering the question "Which genes are differentially expressed?" The p values generated in this case are only asking if there is a difference. If there's a gene with 100 counts in one condition and 110 counts in another condition, but this difference is super consistent across many replicates, you'll get a very significant p value. Such a gene may not be that interesting (because the L2FC is quite low), but it will be significant.

In your first example, you're testing a different hypothesis by asking "Which genes are differentially expressed with at least a L2FC of [some number]". Here the p values are corresponding to that specific question, so all of the genes that were in the middle of your plot (ie: the ones that we can be pretty sure have a L2FC less than your cutoff) have their pValues set to 1 and are dropped onto the X axis.

I think the best way to do it is by doing the default test for differential expression and then using the cutoffs for color-coding. Alternative hypothesis testing for |L2FC| is great for narrowing down which genes you'd like to focus on for validation/follow-up since it is more stringent.

ADD COMMENTlink written 6 months ago by MaxF80
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2211 users visited in the last hour
_