How to select differential expressed genes using DeSeq2?
1
2
Entering edit mode
12 months ago
Aynur ▴ 40

Hello, I am new to bioinformatics, and I am trying to do DEG analysis using DESeq2. I followed the DESeq2 vignette, and I did this

res <- results(dds, lfcThreshold = log2(2), alpha = 0.05)

summary(res)

out of 17813 with nonzero total read count.
adjusted p-value < 0.05
LFC > 1.00 (up)    : 78, 0.44%
LFC < -1.00 (down) : 106, 0.6%
outliers [1]       : 0, 0%
low counts [2]     : 2073, 12%
(mean count < 7)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results


So, my question is, how do I get my up-regulated and down-regulated genes from this output? I am setting my log2fc=1 for the cutoff threshold. I also read this thread, but I still could not find the answer I am looking for. Thank you very much.

RNA-Seq R gene • 738 views
ADD COMMENT
2
Entering edit mode
12 months ago
Barry Digby ▴ 780

You can convert the res object to a dataframe by simply calling res_df <- as.data.frame(res). Then apply filtering like these two functions do:

get_upregulated <- function(df){

key <- intersect(rownames(df)[which(df$log2FoldChange>=1)], rownames(df)[which(df$pvalue<=0.05)])

results <- as.data.frame((df)[which(rownames(df) %in% key),])
return(results)
}

get_downregulated <- function(df){

key <- intersect(rownames(df)[which(df$log2FoldChange<=-1)], rownames(df)[which(df$pvalue<=0.05)])

results <- as.data.frame((df)[which(rownames(df) %in% key),])
return(results)
}


up <- get_upregulated(res_df) down <- get_downregulated(res_df)

ADD COMMENT
0
Entering edit mode

Thank you Barry. With the above code I got 146 genes for up-regulated, and 190 genes for down-regulated. I am concerned because they are less than expected. Then I also tried these codes from a training website- DESeq2 workshop padj.cutoff <- 0.05. lfc.cutoff <- 1

res <- res %>%  data.frame() %>%  rownames_to_column(var="gene") %>%  as_tibble()

sig <- res %>% data.frame() %>% rownames_to_column(var="gene") %>% filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)


With the above code, I got 590 genes for up and down-regulated genes.
What is wrong here? Which one should I take? why?

Thanks.

ADD REPLY
0
Entering edit mode

Hi Aynur,

The code I gave you filters by pvalue, not padj, it will return more genes.

Furthermore, the code example you provided does not filter by log2FoldChange as expected (at least when i tested it). It only filters by padj which is a rather egregious error.

If you want to replicate the results of summary(res) then just pay attention to the output and replicate it step by step:

> summary(res)

out of 28785 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 1502, 5.2%
LFC < 0 (down)     : 1349, 4.7%
outliers [1]       : 0, 0%
[1] see 'cooksCutoff' argument of ?results


It says it used LFC > 0 and padj < 0.05.

sig <- res_df[which(res_df$padj <= 0.05),] up <- sig[which(sig$log2FoldChange > 0),]
nrow(up)
[1] 1502
:)


Obviously we need to change the LFC > 0 part to something biologically plausible, i.e +/- 1. This should explain why you 'got less genes than expected'.

ADD REPLY
0
Entering edit mode

Thank you very much for your clarification. For the significantly up-regulated and down-regulated gene list, I also got N/As for padj, do I also need to filter them out for my final list or is it irrelevant? I want to validate those genes using RT-PCR, so I need a significantly differentially expressed gene list based on my log fold change cutoff.

Regards,

ADD REPLY
1
Entering edit mode

You specifically want to evaluate genes with NAs in the wet lab? Don't.

Try this filtering method, it should help prevent NA's and spurious (very high/negative Log2FC) calls showing up in your up/down regulated results:

library(IHW)
library(apeglm)

res <- results(dds, filterFun=ihw, alpha=0.05, c("condition", "tumor", "normal"))

> resultsNames(dds)
[1] "Intercept"                 "condition_tumor_vs_normal"  # the contrast im interested in is the second coef (resultsNames(dds)[2])

LFC <- lfcShrink(dds = dds, res= res, coef = 2, type = "apeglm") #coef=2 refers to the contrast 'tumor_vs_normal'
LFC_df <- as.data.frame(LFC)


You can now subset the LFC dataframe like in the previous comments.

IHW: Instead of finding a threshold on the mean of scaled counts that optimizes the number of rejected hypotheses following Benjamini-Hochberg correction, the IHW package finds an optimal weighting of the hypotheses that maximizes power while still controlling the FDR. - source

apeglm: in a nutshell, preserves 'true' LFC differences and removes genes that could have presented seemingly valid Log2FCs due to noise.

ADD REPLY

Login before adding your answer.

Traffic: 2273 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6