Question

How to select differential expressed genes using DeSeq2?

2

Entering edit mode

3.6 years ago

Aynur ▴ 60

Hello, I am new to bioinformatics, and I am trying to do DEG analysis using DESeq2. I followed the DESeq2 vignette, and I did this

res <- results(dds, lfcThreshold = log2(2), alpha = 0.05)

summary(res)

out of 17813 with nonzero total read count.                                                                                                                      
                        adjusted p-value < 0.05
LFC > 1.00 (up)    : 78, 0.44%
LFC < -1.00 (down) : 106, 0.6%
outliers [1]       : 0, 0%
low counts [2]     : 2073, 12%
(mean count < 7)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

So, my question is, how do I get my up-regulated and down-regulated genes from this output? I am setting my log2fc=1 for the cutoff threshold. I also read this thread, but I still could not find the answer I am looking for. Thank you very much.

RNA-Seq R gene • 3.8k views

ADD COMMENT • link updated 8 months ago by Barry Digby ★ 1.3k • written 3.6 years ago by Aynur ▴ 60

0

Entering edit mode

2.1 years ago

ta_awwad ▴ 340

res <- results(dds ,contrast=c("condition","KO", "wt"), alpha = 0.05)

filter the result table based on Padj and Log2FC

resSig <- subset(res, padj < 0.05 & abs(log2FoldChange) > 0.58)

get upregulated genes

up <- subset (resSig,log2FoldChange > 0)

get downregulated genes

dn <- subset (resSig, log2FoldChange < 0)

ADD COMMENT • link 2.1 years ago by ta_awwad ▴ 340

score 5 · Accepted Answer · 2020-10-17

5

Entering edit mode

3.6 years ago

Barry Digby ★ 1.3k

You can convert the res object to a dataframe by simply calling res_df <- as.data.frame(res). Then apply filtering like these two functions do:

get_upregulated <- function(df){

    key <- intersect(rownames(df)[which(df$log2FoldChange>=1)], rownames(df)[which(df$pvalue<=0.05)])

    results <- as.data.frame((df)[which(rownames(df) %in% key),])
    return(results)
}

get_downregulated <- function(df){

    key <- intersect(rownames(df)[which(df$log2FoldChange<=-1)], rownames(df)[which(df$pvalue<=0.05)])

    results <- as.data.frame((df)[which(rownames(df) %in% key),])
    return(results)
}

up <- get_upregulated(res_df) down <- get_downregulated(res_df)

ADD COMMENT • link 3.6 years ago by Barry Digby ★ 1.3k

0

Entering edit mode

Thank you Barry. With the above code I got 146 genes for up-regulated, and 190 genes for down-regulated. I am concerned because they are less than expected. Then I also tried these codes from a training website- DESeq2 workshop padj.cutoff <- 0.05. lfc.cutoff <- 1

res <- res %>%  data.frame() %>%  rownames_to_column(var="gene") %>%  as_tibble()

sig <- res %>% data.frame() %>% rownames_to_column(var="gene") %>% filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)

With the above code, I got 590 genes for up and down-regulated genes.
What is wrong here? Which one should I take? why?

Thanks.

ADD REPLY • link 3.6 years ago by Aynur ▴ 60

0

Entering edit mode

Hi Aynur,

The code I gave you filters by pvalue, not padj, it will return more genes.

The code example you provided did not filter by log2FoldChange when I tested it, I am not overly familiar with dplyr though..

If you want to replicate the results of summary(res) then pay attention to the output and replicate it step by step:

> summary(res)

out of 28785 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 1502, 5.2%
LFC < 0 (down)     : 1349, 4.7%
outliers [1]       : 0, 0%
[1] see 'cooksCutoff' argument of ?results

It says it used LFC > 0 and padj < 0.05.

sig <- res_df[which(res_df$padj <= 0.05),]
up <- sig[which(sig$log2FoldChange > 0),]
nrow(up)
[1] 1502
:)

We need to change the LFC > 0 part to something biologically plausible, i.e +/- 1. This should explain why you 'got less genes than expected'.

ADD REPLY • link 8 months ago by Barry Digby ★ 1.3k

0

Entering edit mode

Thank you very much for your clarification. For the significantly up-regulated and down-regulated gene list, I also got N/As for padj, do I also need to filter them out for my final list or is it irrelevant? I want to validate those genes using RT-PCR, so I need a significantly differentially expressed gene list based on my log fold change cutoff.

Regards,

ADD REPLY • link 3.6 years ago by Aynur ▴ 60

1

Entering edit mode

You specifically want to evaluate genes with NAs in the wet lab? Don't.

Try this filtering method, it should help prevent NA's and spurious (very high/negative Log2FC) calls showing up in your up/down regulated results:

library(IHW)
library(apeglm)

res <- results(dds, filterFun=ihw, alpha=0.05, c("condition", "tumor", "normal"))

> resultsNames(dds)
[1] "Intercept"                 "condition_tumor_vs_normal"  # the contrast im interested in is the second coef (resultsNames(dds)[2])


LFC <- lfcShrink(dds = dds, res= res, coef = 2, type = "apeglm") #coef=2 refers to the contrast 'tumor_vs_normal'
LFC_df <- as.data.frame(LFC)

You can now subset the LFC dataframe like in the previous comments.

IHW: Instead of finding a threshold on the mean of scaled counts that optimizes the number of rejected hypotheses following Benjamini-Hochberg correction, the IHW package finds an optimal weighting of the hypotheses that maximizes power while still controlling the FDR. - source

apeglm: in a nutshell, preserves 'true' LFC differences and removes genes that could have presented seemingly valid Log2FCs due to noise.

ADD REPLY • link 3.6 years ago by Barry Digby ★ 1.3k