Question: How to select differential expressed genes using DeSeq2?
2
gravatar for Aynur
3 days ago by
Aynur40
Aynur40 wrote:

Hello, I am new to bioinformatics, and I am trying to do DEG analysis using DESeq2. I followed the DESeq2 vignette, and I did this

res <- results(dds, lfcThreshold = log2(2), alpha = 0.05)

summary(res)

out of 17813 with nonzero total read count.                                                                                                                      
                        adjusted p-value < 0.05
LFC > 1.00 (up)    : 78, 0.44%
LFC < -1.00 (down) : 106, 0.6%
outliers [1]       : 0, 0%
low counts [2]     : 2073, 12%
(mean count < 7)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

So, my question is, how do I get my up-regulated and down-regulated genes from this output? I am setting my log2fc=1 for the cutoff threshold. I also read this thread, but I still could not find the answer I am looking for. Thank you very much.

rna-seq R gene • 121 views
ADD COMMENTlink modified 3 days ago by genomax91k • written 3 days ago by Aynur40
2
gravatar for Barry Digby
3 days ago by
Barry Digby500
National University of Ireland, Galway
Barry Digby500 wrote:

You can convert the res object to a dataframe by simply calling res_df <- as.data.frame(res). Then apply filtering like these two functions do:

get_upregulated <- function(df){

    key <- intersect(rownames(df)[which(df$log2FoldChange>=1)], rownames(df)[which(df$pvalue<=0.05)])

    results <- as.data.frame((df)[which(rownames(df) %in% key),])
    return(results)
}

get_downregulated <- function(df){

    key <- intersect(rownames(df)[which(df$log2FoldChange<=-1)], rownames(df)[which(df$pvalue<=0.05)])

    results <- as.data.frame((df)[which(rownames(df) %in% key),])
    return(results)
}

up <- get_upregulated(res_df) down <- get_downregulated(res_df)

ADD COMMENTlink modified 3 days ago • written 3 days ago by Barry Digby500

Thank you Barry. With the above code I got 146 genes for up-regulated, and 190 genes for down-regulated. I am concerned because they are less than expected. Then I also tried these codes from a training website- DESeq2 workshop padj.cutoff <- 0.05. lfc.cutoff <- 1

res <- res %>%  data.frame() %>%  rownames_to_column(var="gene") %>%  as_tibble()

sig <- res %>% data.frame() %>% rownames_to_column(var="gene") %>% filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)

With the above code, I got 590 genes for up and down-regulated genes.
What is wrong here? Which one should I take? why?

Thanks.

ADD REPLYlink written 2 days ago by Aynur40

Hi Aynur,

The code I gave you filters by pvalue, not padj, it will return more genes.

Furthermore, the code example you provided does not filter by log2FoldChange as expected (at least when i tested it). It only filters by padj which is a rather egregious error.

If you want to replicate the results of summary(res) then just pay attention to the output and replicate it step by step:

> summary(res)

out of 28785 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 1502, 5.2%
LFC < 0 (down)     : 1349, 4.7%
outliers [1]       : 0, 0%
[1] see 'cooksCutoff' argument of ?results

It says it used LFC > 0 and padj < 0.05.

sig <- res_df[which(res_df$padj <= 0.05),]
up <- sig[which(sig$log2FoldChange > 0),]
nrow(up)
[1] 1502
:)

Obviously we need to change the LFC > 0 part to something biologically plausible, i.e +/- 1. This should explain why you 'got less genes than expected'.

ADD REPLYlink modified 2 days ago • written 2 days ago by Barry Digby500

Thank you very much for your clarification. For the significantly up-regulated and down-regulated gene list, I also got N/As for padj, do I also need to filter them out for my final list or is it irrelevant? I want to validate those genes using RT-PCR, so I need a significantly differentially expressed gene list based on my log fold change cutoff.

Regards,

ADD REPLYlink written 2 days ago by Aynur40
1

You specifically want to evaluate genes with NAs in the wet lab? Don't.

Try this filtering method, it should help prevent NA's and spurious (very high/negative Log2FC) calls showing up in your up/down regulated results:

library(IHW)
library(apeglm)

res <- results(dds, filterFun=ihw, alpha=0.05, c("condition", "tumor", "normal"))

> resultsNames(dds)
[1] "Intercept"                 "condition_tumor_vs_normal"  # the contrast im interested in is the second coef (resultsNames(dds)[2])


LFC <- lfcShrink(dds = dds, res= res, coef = 2, type = "apeglm") #coef=2 refers to the contrast 'tumor_vs_normal'
LFC_df <- as.data.frame(LFC)

You can now subset the LFC dataframe like in the previous comments.

IHW: Instead of finding a threshold on the mean of scaled counts that optimizes the number of rejected hypotheses following Benjamini-Hochberg correction, the IHW package finds an optimal weighting of the hypotheses that maximizes power while still controlling the FDR. - source

apeglm: in a nutshell, preserves 'true' LFC differences and removes genes that could have presented seemingly valid Log2FCs due to noise.

ADD REPLYlink modified 1 day ago • written 2 days ago by Barry Digby500
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1960 users visited in the last hour