Question

Much less differentially expressed genes in DESeq2 than in EdgeR - is it common?

1

Entering edit mode

5.1 years ago

Tatiana ▴ 20

Hello!

The question is the following. I tried different parameters, but still my DESeq results are just a small subset of EdgeR results. Is it common and normal?

next-gen R • 1.4k views

ADD COMMENT • link updated 5.1 years ago by Kevin Blighe 89k • written 5.1 years ago by Tatiana ▴ 20

0

Entering edit mode

Hi Tatiana. There is not much that we can do or say without seeing the code that you have used, and without an understanding of the experimental setup. Can you share with us? - thank you.

ADD REPLY • link 5.1 years ago by Kevin Blighe 89k

0

Entering edit mode

Hi Kevin, Thank you for your reply. I used the usual RUVseq (https://bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf ) setup for two groups. Regarding experimental setup, what kind of information could I provide?

## Edger
yy <- read.table("For_ruvseq_paired_end_BC.txt", header = TRUE,  sep="\t", check.names = FALSE, row.names = 1)
yyfilter <- apply(yy, 1, function(x) length(x[x>5])>=2)
yyfiltered <- yy[yyfilter,]
yygenes <- rownames(yyfiltered)[grep("^Nfu", rownames(yyfiltered))]
yyx <- as.factor(rep(c("B", "C"), each=3))
yyset <- newSeqExpressionSet(as.matrix(yyfiltered),
                         phenoData = data.frame(yyx, row.names=colnames(yyfiltered)))
yyset <- betweenLaneNormalization(yyset, which="upper")

# empirical control. EdgeR
yydesign <- model.matrix(~yyx, data=pData(yyset))
yyy <- DGEList(counts=counts(yyset), group=yyx)
yyy <- calcNormFactors(yyy, method="upperquartile")
yyy <- estimateGLMCommonDisp(yyy, yydesign)
yyy <- estimateGLMTagwiseDisp(yyy, yydesign)
yyfit <- glmFit(yyy, yydesign)
yylrt <- glmLRT(yyfit, coef=2)
yytop <- topTags(yylrt, n=nrow(yyset))$table
empirical <- rownames(yyset)[which(!(rownames(yyset) %in% rownames(yytop)[1:5000]))]
yyset2 <- RUVg(yyset, empirical, k=1)
design <- model.matrix(~yyx + W_1, data=pData(yyset2))
yyy <- DGEList(counts=counts(yyset2), group=yyx)
yyy <- calcNormFactors(yyy, method="upperquartile")
yyy <- estimateGLMCommonDisp(yyy, design)
yyy <- estimateGLMTagwiseDisp(yyy, design)
yyfit <- glmFit(yyy, yydesign)
yylrt <- glmLRT(yyfit)
DEGs_BC <- as.data.frame(topTags(yylrt,  n=150, adjust.method="BH", sort.by="PValue", p.value=0.1))
CBresultsEdger_high <- subset(DEGs_BC, logFC > 1.5)
CBresultsEdger_low <- subset(DEGs_BC, logFC < -1.5)

##deseq2

dds <- DESeqDataSetFromMatrix(countData = counts(yyset2),
                          colData = pData(yyset2),
                          design = ~ W_1 + yyx)
dds <- DESeq(dds)
res <- results(dds)
summary(res)
deseq2res <- as.data.frame(res) 
setDT(deseq2res, keep.rownames = TRUE)[]
CBresultsDeseq <- subset(deseq2res, padj < 0.1)
CBresultsDeseq_high <- subset(CBresultsDeseq, log2FoldChange > 1.5)
CBresultsDeseq_low <- subset(CBresultsDeseq, log2FoldChange < -1.5)

ADD REPLY • link 5.1 years ago by Tatiana ▴ 20

score 3 · Answer 1 · 2020-05-16

Hey again, the issue, apparently, is the following: In your EdgeR code, you are specifying this formula:

~yyx + W_1

In your DESeq2 code, you are specifying this formula:

~ W_1 + yyx

...and, in neither case, when you derive test statistics, you are specifying the contrast / coefficient of interest. With multiple parameters in the design formula, you need to specifically select the parameter of interest, i.e., W_1 or yyx.

Both edgeR and DESeq2 will, by default, automatically select the last coefficient for you, i.e., for deriving test statistics. So, your EdgeR results are based on W_1, while your DESeq2 results are based on yyx.

For your DESeq2 code, you should take a look through the vignette ( http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html ), because there is also a key log fold-change shrinkage step that you are not using.

Kevin