Much less differentially expressed genes in DESeq2 than in EdgeR - is it common?
1
1
Entering edit mode
3.9 years ago
Tatiana ▴ 10

Hello!

The question is the following. I tried different parameters, but still my DESeq results are just a small subset of EdgeR results. Is it common and normal?

next-gen R • 1.1k views
ADD COMMENT
0
Entering edit mode

Hi Tatiana. There is not much that we can do or say without seeing the code that you have used, and without an understanding of the experimental setup. Can you share with us? - thank you.

ADD REPLY
0
Entering edit mode

Hi Kevin, Thank you for your reply. I used the usual RUVseq (https://bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf ) setup for two groups. Regarding experimental setup, what kind of information could I provide?

## Edger
yy <- read.table("For_ruvseq_paired_end_BC.txt", header = TRUE,  sep="\t", check.names = FALSE, row.names = 1)
yyfilter <- apply(yy, 1, function(x) length(x[x>5])>=2)
yyfiltered <- yy[yyfilter,]
yygenes <- rownames(yyfiltered)[grep("^Nfu", rownames(yyfiltered))]
yyx <- as.factor(rep(c("B", "C"), each=3))
yyset <- newSeqExpressionSet(as.matrix(yyfiltered),
                         phenoData = data.frame(yyx, row.names=colnames(yyfiltered)))
yyset <- betweenLaneNormalization(yyset, which="upper")

# empirical control. EdgeR
yydesign <- model.matrix(~yyx, data=pData(yyset))
yyy <- DGEList(counts=counts(yyset), group=yyx)
yyy <- calcNormFactors(yyy, method="upperquartile")
yyy <- estimateGLMCommonDisp(yyy, yydesign)
yyy <- estimateGLMTagwiseDisp(yyy, yydesign)
yyfit <- glmFit(yyy, yydesign)
yylrt <- glmLRT(yyfit, coef=2)
yytop <- topTags(yylrt, n=nrow(yyset))$table
empirical <- rownames(yyset)[which(!(rownames(yyset) %in% rownames(yytop)[1:5000]))]
yyset2 <- RUVg(yyset, empirical, k=1)
design <- model.matrix(~yyx + W_1, data=pData(yyset2))
yyy <- DGEList(counts=counts(yyset2), group=yyx)
yyy <- calcNormFactors(yyy, method="upperquartile")
yyy <- estimateGLMCommonDisp(yyy, design)
yyy <- estimateGLMTagwiseDisp(yyy, design)
yyfit <- glmFit(yyy, yydesign)
yylrt <- glmLRT(yyfit)
DEGs_BC <- as.data.frame(topTags(yylrt,  n=150, adjust.method="BH", sort.by="PValue", p.value=0.1))
CBresultsEdger_high <- subset(DEGs_BC, logFC > 1.5)
CBresultsEdger_low <- subset(DEGs_BC, logFC < -1.5)

##deseq2

dds <- DESeqDataSetFromMatrix(countData = counts(yyset2),
                          colData = pData(yyset2),
                          design = ~ W_1 + yyx)
dds <- DESeq(dds)
res <- results(dds)
summary(res)
deseq2res <- as.data.frame(res) 
setDT(deseq2res, keep.rownames = TRUE)[]
CBresultsDeseq <- subset(deseq2res, padj < 0.1)
CBresultsDeseq_high <- subset(CBresultsDeseq, log2FoldChange > 1.5)
CBresultsDeseq_low <- subset(CBresultsDeseq, log2FoldChange < -1.5)
ADD REPLY
3
Entering edit mode
3.9 years ago

Hey again, the issue, apparently, is the following: In your EdgeR code, you are specifying this formula:

~yyx + W_1

In your DESeq2 code, you are specifying this formula:

~ W_1 + yyx

...and, in neither case, when you derive test statistics, you are specifying the contrast / coefficient of interest. With multiple parameters in the design formula, you need to specifically select the parameter of interest, i.e., W_1 or yyx.

Both edgeR and DESeq2 will, by default, automatically select the last coefficient for you, i.e., for deriving test statistics. So, your EdgeR results are based on W_1, while your DESeq2 results are based on yyx.

For your DESeq2 code, you should take a look through the vignette ( http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html ), because there is also a key log fold-change shrinkage step that you are not using.

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 2699 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6