Question

Does edgeR supports random effects?

1

Entering edit mode

3 months ago

406339165 ▴ 10

Hi community,

I saw the recent publication of edgeR4, but after reading through the tutorial, I couldn’t find any description or example related to repeated measures or random effects.

Here’s my experimental design: 16 case samples vs 24 control samples; 21 females and 19 males, Samples are clustered by cage, which introduces a cage effect

I’m interested in testing the effect of Condition (case vs control) and Sex(F vs M), while accounting for the cage effect.

Does edgeR4 support modeling random effects or repeated measures in this context? If not, what would be the recommended approach for this type of analysis using edgeR4 or other packages?

Thanks in advance!

enter image description here

RNASeq voom randomeffects edgeR limma • 1.4k views

ADD COMMENT • link 3 months ago by 406339165 ▴ 10

1

Entering edit mode

This was asked before: https://support.bioconductor.org/p/115282/#115285

ADD REPLY • link 3 months ago by ATpoint 89k

0

Entering edit mode

Thank you!!

ADD REPLY • link 3 months ago by 406339165 ▴ 10

0

Entering edit mode

if you want to put data just put the data frame instead of pictures

ADD REPLY • link 3 months ago by 1769mkc ★ 1.3k

score 6 · Answer 1 · 2025-07-24

6

Entering edit mode

3 months ago

i.sudbery 22k

edgeR can model cage as a fixed effect, but not a random effect. As far as I'm aware, this is true of all the RNAseq modeling frameworks based on the negative binomial distribution.

Because your cage effect is perfectly correlated with your case/control status, you can't model you cage effect as a fixed effect.

While limma doesn't have the ability to analyse arbitrary mixed effects models, its duplicateCorrelation function can handle this particular design I think. You can use limma with RNAseq data through its voom transformation.

ADD COMMENT • link 3 months ago by i.sudbery 22k

0

Entering edit mode

Thank you!

ADD REPLY • link 3 months ago by 406339165 ▴ 10

0

Entering edit mode

Hi all,

Thank you for the helpful suggestions. In my dataset, I observed a significant effect of Sex, so I subset the data to include only male samples and reanalyzed using both edgeR and voom+limma. Technically, using duplicateCorrelation with voom worked without issues.

However, the results are puzzling:

(1) When I ignore the cage effect, edgeR reports approximately 1,200 genes with FDR < 0.1 for the Condition effect. ~ 150 genes with FDR<0.05.

(2) In contrast, voom+limma (both with and without accounting for the cage effect) returns no significant genes.

This discrepancy is confusing. Regardless of whether I model the cage effect, voom+limma always yields zero significant results, while edgeR shows many. I'm trying to understand whether this difference is due to the way random effects are handled, or if it reflects overfitting or inflated false positives in edgeR when cage is ignored, or limma+voom is not powerful in detecting the significance?

Any insights would be appreciated!

Code I used:

cds <- calcNormFactors(tmp)

If I do not consider cage effect, edgeR gives ~1200 genes with FDR<0.1, voom +limma gives 0 genes with FDR<0.1

edgeR

design <- model.matrix(~ 0+Condition, data = sampleInfo)
colnames(design) <- levels(cds$samples$group)
cds2 <- estimateDisp(cds, design, robust=TRUE)
fit <- glmQLFit(cds2, design, robust=TRUE)
lrt <- eval(parse(text = paste0("glmQLFTest(fit,contrast=makeContrasts(",case_name,"-",control_name,",levels=design))")))
sum(p.adjust(lrt$table$PValue, method = 'BH') <0.1)

voom

design <- model.matrix(~ 0 + Condition, data = sampleInfo)
colnames(design) <- levels(cds$samples$group)
v <- voom(cds, design)
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(
  Case_vs_Control = Case - Control,
  levels = design
)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
res <- topTable(fit2, coef = "Case_vs_Control", number = Inf)
sum(res$adj.P.Val<0.1)

If I consider cage effect, voom +limma gives 0 genes with FDR<0.1

voom

rm(fit2);rm(v)
corfit <- duplicateCorrelation(cds$counts, design, block = sampleInfo$cage)
v <- voom(cds, design, block = sampleInfo$cage, correlation = corfit$consensus)
fit <- lmFit(v, design, block = sampleInfo$cage, correlation = corfit$consensus)
cont.matrix <- makeContrasts(
  Case_vs_Control = Case - Control,
  levels = design
)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
res_con <- topTable(fit2, coef = "Case_vs_Control", number = Inf)
sum(res_con$adj.P.Val<0.1)

ADD REPLY • link 3 months ago by 406339165 ▴ 10

score 5 · Answer 2 · 2025-07-26

5

Entering edit mode

3 months ago

Gordon Smyth ★ 8.4k

limma and edgeR will give usually give very similar results if the same model is fitted in both packages. edgeR is often a bit more powerful, but only slightly so. See Baldoni et al (2025) for a comparison of the two pipelines.

Your voom code doesn't look quite correct. I would also keep all the data together rather than subsetting to males and females, especially if you want to estimate a random effect. limma and edgeR can estimate sex-specific condition effects without needing to subset the data. Here is suggested code:

Design matrix using all data. Here, dge is a DGEList:

dge <- normLibSizes(dge)
Condition <- relevel(sampleinfo$Condition, ref="control")
Sex <- sampleinfo$Sex
design <- model.matrix(~ Sex + Sex:Condition)

edgeR case vs control for females:

fitq <- glmQLFit(dge, design, robust=TRUE)
CasevsControl.F <- glmFTest(fitq, coef="SexF:Conditioncase")
summary(decideTests(CasevsControl.F, p=0.1))

edgeR case vs control for males:

CasevsControl.M <- glmFTest(fitq, coef="SexM:Conditioncase")
summary(decideTests(CasevsControl.M, p=0.1))

limma case vs control for females and males:

fitv <- voomLmFit(dge, design)
fitv <- eBayes(fitv, robust=TRUE)
summary(decideTests(fitv, p=0.1))

limma case vs control for females and males, with random effect for cage

fitv <- voomLmFit(dge, design, block=sampleinfo$cage)
fitv <- eBayes(fitv, robust=TRUE)
summary(decideTests(fitv, p=0.1))

Note there is no need for estimateDisp and that voomLmFit calls duplicateCorrelation automatically. Including cage as a random effect will almost certainly decrease the number of DE genes, because it will (correctly) decrease the effective number of independent samples.

Reference

Baldoni PL#, Chen L#, Li M, Chen Y, Smyth GK (2025). Dividing out quantification uncertainty enables assessment of differential transcript usage with diffSplice. bioRxiv. https://doi.org/10.1101/2025.04.07.647659

ADD COMMENT • link 3 months ago by Gordon Smyth ★ 8.4k

0

Entering edit mode

The code was super helpful — thank you so much, Gordon! (cc: Gordon Smyth)

I ran the code and got the following results, but I’m a bit confused, as my aim is testing the Condition effect, I am trying to figure out a correct way:

(1) When running limma case vs control for both females and males with random effect for cage, I observed 356 down- and 1293 up-regulated genes for the coefficient SexM:conditioncase.

(2) However, when subsetting to males only and running limma case vs control for males with random effect for cage, I got 604 down- and 100 up-regulated genes.

Although I’m testing the same genes in both cases, the results differ a lot. Could you help me understand why?

dge <- normLibSizes(dge)
condition <- relevel(sampleinfo$condition, ref="control")
Sex <- sampleinfo$Sex
design <- model.matrix(~ Sex + Sex:condition)

edgeR case vs control for females (ignore cage)

fitq <- glmQLFit(cds, design, robust=TRUE)
CasevsControl.F <- glmQLFTest(fitq, coef="SexF:conditioncase")
summary(decideTests(CasevsControl.F, p=0.1))

SexF:conditioncase
Down         2
NotSig   23475
Up           9

edgeR case vs control for males (ignore cage)

CasevsControl.M <- glmQLFTest(fitq, coef="SexM:conditioncase")
summary(decideTests(CasevsControl.M, p=0.1))

SexM:conditioncase
Down         515
NotSig   21501
Up         1470

limma case vs control for both sexes (ignore cage)

fitv <- voomLmFit(cds, design)
fitv <- eBayes(fitv, robust=TRUE)
summary(decideTests(fitv, p=0.1))

(Intercept)  SexM  SexF:conditioncase  SexM:conditioncase
Down        3121     5                 0                  161
NotSig      1419  23478             23486                22445
Up         18946     3                 0                  880

limma case vs control for both sexes, with random effect for cage

fitv <- voomLmFit(cds, design, block=sampleinfo$cage)
fitv <- eBayes(fitv, robust=TRUE)
summary(decideTests(fitv, p=0.1))

(Intercept)  SexM  SexF:conditioncase  SexM:conditioncase
Down        3176     57                0                  356
NotSig      1339  23365             23486                21837
Up         18971     64                0                 1293

limma case vs control for males only, with random effect for cage

sampleinfo2 <- sampleinfo[sampleinfo$Sex == 'M', ]
selected_samples <- sampleinfo2$Sample 
dge.sub <- dge[, selected_samples]  
sampleinfo2 <- sampleinfo2[match(selected_samples, sampleinfo2$Sample), ]
design.sub <- model.matrix(~ condition, data = sampleinfo2)
fitv <- voomLmFit(dge.sub, design.sub, block = sampleinfo2$cage)  
fitv <- eBayes(fitv, robust=TRUE)
summary(decideTests(fitv, p=0.1))

(Intercept)  conditioncontrol
Down        2829             604
NotSig      1674           22782
Up         18983             100

limma case vs control for females only, with random effect for cage

sampleinfo2 <- sampleinfo[sampleinfo$Sex == 'F', ]
selected_samples <- sampleinfo2$Sample 
dge.sub <- dge[, selected_samples]  
sampleinfo2 <- sampleinfo2[match(selected_samples, sampleinfo2$Sample), ]
design.sub <- model.matrix(~ condition, data = sampleinfo2)
fitv <- voomLmFit(dge.sub, design.sub, block = sampleinfo2$cage)  
fitv <- eBayes(fitv, robust=TRUE)
summary(decideTests(fitv, p=0.1))

(Intercept)  conditioncontrol
Down        3264                  0
NotSig      1428              23486
Up         18794                  0

ADD REPLY • link 3 months ago by 406339165 ▴ 10

1

Entering edit mode

Subsetting to half the data will obviously change the results. The mean-variance relationship and the inter-block correlation need to be reestimated using less data and the residual degrees of freedom are halved.

I already advised you not to subset.

To check the quality of the results, one would look at the MDS plots, voom plots, estimated correlation (should be positive), MD plots. Just counting DE genes doesn't give any insight.

Finally, it is easy to make mistakes when you subset. Your subsetting code includes a couple of steps that shouldn't be necessary, which does look a bit strange. If sampleinfo and dge have samples in the same order, then you just subset both on the same index. There's no need to match sample names, or to subset on sample names at all.

ADD REPLY • link 3 months ago by Gordon Smyth ★ 8.4k

0

Entering edit mode

Thank you so much Gordon Smyth !!

ADD REPLY • link 3 months ago by 406339165 ▴ 10