Question

edgeR: upregulated with respect to what group?

bioinfo2345 · 21 months ago

I have performed a differential expression analysis with edgeR using two groups: mutant and control.

When defining the experimental factor, I set control as the reference level, just as in the Arabidopsis case study in the edgeR manual:

strain <- factor(substring(colnames(data.set),1,7))
strain <- relevel(strain, ref="control")
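To see what relevel() does to the model, it can help to print the design matrix. A minimal sketch in base R; the sample names are hypothetical (the real colnames(data.set) are not shown), and the group label is taken as the text before "_", which is an assumption about the naming scheme:

```r
# Hypothetical sample names; the real colnames(data.set) are not shown.
samples <- c("control_1", "control_2", "control_3",
             "mutant_1", "mutant_2", "mutant_3")

# Group label = text before the first "_" (an assumption about the naming scheme)
strain <- factor(sub("_.*", "", samples))
strain <- relevel(strain, ref = "control")
print(levels(strain))   # the reference level is listed first

# With an intercept, the reference level is absorbed into the intercept, so the
# only other column ("strainmutant") encodes mutant relative to control.
design <- model.matrix(~strain)
print(colnames(design))  # "(Intercept)" "strainmutant"
print(design)
```

The column name strainmutant is the hint: the coefficient for that column, and hence the logFC edgeR reports for it, is mutant relative to control.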

Downstream in the edgeR analysis, I get a list of differentially expressed genes, some of which have a positive logFC, indicating upregulation.

My naive reading is that, since control is the reference, genes with a positive logFC must be upregulated in the mutant compared with the control.

This is the impression I get from reading the R documentation for the relevel() function and also the Arabidopsis case study in the edgeR manual. I have also checked a few genes with a positive logFC value against a TMM-normalized gene counts matrix and they have higher expression values in mutant than control.

However, I want to avoid making one of those catastrophic errors, so I wanted to ask just to be safe.

Are these genes with positive logFC in this setup upregulated in the mutant or the control?
Is this decided by which level is set as ref in the definition of experimental factors?
Had the mutant been set as ref, would it have been the other way around?

score 1 · Answer 1 · 2022-08-11

The best way to verify that the logFC goes in the direction you expect is to take the most significant up-regulated and down-regulated genes and draw boxplots of their normalized expression by group. This can also help diagnose other problems, such as differential expression results that are driven by a single outlier sample.
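A minimal base-R sketch of that check. The expression matrix below is simulated and stands in for normalized log2 expression (in a real edgeR analysis something like cpm(y, log = TRUE) on a TMM-normalized DGEList); the gene names, sample names and effect sizes are all hypothetical:

```r
set.seed(1)
# Simulated stand-in for normalized log2 expression; all names are hypothetical.
strain <- relevel(factor(rep(c("control", "mutant"), each = 3)), ref = "control")
logcpm <- matrix(rnorm(12, mean = 5, sd = 0.3), nrow = 2,
                 dimnames = list(c("geneUp", "geneDown"), paste0("s", 1:6)))
is_mut <- strain == "mutant"
logcpm["geneUp", is_mut]   <- logcpm["geneUp", is_mut] + 2    # higher in mutant
logcpm["geneDown", is_mut] <- logcpm["geneDown", is_mut] - 2  # lower in mutant

# One boxplot per gene: a positive logFC (mutant vs control) should show the
# mutant box above the control box, a negative logFC the reverse.
op <- par(mfrow = c(1, 2))
for (g in rownames(logcpm)) {
  boxplot(logcpm[g, ] ~ strain, main = g, xlab = "", ylab = "log2 expression")
}
par(op)

# Numeric version of the same check: mutant mean minus control mean per gene
diff_means <- sapply(rownames(logcpm), function(g)
  mean(logcpm[g, is_mut]) - mean(logcpm[g, !is_mut]))
print(sign(diff_means))
```

With real data you would plot the top genes from topTags() instead and compare the sign of their logFC with what the boxes show.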

More specifically, in R, when you pass a factor into a formula (~ strain), the first level becomes the reference and is absorbed into the intercept. When the linear model is fit, the intercept estimates the mean of the reference level (level 1), and each remaining coefficient (levels 2, 3, ...) estimates the difference between that level and the reference. So testing the coefficient for level 3 already tests level 3 against level 1; comparing two non-reference levels, such as level 3 against level 2, requires a contrast. edgeR uses the same parameterization, because its coefficients come from the design matrix you supply.
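A small illustration of treatment coding, using lm() on made-up numbers for a single gene in three hypothetical groups A, B and C. lm() is only a stand-in here: edgeR fits a negative binomial GLM on the log scale, so its coefficients are log fold changes, but each coefficient has the same meaning because both are built from the same design matrix:

```r
# Toy response for one gene in three hypothetical groups (level 1 = A).
group <- factor(rep(c("A", "B", "C"), each = 2))
y <- c(1, 3, 4, 6, 10, 12)      # group means: A = 2, B = 5, C = 11

fit <- lm(y ~ group)            # treatment coding: A is the reference
b <- coef(fit)
print(b)
# (Intercept) = mean of A; groupB = B - A; groupC = C - A

# Level 3 vs level 1 needs no contrast: it is simply the groupC coefficient.
c_vs_a <- unname(b["groupC"])

# Level 3 vs level 2 does need a contrast; weights on (Intercept, groupB, groupC)
contrast <- c(0, -1, 1)
c_vs_b <- sum(contrast * b)
print(c(C_vs_A = c_vs_a, C_vs_B = c_vs_b))
```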

However, to be very sure, I would recommend fitting a model without an intercept, ~ strain + 0, in which case every level gets its own coefficient (its group mean). Any comparison, such as strainmutant - straincontrol, then becomes an explicit contrast with weights +1 and -1: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7873980/
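A sketch showing that the +1/-1 contrast on the no-intercept model gives the same estimate as the strainmutant coefficient of the intercept model. It uses lm() on hypothetical toy values for one gene; in edgeR the equivalent vector would be passed through the contrast= argument of glmQLFTest() (or glmLRT()), and limma's makeContrasts(strainmutant - straincontrol, levels = design) can build it for you:

```r
# Hypothetical two-group toy data for one gene (log scale).
strain <- relevel(factor(rep(c("control", "mutant"), each = 3)), ref = "control")
y <- c(2.0, 2.2, 1.8, 4.1, 3.9, 4.0)   # group means: control = 2, mutant = 4

# No-intercept (cell-means) model: one coefficient per level, equal to its mean
fit0 <- lm(y ~ strain + 0)
print(coef(fit0))

# Explicit +1/-1 contrast: mutant minus control
contrast <- c(straincontrol = -1, strainmutant = 1)
est0 <- sum(contrast * coef(fit0)[names(contrast)])

# The intercept model gives the same comparison directly as its second coefficient
fit1 <- lm(y ~ strain)
est1 <- unname(coef(fit1)["strainmutant"])
print(c(contrast = est0, coefficient = est1))
```

Both routes answer the same question; the no-intercept form just makes the direction of the comparison visible in the contrast itself.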

score 1 · Answer 2 · 2024-02-03

Gordon Smyth · 12 weeks ago

Assuming you have used design <- model.matrix(~strain), positive logFC means upregulated in mutant.
Yes. It is always relative to the reference level.
Yes. With mutant as the reference, the signs would be reversed.
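A quick way to convince yourself of the last two points is to fit the same toy data with each level as the reference and compare the coefficients. This is a base-R sketch with lm() and hypothetical values for one gene; the same sign flip applies to edgeR's logFC because it comes from the same kind of design matrix:

```r
# Hypothetical toy data for one gene (log scale); group means control = 2, mutant = 4
strain <- factor(rep(c("control", "mutant"), each = 3))
y <- c(2.0, 2.2, 1.8, 4.1, 3.9, 4.0)

strain_c <- relevel(strain, ref = "control")   # control as reference
strain_m <- relevel(strain, ref = "mutant")    # mutant as reference

fit_c <- lm(y ~ strain_c)
fit_m <- lm(y ~ strain_m)

# The non-intercept coefficient is always "this level minus the reference"
lfc_c <- unname(coef(fit_c)["strain_cmutant"])    # mutant vs control (positive here)
lfc_m <- unname(coef(fit_m)["strain_mcontrol"])   # control vs mutant (same size, negative)
print(c(mutant_vs_control = lfc_c, control_vs_mutant = lfc_m))
```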
