edgeR: upregulated with respect to which group?
20 months ago
bioinfo2345 ▴ 40

I have performed a differential expression analysis with edgeR using two groups: mutant and control.

In the definition of the experimental factors, I have put the control as the reference just like the Arabidopsis case study in the edgeR manual:

strain <- factor(substring(colnames(data.set),1,7))
strain <- relevel(strain, ref="control") 

Downstream in the edgeR analysis, I get a list of differentially expressed genes where some of them have a positive logFC value, indicating upregulation.

I naively think that since the reference is control, it must mean that the genes with a positive logFC value are upregulated in the mutant compared with the control.

This is the impression I get from reading the R documentation for the relevel() function and also the Arabidopsis case study in the edgeR manual. I have also checked a few genes with a positive logFC value against a TMM-normalized gene counts matrix and they have higher expression values in mutant than control.

However, I want to avoid making one of those catastrophic errors, so I wanted to ask just to be safe.

  1. Are these genes with positive logFC in this setup upregulated in the mutant or the control?

  2. Is this decided by which level is set as ref in the definition of experimental factors?

  3. Had the mutant been set as ref, would it have been the other way around?
upregulated edgeR • 899 views
20 months ago
LChart 3.9k

The best way to verify that the logFC is going in the direction you expect is to take the most significant up-regulated and down-regulated genes and plot their expression per group as a boxplot. This can also help diagnose other errors, such as differential expression results driven by a single outlier sample.
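A minimal sketch of that check in base R. The expression values below are made up for illustration; in a real analysis they would be the log-CPM values of one top gene (for example a row of cpm(y, log=TRUE) from edgeR), and the object names are hypothetical.

```r
# Toy data (made up for illustration): log2-CPM of one gene in six samples.
strain <- factor(c("control", "control", "control", "mutant", "mutant", "mutant"))
strain <- relevel(strain, ref = "control")
logcpm_gene <- c(4.1, 3.9, 4.0, 6.2, 5.8, 6.0)  # hypothetical values

# Per-group means: the sign of (mutant - control) should match the sign of the logFC.
group_means <- tapply(logcpm_gene, strain, mean)
diff_mutant_vs_control <- unname(group_means["mutant"] - group_means["control"])

# One box per group, with individual samples overlaid so a single outlier stands out.
boxplot(logcpm_gene ~ strain, ylab = "log2 CPM", main = "Top gene (toy data)")
stripchart(logcpm_gene ~ strain, vertical = TRUE, add = TRUE,
           pch = 19, method = "jitter")
```

If the reported logFC for this gene were positive, the mutant box should sit above the control box.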

More specifically, in R, when you pass a factor into a formula (~ strain), the first level is treated as the reference and does not get its own column in the design matrix. When the linear model is fit, the intercept estimates the reference-level mean (on the log scale), and each remaining coefficient estimates the difference between that level and the reference, i.e. a logFC. So to check whether level 3 is "different" from level 1, you test the level 3 coefficient directly; no separate contrast matrix is needed. In edgeR, glmLRT() and glmQLFTest() test the last coefficient of the design matrix by default.
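To see how R codes the factor, here is a small base-R sketch. It uses lm() on made-up log-expression values as a stand-in for the edgeR GLM fit, so the numbers are illustrative only; the design matrix itself is built exactly as it would be for edgeR.

```r
# Two groups with "control" as the reference level.
strain <- factor(c("control", "control", "control", "mutant", "mutant", "mutant"))
strain <- relevel(strain, ref = "control")

# The reference level has no column of its own; it is absorbed into the intercept.
design <- model.matrix(~ strain)
colnames(design)

# Toy log-expression values (made up) to show what each coefficient estimates.
logexpr <- c(4.1, 3.9, 4.0, 6.2, 5.8, 6.0)
fit <- lm(logexpr ~ strain)

# (Intercept)  = mean of the control group
# strainmutant = mutant mean minus control mean
coef(fit)
```

The column name strainmutant is the factor name followed by the non-reference level, which is a quick way to read off which group the logFC refers to.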

However, to be very sure, I would recommend fitting a model that excludes the intercept: ~ strain + 0, in which case all levels are retained and each coefficient is a group mean. This makes it easy to state the comparison explicitly as a +1/-1 contrast, e.g. strainmutant - straincontrol: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7873980/
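A rough sketch of the no-intercept parameterisation, again using lm() on made-up values as a stand-in for the edgeR fit. In edgeR, a vector like the one below could be supplied through the contrast argument of glmLRT() or glmQLFTest(); limma's makeContrasts() is one way to build it by name.

```r
strain <- factor(c("control", "control", "control", "mutant", "mutant", "mutant"),
                 levels = c("control", "mutant"))

# Without an intercept, every level gets its own column.
design0 <- model.matrix(~ 0 + strain)
colnames(design0)

# Toy log-expression values (made up for illustration).
logexpr <- c(4.1, 3.9, 4.0, 6.2, 5.8, 6.0)
fit0 <- lm(logexpr ~ 0 + strain)  # each coefficient is now a group mean

# +1/-1 contrast: mutant minus control, in the same order as the design columns.
contrast <- c(straincontrol = -1, strainmutant = 1)
logfc_mutant_vs_control <- sum(contrast * coef(fit0))
logfc_mutant_vs_control
```

Writing the contrast out this way makes the direction of the comparison explicit: the group with +1 is the numerator and the group with -1 is the baseline.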

10 weeks ago
Gordon Smyth ★ 7.0k
  1. Assuming you have used design <- model.matrix(~strain) then positive logFC means upregulated in mutant.
  2. Yes. It is always vs the reference level.
  3. Yes.
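The three answers above can be illustrated with a small base-R sketch. It uses lm() on made-up log-expression values rather than an edgeR fit, so it only demonstrates how the reference level sets the sign; the variable names are hypothetical.

```r
strain <- factor(c("control", "control", "control", "mutant", "mutant", "mutant"))
logexpr <- c(4.1, 3.9, 4.0, 6.2, 5.8, 6.0)  # toy values, higher in mutant

# Reference = control: the coefficient is mutant minus control.
strain_ctrl_ref <- relevel(strain, ref = "control")
logfc_ctrl_ref <- unname(coef(lm(logexpr ~ strain_ctrl_ref))[2])

# Reference = mutant: the coefficient is control minus mutant.
strain_mut_ref <- relevel(strain, ref = "mutant")
logfc_mut_ref <- unname(coef(lm(logexpr ~ strain_mut_ref))[2])

# Same magnitude, opposite sign.
c(control_as_ref = logfc_ctrl_ref, mutant_as_ref = logfc_mut_ref)
```

Only the sign changes when the reference level is swapped; the magnitude of the difference is the same.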