Question: edgeR direction of expression and sign of Log Fold Changes
ilyco60 wrote:

Hi,

I used edgeR for differential expression analysis with 5 conditions relative to the baseline condition.

The design was a simple linear model with the condition factor variable re-ordered so that the baseline is the first value. The code was the following:

```r
design <- model.matrix(~ condition, data = y$samples)
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmFit(y,design)
conditionA_minus_base <- glmTreat(fit, coef = "conditionA", lfc = minlfc) ### coefficient corresponds to A - baseline
up_A <- rownames(y)[decideTestsDGE(conditionA_minus_base, p.value = pvalue, adj = "fdr") == 1]
down_A <- rownames(y)[decideTestsDGE(conditionA_minus_base, p.value = pvalue, adj = "fdr") == -1]
```
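For readers following along, here is a minimal base-R sketch of how the reference level decides what a coefficient such as `conditionA` means. The labels `"base"`, `"A"` and `"B"` are made up for illustration and are not taken from the data in the question.

```r
# Hypothetical condition labels; "base" stands in for the baseline group.
condition <- factor(c("A", "A", "B", "B", "base", "base"))
levels(condition)   # default ordering puts "A" first, so "A" would be the reference

# Make the baseline the reference (first) level.
condition <- relevel(condition, ref = "base")
levels(condition)

# With an intercept, every non-reference level gets a coefficient that
# measures (that level) minus (reference level).
design <- model.matrix(~ condition)
colnames(design)
```

If `colnames(design)` lists `conditionA` and the first level is the baseline, then a positive log fold change for that coefficient means higher expression in A than in the baseline.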

However, when I checked the down-regulated genes, they were enriched for many terms that are known to be up-regulated. All in all, the directions seem reversed for a large majority of genes. I checked the labeling and pre-processing steps many times. Could you please let me know if the 1 and -1 values should be the other way around?

I tested two designs against each other:

```r
design1 <- model.matrix(~ condition, data = y$samples)
design2 <- model.matrix(~ 0 + condition, data = y$samples)
```

Results are the same from:

```r
conditionA_minus_base1 <- glmTreat(fit, coef = "conditionA", lfc = minlfc) ### coefficient corresponds to A - baseline
conditionA_minus_base2 <- glmTreat(fit, contrast = c(-1,1,0,0,0,0), lfc = minlfc)
```

where `contrast = c(-1,1,0,0,0,0)` corresponds to -1 Baseline + 1 Condition A
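The equivalence of the two parameterizations can be illustrated with a small sketch. It uses ordinary least squares (`lm.fit`) on made-up log2 values for a single gene as a stand-in for the edgeR GLM, and three hypothetical groups instead of six, so it only shows how the coefficient and the contrast line up, not what edgeR computes internally. Note that each design needs its own fit.

```r
# Toy log2-expression values for one gene; groups and numbers are hypothetical.
condition <- factor(rep(c("base", "A", "B"), each = 3),
                    levels = c("base", "A", "B"))
logexpr <- c(5.0, 5.2, 4.8,   # base
             7.1, 6.9, 7.0,   # A
             5.1, 4.9, 5.0)   # B

design1 <- model.matrix(~ condition)       # intercept + differences from base
design2 <- model.matrix(~ 0 + condition)   # one mean per group

# Fit the same data once per design.
coef1 <- lm.fit(design1, logexpr)$coefficients
coef2 <- lm.fit(design2, logexpr)$coefficients

# A minus base, read off as a coefficient (design1) or built as a contrast (design2).
a_minus_base_1 <- coef1[["conditionA"]]
a_minus_base_2 <- sum(c(-1, 1, 0) * coef2)   # -1*base + 1*A + 0*B

all.equal(a_minus_base_1, a_minus_base_2)
```

Both routes give the same A minus base difference, and it is positive here because the toy gene is higher in A.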

Thank you.

lfc edger rna-seq R • 2.2k views
modified 4.5 years ago by Gordon Smyth2.1k • written 4.6 years ago by ilyco60

As long as the ordering is correct then what you're doing should work. The most common mistake here is when making the `condition` column in `y$samples`. Triple check that nothing is swapped there (hint: if you aren't already, load this from a text file).
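A rough sketch of that check, assuming a tab-separated sample sheet with `sample` and `condition` columns (the file, the column names and the counts below are all invented for illustration):

```r
# Hypothetical sample sheet, written to a temporary file so the example runs.
sheet <- tempfile(fileext = ".tsv")
writeLines(c("sample\tcondition",
             "s1\tbase", "s2\tbase",
             "s3\tA", "s4\tA"), sheet)
targets <- read.delim(sheet, stringsAsFactors = FALSE)

# Toy count matrix whose columns are named by sample.
counts <- matrix(c(10, 12, 50, 55,
                   30, 28, 31, 29),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("gene1", "gene2"),
                                 c("s1", "s2", "s3", "s4")))

# Stop early if the sheet and the count columns are not in the same order.
stopifnot(identical(targets$sample, colnames(counts)))

# Build the factor from the sheet and set the baseline explicitly.
condition <- relevel(factor(targets$condition), ref = "base")
data.frame(sample = colnames(counts), condition)
```

Printing the sample names next to the factor makes a swapped label easy to spot before any model is fitted.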

Gordon Smyth2.1k wrote:

Your code looks correct. Your `up_A` does contain genes up-regulated in condition A vs whatever you set for the reference level of 'condition', and `down_A` does correspond to down-regulated in condition A.
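One way to convince yourself of the direction is to compare the sign of a naive log fold change, computed directly from normalized counts, with the sign reported by the model. The sketch below does this in base R on invented counts. With real data, the edgeR counterparts would be `cpm(y, log = TRUE)` for the log-CPM values and `conditionA_minus_base$table$logFC` for the fitted log fold changes.

```r
# Made-up counts: gene_up is higher in A, gene_down is lower in A.
group <- factor(rep(c("base", "A"), each = 3), levels = c("base", "A"))
counts <- matrix(c(  10,   12,   11,   80,   90,   85,
                    200,  210,  190,   40,   45,   50,
                   1000, 1000, 1000, 1000, 1000, 1000),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("gene_up", "gene_down", "gene_other"), NULL))

# Counts per million, then log2 with a small offset to avoid log(0).
cpm <- t(t(counts) / colSums(counts)) * 1e6
logcpm <- log2(cpm + 0.5)

# Naive log fold change: mean of A minus mean of base.
naive_lfc <- rowMeans(logcpm[, group == "A"]) -
  rowMeans(logcpm[, group == "base"])
sign(naive_lfc[c("gene_up", "gene_down")])
```

If the naive signs for a handful of well-known genes agree with the signs from `glmTreat`, the direction of the test is fine and any reversal has to come from the sample labels.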