Question

Interpreting edgeR DGE results - which condition is the logFC referring to?

0

2.4 years ago

chloe.vanderburg • 0

Basically what the title says - when interpreting edgeR results, which condition is the logFC referring to?

For example, see the screenshot below.

Does this mean these genes are upregulated in condition A (vs B)?

example pic

logFC DGE edgeR • 1.3k views

ADD COMMENT • link updated 2.4 years ago by ATpoint 81k • written 2.4 years ago by chloe.vanderburg • 0

score 0 · Answer 1 · 2021-11-29

0

2.4 years ago

official.profile ▴ 20

Show your makeContrasts()

ADD COMMENT • link 2.4 years ago by official.profile ▴ 20

0

Sorry, I don't know what this means. I used a Perl script from Trinity to run edgeR:

$TRINITY_HOME/Analysis/DifferentialExpression/run_DE_analysis.pl

ADD REPLY • link 2.4 years ago by chloe.vanderburg • 0

score 0 · Answer 2 · 2021-11-29

0

2.4 years ago

ATpoint 81k

Unless the script does something custom, the reference level (i.e. the denominator) is the condition that comes first alphabetically. That is because the conditions are converted to a factor, and factor levels are sorted alphabetically by default. So here it is probably conditionB/conditionA, meaning a positive logFC indicates higher expression in B.

This is the relevant line in the script:

https://github.com/trinityrnaseq/trinityrnaseq/blob/master/Analysis/DifferentialExpression/run_DE_analysis.pl#L603

It seems to use just the standard settings, so A should be the reference (= denominator), I think. If the tool returns normalized counts, you can simply check for a few genes whether a positive logFC corresponds to higher counts in the B samples than in the A samples.
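As a minimal sketch of the default behaviour described above: base R sorts factor levels alphabetically, and `model.matrix(~group)` treats the first level as the reference. The condition names are the ones used in this thread. `relevel()` is shown only as one way to change the reference; it is not necessarily what the Trinity script does.

```r
# Condition labels, deliberately listed with B first
group <- factor(c("conditionB", "conditionB", "conditionA", "conditionA"))

# Levels are sorted alphabetically, regardless of the order of appearance
levels(group)

# With an intercept, the first level is the reference; the remaining
# coefficient is named after the non-reference level (B vs A)
design <- model.matrix(~group)
colnames(design)

# relevel() makes conditionB the reference, so the coefficient becomes A vs B
group_flipped <- relevel(group, ref = "conditionB")
design_flipped <- model.matrix(~group_flipped)
colnames(design_flipped)
```

Whichever level ends up as the reference is the denominator of the reported logFC for that coefficient.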

ADD COMMENT • link 2.4 years ago by ATpoint 81k

0

Huh ok thanks, weirdly it seems to be the other way around! Positive logFC means higher in condition A.

Here is a screenshot of the DE subset file that is output automatically and also contains columns with the counts:

DE-subset

ADD REPLY • link 2.4 years ago by chloe.vanderburg • 0

3

At least you have your answer :)

Just for context, this is how it "normally" looks; I am not sure what that script of yours is doing:

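library(edgeR)     # also attaches limma (voom, lmFit, eBayes, topTable)
library(magrittr)  # provides the %>% pipe used below

set.seed(1)  # make the simulated counts reproducible

# Simulate 5000 genes x 4 samples: samples 1-2 are conditionA, 3-4 are conditionB
y <- DGEList(counts=matrix(rnbinom(5000*4,mu=5,size=2),5000,4),
             group=rep(c("conditionA", "conditionB"), each=2))
rownames(y) <- paste("gene", 1:nrow(y), sep="_")

# ~group uses the first factor level (conditionA) as the reference
design <- model.matrix(~group, y$samples)
v <- voom(y, design)
fit <- lmFit(v, design)
fit <- eBayes(fit, robust=TRUE)

# coef=2 is the conditionB coefficient, i.e. conditionB vs conditionA
tt <- topTable(fit, coef=2) %>% data.frame(Gene=rownames(.), .)
cp <- cpm(y) %>% data.frame(Gene=rownames(.), .)

# Samples 1/2 are "A" (the reference) and 3/4 are "B"; a positive logFC means higher in B
merge(x=tt, y=cp, by="Gene")[2,c(1,2,8,9,10,11)]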
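The same sanity check can be sketched by hand without any package. This assumes a hypothetical matrix of normalized counts (`norm_counts`, with made-up numbers) and the convention logFC = log2(B/A). The naive ratio of group means below is not edgeR's estimate, but its sign should agree with the reported logFC for clearly changed genes. If the signs are opposite, the tool used the other condition as the denominator.

```r
# Hypothetical normalized counts: rows are genes, columns are samples
norm_counts <- matrix(
  c(10, 12, 40, 44,    # gene_1: higher in B
    80, 76, 20, 18),   # gene_2: higher in A
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_1", "gene_2"), c("A_1", "A_2", "B_1", "B_2"))
)
group <- factor(c("conditionA", "conditionA", "conditionB", "conditionB"))

# Mean per condition for every gene
mean_A <- rowMeans(norm_counts[, group == "conditionA", drop = FALSE])
mean_B <- rowMeans(norm_counts[, group == "conditionB", drop = FALSE])

# Naive log2 fold change with A as the denominator
# (the small prior avoids taking the log of zero)
prior <- 0.5
logFC_B_vs_A <- log2((mean_B + prior) / (mean_A + prior))
logFC_B_vs_A
```

Here `gene_1` gets a positive value and `gene_2` a negative one, matching the direction of the counts.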

ADD REPLY • link 2.4 years ago by ATpoint 81k