Question: (Closed) Multifactorial Design Formula In edgeR
5.6 years ago by
mike.bioc320 wrote:

Dear All,

I am new to edgeR and am still reading the vignette in detail so that I can apply it to my data. I have a question about understanding the output of model.matrix. On page 27 (section 3.3.2, "Nested interaction formulas"), the design is defined as:

``````
> targets
    Sample   Treat Time
1  Sample1 Placebo   0h
2  Sample2 Placebo   0h
3  Sample3 Placebo   1h
4  Sample4 Placebo   1h
5  Sample5 Placebo   2h
6  Sample6 Placebo   2h
7  Sample1    Drug   0h
8  Sample2    Drug   0h
9  Sample3    Drug   1h
10 Sample4    Drug   1h
11 Sample5    Drug   2h
12 Sample6    Drug   2h

> targets$Treat <- relevel(targets$Treat, ref="Placebo")

> design <- model.matrix(~Treat + Treat:Time, data=targets)
``````

and the coefficient names are:

``````
> colnames(design)
[1] "(Intercept)"         "TreatDrug"
[3] "TreatPlacebo:Time1h" "TreatDrug:Time1h"
[5] "TreatPlacebo:Time2h" "TreatDrug:Time2h"
``````

On page 28 (section 3.3.4, "Interaction at any time"), however, the design formula looks like this (I renamed "design" to "design2" here to make it easier to follow):

``````
> design2 <- model.matrix(~Treat + Time + Treat:Time, data=targets)
> colnames(design2)
[1] "(Intercept)"      "TreatDrug"
[3] "Time1h"           "Time2h"
[5] "TreatDrug:Time1h" "TreatDrug:Time2h"
``````
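As a sanity check on how the two parameterizations relate, the sketch below rebuilds the targets table by hand and constructs both design matrices using base R only. It assumes that Treat and Time are stored as factors with Placebo and 0h as the reference levels; the names `rank_nested`, `rank_crossed`, `rank_both` and `time1h_matches` are made up for illustration and do not come from the vignette.

```r
# Rebuild the targets table from the post by hand.
# Assumption: Treat and Time are factors, with Placebo and 0h as references.
targets <- data.frame(
  Sample = rep(paste0("Sample", 1:6), times = 2),
  Treat  = factor(rep(c("Placebo", "Drug"), each = 6),
                  levels = c("Placebo", "Drug")),
  Time   = factor(rep(rep(c("0h", "1h", "2h"), each = 2), times = 2))
)

# The nested formula (section 3.3.2) and the crossed formula (section 3.3.4).
design  <- model.matrix(~Treat + Treat:Time, data = targets)
design2 <- model.matrix(~Treat + Time + Treat:Time, data = targets)

# Both matrices have full column rank, and stacking them side by side
# does not raise the rank, so their columns span the same space.
rank_nested  <- qr(design)$rank
rank_crossed <- qr(design2)$rank
rank_both    <- qr(cbind(design, design2))$rank

# In design2 the "Time1h" column flags every 1h sample, which is the sum
# of the two nested 1h columns of the first design.
time1h_matches <- all(
  design2[, "Time1h"] ==
    design[, "TreatPlacebo:Time1h"] + design[, "TreatDrug:Time1h"]
)

print(colnames(design))
print(colnames(design2))
print(c(nested = rank_nested, crossed = rank_crossed, both = rank_both))
print(time1h_matches)
```

If the three ranks agree, the two formulas describe the same set of fitted group means and differ only in what each individual coefficient stands for.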

For design2, the vignette explains (top of page 29): "The last two coefficients give the DrugvsPlacebo.1h and DrugvsPlacebo.2h contrasts, so that

``````
> lrt <- glmLRT(fit, coef=5:6)
``````

is useful because it detects genes that respond differently to the drug, relative to the placebo, at either of the times."

My question is, if I have understood this correctly: why are there no coefficients "TreatPlacebo:Time1h" and "TreatPlacebo:Time2h" in design2? And shouldn't "Time1h" and "Time2h" be the effects of time regardless of the treatment, rather than: "

``````
> lrt <- glmLRT(fit, coef=3)
``````

and

``````
> lrt <- glmLRT(fit, coef=4)
``````

are the effects of the reference drug, i.e., the effects of the placebo at 1 hour and 2 hours" as it is written in the vignette text?
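One way to see which groups each design2 coefficient enters is to pull out a single design-matrix row per Treat/Time combination and subtract rows. The sketch below does this with base R only. It rebuilds the targets table under the same assumptions as the vignette excerpt (factors with Placebo and 0h as reference levels); the names `cell_rows`, `placebo_1h_vs_0h` and `drug_1h_vs_0h` are illustrative and not part of edgeR.

```r
# Rebuild the targets table (assumption: factors, Placebo and 0h as references).
targets <- data.frame(
  Sample = rep(paste0("Sample", 1:6), times = 2),
  Treat  = factor(rep(c("Placebo", "Drug"), each = 6),
                  levels = c("Placebo", "Drug")),
  Time   = factor(rep(rep(c("0h", "1h", "2h"), each = 2), times = 2))
)
design2 <- model.matrix(~Treat + Time + Treat:Time, data = targets)

# Keep one design row per Treat/Time cell and label it, e.g. "Placebo.1h".
group <- interaction(targets$Treat, targets$Time, sep = ".")
first_in_cell <- !duplicated(group)
cell_rows <- design2[first_in_cell, ]
rownames(cell_rows) <- as.character(group[first_in_cell])
print(cell_rows)

# Subtracting two rows shows which coefficients make up that comparison.
placebo_1h_vs_0h <- cell_rows["Placebo.1h", ] - cell_rows["Placebo.0h", ]
drug_1h_vs_0h    <- cell_rows["Drug.1h", ] - cell_rows["Drug.0h", ]

# Names of the coefficients that contribute to each 1h-versus-0h change.
print(names(which(placebo_1h_vs_0h != 0)))
print(names(which(drug_1h_vs_0h != 0)))
```

Read this way, the 1h-versus-0h change in the placebo group involves only "Time1h", while the same change in the drug group involves "Time1h" plus "TreatDrug:Time1h". This is consistent with the vignette's wording that coefficients 3 and 4 describe the placebo (reference) group.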

Why I need edgeR: I have an RNA-seq experiment (~30 samples) in which I need to explore the influence of 3 factors with 2 levels each:

1. sex: f/m

2. disease_state: healthy/cancer

3. localization: blood/bones.

The question I want to answer: which genes are differentially expressed between the 2 localizations in the 2 disease states (i.e., are bones more severely affected by cancer than blood), taking sex into account? I assume that my design formula should look like: design=~sex+disease+localization+disease:localization

Could anyone please tell me whether the formula is correct? And what should the output be? How can I tell whether the disease has different effects depending on the localization? By the number of genes affected (= differentially expressed)?
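To make the proposed formula concrete, here is a minimal sketch of the design matrix it would produce. The sample sheet is entirely hypothetical: the post contains no sample table, so a balanced layout with 4 replicates per combination (32 samples) is assumed, and the column is called `disease` as in the formula rather than `disease_state` as in the list. The names `samples`, `design3` and `interaction_col` are illustrative.

```r
# Hypothetical sample sheet: balanced 2 x 2 x 2 layout, 4 replicates per cell.
samples <- expand.grid(
  replicate    = 1:4,
  sex          = c("f", "m"),
  disease      = c("healthy", "cancer"),
  localization = c("blood", "bones"),
  stringsAsFactors = FALSE
)

# Set reference levels explicitly: f, healthy and blood come first.
samples$sex          <- factor(samples$sex, levels = c("f", "m"))
samples$disease      <- factor(samples$disease, levels = c("healthy", "cancer"))
samples$localization <- factor(samples$localization, levels = c("blood", "bones"))

# The formula from the post.
design3 <- model.matrix(
  ~sex + disease + localization + disease:localization,
  data = samples
)
print(colnames(design3))

# Index of the disease-by-localization interaction column. With an edgeR fit
# object (not created here), this index could be passed as coef= to glmLRT,
# in the same way as the glmLRT calls quoted above.
interaction_col <- which(colnames(design3) == "diseasecancer:localizationbones")
print(interaction_col)
```

Under these assumptions the interaction column is the one that captures whether the cancer-versus-healthy difference in bones departs from that in blood, so a per-gene test of that single coefficient is one possible way to address the question, rather than comparing counts of differentially expressed genes.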

I would appreciate it very much if someone had time to help me with any of these questions.

Best, Mike

modified 3.0 years ago by Biostar • written 5.6 years ago by mike.bioc320

