Dear All,
I am new to edgeR and am still reading the vignette in detail so that I can apply it to my data. I have a question about understanding model.matrix. On page 27 (section 3.3.2, "Nested interaction formulas"), the design is defined as:
> targets
Sample Treat Time
1 Sample1 Placebo 0h
2 Sample2 Placebo 0h
3 Sample3 Placebo 1h
4 Sample4 Placebo 1h
5 Sample5 Placebo 2h
6 Sample6 Placebo 2h
7 Sample1 Drug 0h
8 Sample2 Drug 0h
9 Sample3 Drug 1h
10 Sample4 Drug 1h
11 Sample5 Drug 2h
12 Sample6 Drug 2h
targets$Treat <- relevel(targets$Treat, ref="Placebo")
design <- model.matrix(~Treat + Treat:Time, data=targets)
and the coefficient names are:
> colnames(design)
[1] "(Intercept)" "TreatDrug"
[3] "TreatPlacebo:Time1h" "TreatDrug:Time1h"
[5] "TreatPlacebo:Time2h" "TreatDrug:Time2h"
On page 28 (section 3.3.4, "Interaction at any time"), however, the design formula looks like this (I renamed "design" to "design2" to make it easier to follow):
> design2 <- model.matrix(~Treat + Time + Treat:Time, data=targets)
> colnames(design2)
[1] "(Intercept)" "TreatDrug"
[3] "Time1h" "Time2h"
[5] "TreatDrug:Time1h" "TreatDrug:Time2h"
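For anyone who wants to reproduce the two designs side by side, here is a minimal base-R sketch. It assumes the targets table can be rebuilt as below from the printout above (edgeR itself is not needed for this step), and the rank check at the end is my own addition, not something taken from the vignette.

```r
# Rebuild the targets table shown above (base R only).
targets <- data.frame(
  Sample = paste0("Sample", rep(1:6, times = 2)),
  Treat  = factor(rep(c("Placebo", "Drug"), each = 6)),
  Time   = factor(rep(rep(c("0h", "1h", "2h"), each = 2), times = 2))
)
targets$Treat <- relevel(targets$Treat, ref = "Placebo")

# The two formulas from sections 3.3.2 and 3.3.4.
design  <- model.matrix(~ Treat + Treat:Time, data = targets)
design2 <- model.matrix(~ Treat + Time + Treat:Time, data = targets)

print(colnames(design))
print(colnames(design2))

# Each matrix has 6 columns for 6 Treat/Time groups. If stacking them side
# by side does not raise the rank, both span the same space, i.e. they are
# two parameterizations of the same model.
rank_design   <- qr(design)$rank
rank_design2  <- qr(design2)$rank
rank_combined <- qr(cbind(design, design2))$rank
print(c(rank_design, rank_design2, rank_combined))
```

If the three ranks agree, the two formulas fit the same six group means and differ only in what each individual coefficient represents.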
For design2, the vignette explains (top of page 29): "The last two coefficients give the DrugvsPlacebo.1h and DrugvsPlacebo.2h contrasts, so that
> lrt <- glmLRT(fit, coef=5:6)
is useful because it detects genes that respond differently to the drug, relative to the placebo, at either of the times."
My question is: if I have understood this correctly, why does design2 have no coefficients "TreatPlacebo:Time1h" and "TreatPlacebo:Time2h"? And shouldn't "Time1h" and "Time2h" be the effects of time regardless of Treat(ment), rather than what the vignette text says: "
> lrt <- glmLRT(fit, coef=3)
and
> lrt <- glmLRT(fit, coef=4)
are the effects of the reference drug, i.e., the effects of the placebo at 1 hour and 2 hours"?
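One way to examine this is to look at which coefficients are switched on for each Treat/Time group in design2. The sketch below is base R only and assumes the same targets layout as in the printout above. The helper names `group` and `pattern` are my own and do not come from the vignette.

```r
# Same layout as the targets table above, with Placebo as the reference level.
targets <- data.frame(
  Treat = factor(rep(c("Placebo", "Drug"), each = 6),
                 levels = c("Placebo", "Drug")),
  Time  = factor(rep(rep(c("0h", "1h", "2h"), each = 2), times = 2))
)
design2 <- model.matrix(~ Treat + Time + Treat:Time, data = targets)

# One row per group: the coefficients that are summed to give that group's
# fitted value on the linear-predictor scale.
group   <- paste(targets$Treat, targets$Time, sep = ".")
pattern <- unique(design2)
rownames(pattern) <- unique(group)
print(pattern)

# Change from 0h to 1h within each treatment, expressed in coefficients.
placebo_1h_vs_0h <- pattern["Placebo.1h", ] - pattern["Placebo.0h", ]
drug_1h_vs_0h    <- pattern["Drug.1h", ]    - pattern["Drug.0h", ]
print(placebo_1h_vs_0h)
print(drug_1h_vs_0h)
```

In this parameterization the 0h-to-1h change for Placebo involves only "Time1h", while the same change for Drug involves "Time1h" plus "TreatDrug:Time1h". That seems consistent with the vignette's wording that coefficients 3 and 4 describe the placebo, and with the interaction columns being Drug-versus-Placebo differences in the time effect.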
Why I need edgeR: I have an RNA-seq experiment (~30 samples) in which I need to explore the influence of 3 factors with 2 levels each:
sex: f/m
disease_state: healthy/cancer
localization: blood/bones.
The question I want to answer: which genes are differentially expressed between the 2 localizations in the 2 disease states (i.e., are bones more severely affected by cancer than blood), taking sex into account? I assume that my design formula should look like this: design = ~sex + disease + localization + disease:localization
Could anyone please tell me whether the formula is correct? And what should the output be? How can I tell whether the disease has different effects depending on the localization? By the number of genes affected (= differentially expressed)?
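To make the proposed formula concrete, here is a minimal base-R sketch of the design matrix it would produce. The sample table is hypothetical: a balanced layout of 32 samples with 4 replicates per combination, whereas my real data are neither exactly this size nor necessarily balanced. The choice of "f", "healthy" and "blood" as reference levels is also an assumption.

```r
# Hypothetical balanced sample sheet: 2 x 2 x 2 groups, 4 replicates each.
samples <- expand.grid(
  rep          = 1:4,
  sex          = c("f", "m"),
  disease      = c("healthy", "cancer"),
  localization = c("blood", "bones"),
  stringsAsFactors = FALSE
)

# Set the reference levels explicitly so the coefficient names are predictable.
samples$sex          <- factor(samples$sex, levels = c("f", "m"))
samples$disease      <- factor(samples$disease, levels = c("healthy", "cancer"))
samples$localization <- factor(samples$localization, levels = c("blood", "bones"))

design3 <- model.matrix(~ sex + disease + localization + disease:localization,
                        data = samples)
print(colnames(design3))

# With these reference levels the last column is the interaction term. By
# analogy with the vignette's glmLRT(fit, coef=5:6) call, a per-gene test of
# that single coefficient would presumably be glmLRT(fit, coef = 5); this is
# my assumption and is not run here, since no count data are included.
interaction_coef <- ncol(design3)
print(interaction_coef)
```

If this reading is right, the interaction coefficient is tested gene by gene, so the result would be a list of genes whose cancer effect differs between bones and blood rather than a single overall number.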
I would very much appreciate it if someone has time to help me with any of these questions.
Best, Mike