I am new to edgeR and am still reading the vignette in detail so that I can apply it to my data. I have a question about understanding model.matrix. On page 27 (paragraph 3.3.2 "Nested interaction formulas"), the design is defined as:
> targets
    Sample   Treat Time
1  Sample1 Placebo   0h
2  Sample2 Placebo   0h
3  Sample3 Placebo   1h
4  Sample4 Placebo   1h
5  Sample5 Placebo   2h
6  Sample6 Placebo   2h
7  Sample1    Drug   0h
8  Sample2    Drug   0h
9  Sample3    Drug   1h
10 Sample4    Drug   1h
11 Sample5    Drug   2h
12 Sample6    Drug   2h
> targets$Treat <- relevel(targets$Treat, ref="Placebo")
> design <- model.matrix(~Treat + Treat:Time, data=targets)
and the coefficient names are:
> colnames(design)
[1] "(Intercept)"         "TreatDrug"
[3] "TreatPlacebo:Time1h" "TreatDrug:Time1h"
[5] "TreatPlacebo:Time2h" "TreatDrug:Time2h"
Whereas on page 28 (paragraph 3.3.4 "Interaction at any time") the design formula looks like this (I added "2" in "design2" compared to original text for easier following):
> design2 <- model.matrix(~Treat + Time + Treat:Time, data=targets)
> colnames(design2)
[1] "(Intercept)"      "TreatDrug"
[3] "Time1h"           "Time2h"
[5] "TreatDrug:Time1h" "TreatDrug:Time2h"
For design2, the vignette explains (top of page 29): "The last two coefficients give the DrugvsPlacebo.1h and DrugvsPlacebo.2h contrasts, so that
> lrt <- glmLRT(fit, coef=5:6)
is useful because it detects genes that respond differently to the drug, relative to the placebo, at either of the times."
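For anyone who wants to compare the two designs side by side, here is a minimal sketch in base R (no edgeR needed). The `targets` data frame is typed in by hand from the table above, and the rank check at the end is my own addition rather than something from the vignette. It suggests that the two formulas are different parametrizations of the same six group means, which may be why design2 has no separate "TreatPlacebo:Time" columns.

```r
# Rebuild the targets table from the vignette by hand (base R only).
targets <- data.frame(
  Sample = rep(paste0("Sample", 1:6), times = 2),
  Treat  = factor(rep(c("Placebo", "Drug"), each = 6)),
  Time   = factor(rep(rep(c("0h", "1h", "2h"), each = 2), times = 2))
)
targets$Treat <- relevel(targets$Treat, ref = "Placebo")

design  <- model.matrix(~Treat + Treat:Time, data = targets)
design2 <- model.matrix(~Treat + Time + Treat:Time, data = targets)

print(colnames(design))
print(colnames(design2))

# The Placebo-specific time column of the nested design is a combination of
# two design2 columns: (all samples at 1h) minus (Drug samples at 1h).
placebo_1h_recovered <- all(
  design[, "TreatPlacebo:Time1h"] ==
    design2[, "Time1h"] - design2[, "TreatDrug:Time1h"]
)

# Both matrices have full rank, and putting them side by side adds no new
# directions, so they span the same space of fitted values.
rank_design  <- qr(design)$rank
rank_design2 <- qr(design2)$rank
rank_both    <- qr(cbind(design, design2))$rank
print(c(rank_design, rank_design2, rank_both))
```

If the three ranks agree, the choice between the two formulas only changes what each coefficient means, not the fitted model.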
My question, assuming I have understood this correctly: why are there no coefficients "TreatPlacebo:Time1h" and "TreatPlacebo:Time2h" in design2? And shouldn't "Time1h" and "Time2h" be the effects of time regardless of treatment, and not: "
> lrt <- glmLRT(fit, coef=3)
> lrt <- glmLRT(fit, coef=4)
are the effects of the reference drug, i.e., the effects of the placebo at 1 hour and 2 hours", as written in the vignette?
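One way to check numerically what "Time1h" and "Time2h" stand for in design2 is sketched below. It uses one row per Treat/Time group and made-up group means for a single hypothetical gene (the numbers in `mu` are invented purely for illustration, standing in for the log-scale linear predictor). Solving for the coefficients then shows which group differences each one corresponds to under R's default treatment contrasts.

```r
# One row per group; Placebo is the reference level of Treat, 0h of Time.
groups <- data.frame(
  Treat = factor(rep(c("Placebo", "Drug"), each = 3),
                 levels = c("Placebo", "Drug")),
  Time  = factor(rep(c("0h", "1h", "2h"), times = 2))
)
design2 <- model.matrix(~Treat + Time + Treat:Time, data = groups)

# Made-up group means for one hypothetical gene (illustration only).
mu <- c(Placebo.0h = 5, Placebo.1h = 6, Placebo.2h = 8,
        Drug.0h = 5.5, Drug.1h = 9, Drug.2h = 7)

# The design is square (6 groups, 6 coefficients), so beta is determined exactly.
beta <- solve(design2, mu)
names(beta) <- colnames(design2)
print(round(beta, 10))

# Time1h and Time2h match the time changes within the Placebo group only.
time1h_is_placebo_change <- isTRUE(all.equal(
  unname(beta["Time1h"]), unname(mu["Placebo.1h"] - mu["Placebo.0h"])))
time2h_is_placebo_change <- isTRUE(all.equal(
  unname(beta["Time2h"]), unname(mu["Placebo.2h"] - mu["Placebo.0h"])))

# The interaction term is the extra 1h change in the Drug group.
drug_change_1h    <- mu["Drug.1h"] - mu["Drug.0h"]
placebo_change_1h <- mu["Placebo.1h"] - mu["Placebo.0h"]
interaction_is_difference <- isTRUE(all.equal(
  unname(beta["TreatDrug:Time1h"]), unname(drug_change_1h - placebo_change_1h)))
print(c(time1h_is_placebo_change, time2h_is_placebo_change, interaction_is_difference))
```

Because the formula contains the Treat:Time interaction, the time effect is allowed to differ between treatments, so the plain "Time1h" and "Time2h" terms describe the reference level (Placebo) rather than an average over both treatments.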
Why I need edgeR: I have an RNA-seq experiment (~30 samples), where I need to explore the influence of 3 factors with 2 levels each: sex, disease and localization.
The question I want to answer: which genes are differentially expressed between the 2 localizations in the 2 disease states (i.e. are bones more severely affected by cancer than blood), taking sex into account? I assume that my design formula should look like: design=~sex+disease+localization+disease:localization
Could anyone please tell me whether the formula is correct? What should the output look like? And how can I tell whether the disease has different effects depending on the localization: by the number of genes affected (i.e. differentially expressed)?
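To make the proposed formula concrete, here is a rough sketch of the design matrix it would produce, again in base R only. The sample sheet is entirely hypothetical: the level names ("F"/"M", "healthy"/"cancer", "blood"/"bone"), the balanced layout and the 4 replicates per group are assumptions for illustration, not my real data.

```r
# Hypothetical, balanced sample sheet: 2 x 2 x 2 groups with 4 replicates each.
samples <- expand.grid(
  sex          = c("F", "M"),
  disease      = c("healthy", "cancer"),
  localization = c("blood", "bone"),
  replicate    = 1:4,
  stringsAsFactors = TRUE  # first value listed becomes the reference level
)

design3 <- model.matrix(~sex + disease + localization + disease:localization,
                        data = samples)
print(colnames(design3))

# Position of the disease-by-localization interaction column.
interaction_col <- grep(":", colnames(design3), fixed = TRUE)
print(interaction_col)
```

Under these assumptions the interaction column would measure, gene by gene, how much the cancer-vs-healthy difference in bone departs from the same difference in blood. Following the pattern of the vignette's `glmLRT(fit, coef=5:6)` call, the index in `interaction_col` could presumably be passed as `coef` to test that term for each gene.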
I would very much appreciate it if someone could find the time to help me with any of these questions.