Question: (Closed) Multifactorial Design Formula in edgeR
Posted 7.0 years ago by mike.bioc32:

Dear All,

I am new to edgeR and am still reading the vignette in detail so that I can use it for my data. I have a question about understanding model.matrix. On page 27 (section 3.3.2, "Nested interaction formulas"), the design is defined as:

> targets
Sample Treat Time
1 Sample1 Placebo 0h
2 Sample2 Placebo 0h
3 Sample3 Placebo 1h
4 Sample4 Placebo 1h
5 Sample5 Placebo 2h
6 Sample6 Placebo 2h
7 Sample1 Drug 0h
8 Sample2 Drug 0h
9 Sample3 Drug 1h
10 Sample4 Drug 1h
11 Sample5 Drug 2h
12 Sample6 Drug 2h

targets$Treat <- relevel(targets$Treat, ref="Placebo")

design <- model.matrix(~Treat + Treat:Time, data=targets)

and the coefficient names are:

> colnames(design)
[1] "(Intercept)" "TreatDrug"
[3] "TreatPlacebo:Time1h" "TreatDrug:Time1h"
[5] "TreatPlacebo:Time2h" "TreatDrug:Time2h"
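A minimal base-R sketch that rebuilds the `targets` table shown above and the nested design, so the coefficient names can be inspected without edgeR. It assumes R >= 4.0, where `data.frame()` no longer converts strings to factors, so `Treat` and `Time` are created as factors explicitly before `relevel()` is applied.

```r
# Rebuild the vignette's targets table: 2 treatments x 3 times x 2 replicates = 12 samples
targets <- data.frame(
  Sample = paste0("Sample", rep(1:6, times = 2)),
  Treat  = factor(rep(c("Placebo", "Drug"), each = 6)),
  Time   = factor(rep(c("0h", "0h", "1h", "1h", "2h", "2h"), times = 2))
)
# Make Placebo the reference level, as in the vignette
targets$Treat <- relevel(targets$Treat, ref = "Placebo")

# Nested formula: a separate time effect within each treatment, no shared Time main effect
design <- model.matrix(~Treat + Treat:Time, data = targets)
print(colnames(design))

# Six columns for six treatment/time groups, so the design has full column rank
print(qr(design)$rank)
```

Each `Treat...:Time...` column is 1 only for samples of that treatment at that time, so `TreatPlacebo:Time1h` measures the 1h vs 0h change within placebo and `TreatDrug:Time1h` the same change within drug.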

On page 28 (section 3.3.4, "Interaction at any time"), however, the design formula looks like this (unlike the original text, I named the object "design2" to make it easier to follow):

> design2 <- model.matrix(~Treat + Time + Treat:Time, data=targets)
> colnames(design2)
[1] "(Intercept)" "TreatDrug"
[3] "Time1h" "Time2h"
[5] "TreatDrug:Time1h" "TreatDrug:Time2h"
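A quick base-R check one could run to compare the two parameterizations (a sketch that recreates the vignette's targets table so the block runs on its own). It shows that design2's `Time1h` column is the sum of the nested design's two `Treat...:Time1h` columns, that both matrices span the same space, and that appending a `TreatPlacebo:Time1h` column to design2 would not raise its rank, which suggests why `model.matrix()` does not generate that column when the `Time` main effect is present.

```r
targets <- data.frame(
  Treat = relevel(factor(rep(c("Placebo", "Drug"), each = 6)), ref = "Placebo"),
  Time  = factor(rep(c("0h", "0h", "1h", "1h", "2h", "2h"), times = 2))
)
design  <- model.matrix(~Treat + Treat:Time, data = targets)         # nested
design2 <- model.matrix(~Treat + Time + Treat:Time, data = targets)  # full factorial

# Time1h in design2 is 1 for every 1h sample, whatever the treatment
time1h_sum <- design[, "TreatPlacebo:Time1h"] + design[, "TreatDrug:Time1h"]
print(all(design2[, "Time1h"] == time1h_sum))

# Same column space: putting both matrices side by side does not raise the rank above 6
print(c(qr(design)$rank, qr(design2)$rank, qr(cbind(design, design2))$rank))

# An extra TreatPlacebo:Time1h column in design2 would be redundant (rank stays 6 with 7 columns)
print(qr(cbind(design2, design[, "TreatPlacebo:Time1h"]))$rank)
```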

For design2, the vignette explains (top of page 29): "The last two coefficients give the DrugvsPlacebo.1h and DrugvsPlacebo.2h contrasts, so that

> lrt <- glmLRT(fit, coef=5:6)

is useful because it detects genes that respond differently to the drug, relative to the placebo, at either of the times."
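To see what the last two coefficients of design2 measure, here is a toy sketch with made-up, noise-free group means on the log scale (the numbers are hypothetical and chosen only for easy arithmetic). It fits them with base R's `lm.fit()` purely to illustrate the parameterization; edgeR itself fits negative binomial GLMs to counts, so this is not how edgeR estimates coefficients.

```r
targets <- data.frame(
  Treat = relevel(factor(rep(c("Placebo", "Drug"), each = 6)), ref = "Placebo"),
  Time  = factor(rep(c("0h", "0h", "1h", "1h", "2h", "2h"), times = 2))
)
design2 <- model.matrix(~Treat + Time + Treat:Time, data = targets)

# Hypothetical log-scale group means (no noise), one per treatment/time group
means <- c(Placebo.0h = 10, Placebo.1h = 12, Placebo.2h = 15,
           Drug.0h    = 11, Drug.1h    = 16, Drug.2h    = 14)
y <- unname(means[paste(targets$Treat, targets$Time, sep = ".")])

coefs <- lm.fit(design2, y)$coefficients
print(round(coefs, 6))

# Coefficient 5 is a difference-in-differences:
# (Drug.1h - Drug.0h) - (Placebo.1h - Placebo.0h)
did_1h <- unname((means["Drug.1h"] - means["Drug.0h"]) -
                 (means["Placebo.1h"] - means["Placebo.0h"]))
print(c(coef5 = unname(coefs[5]), did_1h = did_1h))
```

With these numbers the fitted coefficients are 10, 1, 2, 5, 3 and -2, so `TreatDrug:Time1h` (3) is the drug's 1h response minus the placebo's 1h response, which is what testing `coef=5:6` targets.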

My question is: if I understood correctly, why are there no "TreatPlacebo:Time1h" and "TreatPlacebo:Time2h" coefficients in design2? And shouldn't "Time1h" and "Time2h" be effects of time regardless of treatment, rather than, as the vignette text says, "

> lrt <- glmLRT(fit, coef=3)

> lrt <- glmLRT(fit, coef=4)

are the effects of the reference drug, i.e., the effects of the placebo at 1 hour and 2 hours"?
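Using the same kind of hypothetical, noise-free log-scale means, one can compare both parameterizations directly. This sketch (again base R `lm.fit()` rather than edgeR, with invented numbers, and R's default treatment contrasts) shows that `Time1h` and `Time2h` in design2 come out equal to `TreatPlacebo:Time1h` and `TreatPlacebo:Time2h` in the nested design, i.e. they describe the time course of the reference level (placebo), not an average over both treatments.

```r
targets <- data.frame(
  Treat = relevel(factor(rep(c("Placebo", "Drug"), each = 6)), ref = "Placebo"),
  Time  = factor(rep(c("0h", "0h", "1h", "1h", "2h", "2h"), times = 2))
)
design  <- model.matrix(~Treat + Treat:Time, data = targets)         # nested
design2 <- model.matrix(~Treat + Time + Treat:Time, data = targets)  # full factorial

# Hypothetical noise-free log-scale means per treatment/time group
means <- c(Placebo.0h = 10, Placebo.1h = 12, Placebo.2h = 15,
           Drug.0h    = 11, Drug.1h    = 16, Drug.2h    = 14)
y <- unname(means[paste(targets$Treat, targets$Time, sep = ".")])

nested_coef <- lm.fit(design,  y)$coefficients
full_coef   <- lm.fit(design2, y)$coefficients

# Time effects in design2 next to the placebo-specific time effects of the nested design
print(round(rbind(
  design2 = full_coef[c("Time1h", "Time2h")],
  nested  = nested_coef[c("TreatPlacebo:Time1h", "TreatPlacebo:Time2h")]
), 6))
```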

Why I need edgeR: I have an RNA-seq experiment (~30 samples) in which I need to explore the influence of three factors with two levels each:

  1. sex: f/m

  2. disease_state: healthy/cancer

  3. localization: blood/bones

The question I want to answer: which genes are differentially expressed between the two localizations in the two disease states (i.e., are bones more severely affected by cancer than blood), while taking sex into account? I assume that my design formula should look like: design = ~sex + disease + localization + disease:localization
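A sketch of how the proposed formula could be turned into a design matrix with `model.matrix()`, using a hypothetical balanced sample sheet (24 samples, 3 per sex/disease/localization combination; the real study has ~30 samples and its own layout). The factor is called `disease` here to match the formula; `f`, `healthy` and `blood` are assumed reference levels, and each sample is assumed to come from a separate individual.

```r
# Hypothetical balanced sample sheet: 2 sexes x 2 disease states x 2 localizations x 3 replicates
samples <- expand.grid(
  rep          = 1:3,
  localization = c("blood", "bones"),
  disease      = c("healthy", "cancer"),
  sex          = c("f", "m")
)
# Set reference levels explicitly (first level = reference under treatment contrasts)
samples$sex          <- factor(samples$sex, levels = c("f", "m"))
samples$disease      <- factor(samples$disease, levels = c("healthy", "cancer"))
samples$localization <- factor(samples$localization, levels = c("blood", "bones"))

design <- model.matrix(~sex + disease + localization + disease:localization, data = samples)
print(colnames(design))
# The last column is the disease-by-localization interaction
print(qr(design)$rank)
```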

Could anyone tell me whether this formula is correct? And what output should I look at? How can I tell whether the disease has different effects depending on the localization? By the number of genes affected (i.e., differentially expressed)?
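For the last question, the coefficient that directly encodes "the disease effect differs by localization" is the interaction term. The toy sketch below uses hypothetical, noise-free log-scale means for a single gene, an invented additive sex shift, and a made-up balanced layout, fitted with base R `lm.fit()` rather than edgeR. It shows that `diseasecancer:localizationbones` equals the cancer-vs-healthy difference in bones minus the same difference in blood. The commented lines sketch the usual edgeR steps for testing that coefficient gene by gene, assuming a `DGEList` named `dge` that is not constructed here.

```r
samples <- expand.grid(
  rep          = 1:3,
  localization = factor(c("blood", "bones"), levels = c("blood", "bones")),
  disease      = factor(c("healthy", "cancer"), levels = c("healthy", "cancer")),
  sex          = factor(c("f", "m"), levels = c("f", "m"))
)
design <- model.matrix(~sex + disease + localization + disease:localization, data = samples)

# Hypothetical noise-free log-scale means for one gene in female samples
cell_means <- c(healthy.blood = 5, cancer.blood = 6, healthy.bones = 5.5, cancer.bones = 8)
y <- unname(cell_means[paste(samples$disease, samples$localization, sep = ".")]) +
  0.3 * (samples$sex == "m")  # hypothetical additive shift for male samples

coefs <- lm.fit(design, y)$coefficients
print(round(coefs, 6))

# Interaction = (cancer - healthy in bones) - (cancer - healthy in blood)
print(unname((cell_means["cancer.bones"] - cell_means["healthy.bones"]) -
             (cell_means["cancer.blood"] - cell_means["healthy.blood"])))

# With real counts in an edgeR DGEList `dge` (hypothetical, not created here):
#   dge <- estimateDisp(dge, design)
#   fit <- glmFit(dge, design)
#   lrt <- glmLRT(fit, coef = 5)  # coefficient 5 = diseasecancer:localizationbones
#   topTags(lrt)                  # genes whose cancer effect differs between bones and blood
```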

I would very much appreciate it if someone has time to help me with any of these questions.

Best, Mike


Questions about specific R packages are usually best answered on the mailing list (listserv), and in my humble opinion this is especially true for edgeR. See the link below. You can post from your email or via the link.

(comment by Jason, 7.0 years ago)

Hello mike.bioc32!

We believe that this post does not fit the main topic of this site.

This should be asked on the Bioconductor support site.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.


(reply by Michael Dondrup, 3.6 years ago)