Question

Design matrix for microarray experiment with covariates, cofactors, multiple factors, and one treatment?

0

Entering edit mode

21 months ago

bkleiboeker ▴ 370

Hi all, I am working to analyze some microarray data (GSE116560) with hopes of comparing gene expression between patients on a ventilator versus those not on a ventilator for three different timepoints (ARDS diagnosis day 1, 4, or 8). My experimental matrix can be represented by design:

set.seed(123)
design<-data.frame("id"=1:68,"ARDS_day"=c(rep(1,30),rep(4,22),rep(8,16)),"age"=sample(18:84,replace= 
T,size = 68),"gender"=rep(c("male","female"),68/2),"vfd_days"=rep(c(0,10,0,25),17))
design$vent_status<-ifelse(design$vfd_days==0,"on_vent","off_vent")
design$condition<-paste0(design$vent_status,"_",design$ARDS_day)

design<-model.matrix(~age+gender+condition,design)
design

then I will fit the linear model and run emperical bayes as follows:

fit <- lmFit(y,design)
cont.matrix <- makeContrasts(day4_on_vs_off_vent = conditionon_vent_4 - conditionoff_vent_4,
                         levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
topTable(fit2, adjust="BH", n = Inf)

I am quite confused however. If anyone could help me with these questions I'd be greatly appreciative!

Should I use an intercept ("model.matrix(~1+age+gender+...")? I understand that this is usually best for numeric covariates (e.g. age) and shouldn't matter for gender, but feel unsure what I should do if I have both types.
How do I make the appropriate contrast for on vent vs. off vent for ARDS day1? Design has a column "conditionon_vent_1" but no column "conditionoff_vent_1." Does this mean "conditionoff_vent_1" is represented by (Intercept)?

I have been looking at the limma userguide extensively and still feel confused. Thanks for any guidance!

Brian

mircoarray R matrix design limma • 880 views

ADD COMMENT • link updated 21 months ago by LChart 3.9k • written 21 months ago by bkleiboeker ▴ 370

score 1 · Answer 1 · 2022-07-20

Should I use an intercept ("model.matrix(~1+age+gender+...")? I understand that this is usually best for numeric covariates (e.g. age) and shouldn't matter for gender, but feel unsure what I should do if I have both types.

In this case it doesn't matter -- you have a categorial covariate (gender) in the model. If you don't include an intercept, R will encode this as a vector (0, 0, 0, 1, 1, 1) for male and (1, 1, 1, 0, 0, 0) for female. If you include an intercept, R will encode only (1, 1, 1, 0, 0, 0) for female (assuming the factor has alphabetical levels). In other words, R will either give you a model with BOTH indicators (no intercept) or only ONE indicator (if you include an intercept).

Your condition is also categorial, which also triggers this logic.

How do I make the appropriate contrast for on vent vs. off vent for ARDS day1?

Because your variable of interest was specified last, and there are multiple categorial variables, it had its first level dropped to avoid multicollinearity. To specify contrasts for a variable of interest always specify the variable of interest first. Compare:

> dat <- data.frame(
    +   gender=c('m', 'f', 'm', 'f', 'm', 'f', 'm', 'f', 'm', 'f'),
    +   condition=c('a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a')
    + )
> 
> model.matrix(~ condition + gender + 0, dat)
   conditiona conditionb conditionc genderm
1           1          0          0       1
2           0          1          0       0
3           0          0          1       1
4           1          0          0       0
5           0          1          0       1
6           0          0          1       0
7           1          0          0       1
8           0          1          0       0
9           0          0          1       1
10          1          0          0       0
> 
> model.matrix(~ gender + condition + 0, dat)
 genderf genderm conditionb conditionc
1        0       1          0          0
2        1       0          1          0
3        0       1          0          1
4        1       0          0          0
5        0       1          1          0
6        1       0          0          1
7        0       1          0          0
8        1       0          1          0
9        0       1          0          1
10       1       0          0          0

Does this mean "conditionoff_vent_1" is represented by (Intercept)?

In a way yes, and in a way no. You can certainly get the effect by contrasting against the intercept, but the algebra is slightly different as in this case the model matrix looks like

> model.matrix(~ condition + gender + 1, dat)
   (Intercept) conditionb conditionc genderm
1            1          0          0       1
2            1          1          0       0
3            1          0          1       1
4            1          0          0       0
5            1          1          0       1
6            1          0          1       0
7            1          0          0       1
8            1          1          0       0
9            1          0          1       1
10           1          0          0       0

If you want to contrast conditiona and conditionb you need to specify an equation that, when applied to the above indicators, would give you 1 -1 0 1 -1 0 1 -1 0 1 (the encoding of conditiona- the encoding of conditionb).