Question

edger model matrix design of linear model

2

Entering edit mode

5.9 years ago

druggable ▴ 60

Hi,

I am trying to do differential expression based on RNA-seq data. I only have one variable, which is the timepoint. I want to analyze gene expression relative to the 0 h time point. I already set up the contrasts. Now, I am trying to set up the design matrix. What is the difference between

design <- model.matrix(~timepoint)

and

design <- model.matrix(~0 + timepoint).

There is a big difference in the results.

edger design matrix differential expression • 6.4k views

ADD COMMENT • link 5.9 years ago by druggable ▴ 60

0

Entering edit mode

I don't see a difference. I could be wrong.

Why not try both and compare the results?

ADD REPLY • link 5.9 years ago by goodez ▴ 640

0

Entering edit mode

Hi, there is a big difference in the results. The results using the design <- model.matrix(~timepoint) makes much more biological sense.

ADD REPLY • link 5.9 years ago by druggable ▴ 60

0

Entering edit mode

Yes I don't think model.matrix(~0 + timepoint) makes much sense. How can you use zero as a term in your model to explain differences?

ADD REPLY • link 5.9 years ago by goodez ▴ 640

score 10 · Accepted Answer · 2018-06-14

The clue here is to look at the coefficients, or columns derived in the design matrix. Your non intercept model design, will create a coefficient that describes the difference between the two levels of your timepoint variable. Your non-intercept model will create a coefficient for each level. Here's an example using your non-intercept model:

library(tidyverse)
library(limma)
# Make a Phenotype Table
foo       <- data.frame(SampleType = paste0(rep(c("A","B"), each = 3)), 
                        Reps       = rep(1:3, 2)) %>%
             mutate(Names = paste0(SampleType,Reps))
# Make some fake gene expression data
genes.foo <- matrix(rnorm(6*100),ncol = 6) %>% 
             `rownames<-`(paste0("Gene_",1:100)) %>% 
             `colnames<-`(foo$Names)

#non-intercept model
model.foo <- model.matrix(~0 + SampleType, data = foo) %>% 
             `colnames<-`(gsub("SampleType","",colnames(.)))

model.foo
            A           B
1           1           0
2           1           0
3           1           0
4           0           1
5           0           1
6           0           1


conts.foo      <- c("A_Vs_B" = "A-B")
contmap.foo    <- makeContrasts(contrasts = conts.foo, levels = colnames(model.foo)) %>% 
                  `colnames<-`(names(conts.foo))
fit.foo     <- lmFit(genes.foo, model.foo) %>% 
               contrasts.fit(contmap.foo) %>% 
               eBayes


> fit.foo$coefficients %>% head
        Contrasts
             A_Vs_B
  Gene_1 -0.6722254
  Gene_2 -1.1571439
  Gene_3 -0.1021150
  Gene_4  0.8687436
  Gene_5 -0.9600460
  Gene_6  1.3139228

topTable(fit.foo, coef = "A_Vs_B", number = Inf, p.value = 0.05, lfc = log2(1.5))

Alternatively, here's your intercept model

model.foo <- model.matrix(~SampleType, data = foo)

fit.foo     <- lmFit(genes.foo, model.foo) %>% 
               eBayes

> model.foo
  (Intercept) SampleTypeB
1           1           0
2           1           0
3           1           0
4           1           1
5           1           1
6           1           1

# Here SampleTypeB is actually the difference relative to the first level (SampleTypeA)
> fit.foo$coefficients %>% head
       (Intercept) SampleTypeB
Gene_1  -0.2920748   0.6722254
Gene_2  -1.0356549   1.1571439
Gene_3   0.2477702   0.1021150
Gene_4   0.5016585  -0.8687436
Gene_5   0.3153554   0.9600460
Gene_6   0.2836491  -1.3139228

topTable(fit.foo, coef = "SampleTypeB", number = Inf, p.value = 0.05, lfc = log2(1.5))

edit: Changed the data frame initialisation.