Clarity for proper design model matrix with paired sample analysis
Entering edit mode
3.6 years ago
reskejak ▴ 40

I have done a significant amount of reading over the past few days and am still uncertain about proper model matrix design for paired sample analysis. I am starting with a limma/voom object v that I would like to assess for transcriptional differences via RNA-seq between disease subtypes and their respective matched normal tissues. Thus I would like to model the effects of both Tissue and Subject.

Regarding the model matrix, I would assume to use design <- model.matrix(~0+Subject+Tissue). My understanding is that this would consider an intercept for Tissue, where Normal is always the reference, but there would be no reference Subject (due to no intercept for subject).

Note this approach elicits slightly different results than design <- model.matrix(~0+Tissue+Subject). The other alternative I have considered is design <- model.matrix(~Tissue+Subject) which considers an intercept for both Tissue and Subject. However, I don't want a reference Subject, so I think this design is not appropriate.

Here's my workflow in R:

# consider limma/voom object v

# make paired design matrix <- data.frame(c("S1.Disease1", "S1.Normal", 
                           "S2.Disease1", "S2.Normal", 
                           "S3.Disease1", "S3.Disease2", "S3.Normal",
                           "S4.Disease1", "S4.Disease2", "S4.Normal",
                           "S5.Disease1", "S5.Disease2", "S5.Normal",
                           "S6.Disease1", "S6.Normal",
                           "S7.Disease1", "S7.Normal",
                           "S8.Disease1", "S8.Normal",
                           "S9.Disease1", "S9.Normal"))
colnames( <- "Sample"$subject <- c(rep("S1", 2), 
                        rep("S2", 2), 
                        rep("S3", 3), 
                        rep("S4", 3), 
                        rep("S5", 3), 
                        rep("S6", 2), 
                        rep("S7", 2), 
                        rep("S8", 2), 
                        rep("S9", 2))$tissue <- c('Disease.subtype1', 'Normal',                     # S1
                       'Disease.subtype1', 'Normal',                     # S2
                       'Disease.subtype3', 'Disease.subtype1', 'Normal', # S3
                       'Disease.subtype3', 'Disease.subtype3', 'Normal', # S4
                       'Disease.subtype1', 'Disease.subtype1', 'Normal', # S5
                       'Disease.subtype2', 'Normal',                     # S6
                       'Disease.subtype2', 'Normal',                     # S7
                       'Disease.subtype1', 'Normal',                     # S8
                       'Disease.subtype1', 'Normal'                      # S9
Subject <- factor($subject)
Tissue <- factor($tissue, levels=c("Normal", "Disease.subtype1", "Disease.subtype2", "Disease.subtype3"))

# design model matrix
design <- model.matrix(~0+Subject+Tissue)
v$design <- design

# contrast matrix
contr.matrix <- makeContrasts(DiseaseSubtype1vsNormal = TissueDisease.subtype1,
                              DiseaseSubtype2vsNormal = TissueDisease.subtype2,
                              DiseaseSubtype3vsNormal = TissueDisease.subtype3,
                              levels = colnames(design))

# fit linear models etc.
vfit <- lmFit(v, v$design)
vfit <-, contrasts=contr.matrix)
efit <- eBayes(vfit)

# prepare DGE lists
DiesaseSubtype1.vs.Normal <- topTable(efit, coef=1, n=Inf)
DiesaseSubtype2.vs.Normal <- topTable(efit, coef=2, n=Inf)
DiesaseSubtype3.vs.Normal <- topTable(efit, coef=3, n=Inf)

# subset by FDR < 0.05 genes
DiesaseSubtype1.vs.Normal.sig <- subset(DiesaseSubtype1.vs.Normal, adj.P.Val < 0.05)
DiesaseSubtype2.vs.Normal.sig <- subset(DiesaseSubtype2.vs.Normal, adj.P.Val < 0.05)
DiesaseSubtype3.vs.Normal.sig <- subset(DiesaseSubtype3.vs.Normal, adj.P.Val < 0.05)

I would much appreciate any insight. Thanks!

RNA-seq limma R design statistics • 1.9k views
Entering edit mode

Hi. I need to perform DGE between paired samples overtine. Did you reach a conclusion about the best model matrix? design <- model.matrix(~0+Tissue+Subject) or design <- model.matrix(~Tissue+Subject)? I checked the Section 3.4.1 (Paired samples) of the edgeR user's guide package and it indicates that we I should use design <- model.matrix(~Tissue+Subject) for paired sampes, but I don't know if is the best model to perform DGE between paired samples overtine.

Entering edit mode

This question/answer on biostars might be helpful.


Login before adding your answer.

Traffic: 1403 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6