Question: Clarity for proper design model matrix with paired sample analysis
gravatar for reskejak
18 months ago by
Michigan State University
reskejak20 wrote:

I have done a significant amount of reading over the past few days and am still uncertain about proper model matrix design for paired sample analysis. I am starting with a limma/voom object v that I would like to assess for transcriptional differences via RNA-seq between disease subtypes and their respective matched normal tissues. Thus I would like to model the effects of both Tissue and Subject.

Regarding the model matrix, I would assume to use design <- model.matrix(~0+Subject+Tissue). My understanding is that this would consider an intercept for Tissue, where Normal is always the reference, but there would be no reference Subject (due to no intercept for subject).

Note this approach elicits slightly different results than design <- model.matrix(~0+Tissue+Subject). The other alternative I have considered is design <- model.matrix(~Tissue+Subject) which considers an intercept for both Tissue and Subject. However, I don't want a reference Subject, so I think this design is not appropriate.

Here's my workflow in R:

# consider limma/voom object v

# make paired design matrix <- data.frame(c("S1.Disease1", "S1.Normal", 
                           "S2.Disease1", "S2.Normal", 
                           "S3.Disease1", "S3.Disease2", "S3.Normal",
                           "S4.Disease1", "S4.Disease2", "S4.Normal",
                           "S5.Disease1", "S5.Disease2", "S5.Normal",
                           "S6.Disease1", "S6.Normal",
                           "S7.Disease1", "S7.Normal",
                           "S8.Disease1", "S8.Normal",
                           "S9.Disease1", "S9.Normal"))
colnames( <- "Sample"$subject <- c(rep("S1", 2), 
                        rep("S2", 2), 
                        rep("S3", 3), 
                        rep("S4", 3), 
                        rep("S5", 3), 
                        rep("S6", 2), 
                        rep("S7", 2), 
                        rep("S8", 2), 
                        rep("S9", 2))$tissue <- c('Disease.subtype1', 'Normal',                     # S1
                       'Disease.subtype1', 'Normal',                     # S2
                       'Disease.subtype3', 'Disease.subtype1', 'Normal', # S3
                       'Disease.subtype3', 'Disease.subtype3', 'Normal', # S4
                       'Disease.subtype1', 'Disease.subtype1', 'Normal', # S5
                       'Disease.subtype2', 'Normal',                     # S6
                       'Disease.subtype2', 'Normal',                     # S7
                       'Disease.subtype1', 'Normal',                     # S8
                       'Disease.subtype1', 'Normal'                      # S9
Subject <- factor($subject)
Tissue <- factor($tissue, levels=c("Normal", "Disease.subtype1", "Disease.subtype2", "Disease.subtype3"))

# design model matrix
design <- model.matrix(~0+Subject+Tissue)
v$design <- design

# contrast matrix
contr.matrix <- makeContrasts(DiseaseSubtype1vsNormal = TissueDisease.subtype1,
                              DiseaseSubtype2vsNormal = TissueDisease.subtype2,
                              DiseaseSubtype3vsNormal = TissueDisease.subtype3,
                              levels = colnames(design))

# fit linear models etc.
vfit <- lmFit(v, v$design)
vfit <-, contrasts=contr.matrix)
efit <- eBayes(vfit)

# prepare DGE lists
DiesaseSubtype1.vs.Normal <- topTable(efit, coef=1, n=Inf)
DiesaseSubtype2.vs.Normal <- topTable(efit, coef=2, n=Inf)
DiesaseSubtype3.vs.Normal <- topTable(efit, coef=3, n=Inf)

# subset by FDR < 0.05 genes
DiesaseSubtype1.vs.Normal.sig <- subset(DiesaseSubtype1.vs.Normal, adj.P.Val < 0.05)
DiesaseSubtype2.vs.Normal.sig <- subset(DiesaseSubtype2.vs.Normal, adj.P.Val < 0.05)
DiesaseSubtype3.vs.Normal.sig <- subset(DiesaseSubtype3.vs.Normal, adj.P.Val < 0.05)

I would much appreciate any insight. Thanks!

statistics rna-seq limma design R • 727 views
ADD COMMENTlink modified 4 weeks ago by starick.marick0 • written 18 months ago by reskejak20

Hi. I need to perform DGE between paired samples overtine. Did you reach a conclusion about the best model matrix? design <- model.matrix(~0+Tissue+Subject) or design <- model.matrix(~Tissue+Subject)? I checked the Section 3.4.1 (Paired samples) of the edgeR user's guide package and it indicates that we I should use design <- model.matrix(~Tissue+Subject) for paired sampes, but I don't know if is the best model to perform DGE between paired samples overtine.

ADD REPLYlink written 4 weeks ago by starick.marick0

This question/answer on biostars might be helpful.

ADD REPLYlink written 4 weeks ago by microbiotaiota20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 799 users visited in the last hour