Question

Clarity for proper design model matrix with paired sample analysis

0

Entering edit mode

5.0 years ago

reskejak ▴ 40

I have done a significant amount of reading over the past few days and am still uncertain about proper model matrix design for paired sample analysis. I am starting with a limma/voom object v that I would like to assess for transcriptional differences via RNA-seq between disease subtypes and their respective matched normal tissues. Thus I would like to model the effects of both Tissue and Subject.

Regarding the model matrix, I would assume to use design <- model.matrix(~0+Subject+Tissue). My understanding is that this would consider an intercept for Tissue, where Normal is always the reference, but there would be no reference Subject (due to no intercept for subject).

Note this approach elicits slightly different results than design <- model.matrix(~0+Tissue+Subject). The other alternative I have considered is design <- model.matrix(~Tissue+Subject) which considers an intercept for both Tissue and Subject. However, I don't want a reference Subject, so I think this design is not appropriate.

Here's my workflow in R:

# consider limma/voom object v

# make paired design matrix
exp.design <- data.frame(c("S1.Disease1", "S1.Normal", 
                           "S2.Disease1", "S2.Normal", 
                           "S3.Disease1", "S3.Disease2", "S3.Normal",
                           "S4.Disease1", "S4.Disease2", "S4.Normal",
                           "S5.Disease1", "S5.Disease2", "S5.Normal",
                           "S6.Disease1", "S6.Normal",
                           "S7.Disease1", "S7.Normal",
                           "S8.Disease1", "S8.Normal",
                           "S9.Disease1", "S9.Normal"))
colnames(exp.design) <- "Sample"
exp.design$subject <- c(rep("S1", 2), 
                        rep("S2", 2), 
                        rep("S3", 3), 
                        rep("S4", 3), 
                        rep("S5", 3), 
                        rep("S6", 2), 
                        rep("S7", 2), 
                        rep("S8", 2), 
                        rep("S9", 2))
exp.design$tissue <- c('Disease.subtype1', 'Normal',                     # S1
                       'Disease.subtype1', 'Normal',                     # S2
                       'Disease.subtype3', 'Disease.subtype1', 'Normal', # S3
                       'Disease.subtype3', 'Disease.subtype3', 'Normal', # S4
                       'Disease.subtype1', 'Disease.subtype1', 'Normal', # S5
                       'Disease.subtype2', 'Normal',                     # S6
                       'Disease.subtype2', 'Normal',                     # S7
                       'Disease.subtype1', 'Normal',                     # S8
                       'Disease.subtype1', 'Normal'                      # S9
                       )
Subject <- factor(exp.design$subject)
Tissue <- factor(exp.design$tissue, levels=c("Normal", "Disease.subtype1", "Disease.subtype2", "Disease.subtype3"))

# design model matrix
design <- model.matrix(~0+Subject+Tissue)
v$design <- design

# contrast matrix
contr.matrix <- makeContrasts(DiseaseSubtype1vsNormal = TissueDisease.subtype1,
                              DiseaseSubtype2vsNormal = TissueDisease.subtype2,
                              DiseaseSubtype3vsNormal = TissueDisease.subtype3,
                              levels = colnames(design))

# fit linear models etc.
vfit <- lmFit(v, v$design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)

# prepare DGE lists
DiesaseSubtype1.vs.Normal <- topTable(efit, coef=1, n=Inf)
DiesaseSubtype2.vs.Normal <- topTable(efit, coef=2, n=Inf)
DiesaseSubtype3.vs.Normal <- topTable(efit, coef=3, n=Inf)

# subset by FDR < 0.05 genes
DiesaseSubtype1.vs.Normal.sig <- subset(DiesaseSubtype1.vs.Normal, adj.P.Val < 0.05)
DiesaseSubtype2.vs.Normal.sig <- subset(DiesaseSubtype2.vs.Normal, adj.P.Val < 0.05)
DiesaseSubtype3.vs.Normal.sig <- subset(DiesaseSubtype3.vs.Normal, adj.P.Val < 0.05)

I would much appreciate any insight. Thanks!

RNA-seq limma R design statistics • 2.8k views

ADD COMMENT • link updated 3.6 years ago by starick.marick ▴ 10 • written 5.0 years ago by reskejak ▴ 40

0

Entering edit mode

Hi. I need to perform DGE between paired samples overtine. Did you reach a conclusion about the best model matrix? design <- model.matrix(~0+Tissue+Subject) or design <- model.matrix(~Tissue+Subject)? I checked the Section 3.4.1 (Paired samples) of the edgeR user's guide package and it indicates that we I should use design <- model.matrix(~Tissue+Subject) for paired sampes, but I don't know if is the best model to perform DGE between paired samples overtine.