Question: Design and contrast matrices for including batch effect
gravatar for kris275
4 days ago by
kris2750 wrote:

Hello everyone,

I am working on a RNA-seq project that includes samples from normal tissue, adenomas, adenocarcinomas, liver metastases and was done in five batches. As I have previously read, it is best if batch effect is included in the design matrix for linear model fitting- shown in limma guide and ( However, in the latter article, when they write contrasts, the lanes are not included, not accounted for if i understand correctly. A similar story is shown in limma guide chapter 17.1.5, where the DEGs are for the final coefficient.

I performed the analysis in two different ways, the first as such:

group2<- factor(c("N","A","N","A","N","A","N","A","N","A",
                 "C","L","N","C","L","N","C","L","N","C","C","C",        "N","L","N","C","L","N","C","L","N","C","L","N","C","L","N","L","N","L","N","L","N","C","L","N","C","L","N","C","L","N","L","N","C","L","N","C","L","N","C","L","N","L","N","C","N","C","L", 
  "N","A","N","A","N","A","N","A","N","A","N","A","N","A","N","A"),levels = c("N","A","C","L"))

lane2 <- factor(c(1,1,1,1,1,1,1,1,1,1,

design <- model.matrix(~0+group2+lane2) 

cont.matrix <- makeContrasts(NvsA=N-A, 

The contrast matrix is then

Levels NvsA NvsC NvsL AvsC AvsL CvsL
    N     1    1    1    0    0    0
    A    -1    0    0    1    1    0
    C     0   -1    0   -1    0    1
    L     0    0   -1    0   -1   -1
    L2    0    0    0    0    0    0
    L3    0    0    0    0    0    0
    L4    0    0    0    0    0    0
    L5    0    0    0    0    0    0

and the results include DEGs for NvsA NvsC NvsL AvsC AvsL CvsL. If I understand correctly, this does not account for batch effect and is similar to what the rnaseq-123 article from the link above did?

The other option is similar to the answer to this question I used an interaction type design and averaged the cases from which batch each sample came from and only included the interactions that had a sample inside (otherwise linear model did not work as coefficients were not calculable):

designy1 <- model.matrix(~0+group2:lane2)
colnames(designy1) <- c("N1","A1","C1","L1","N2","A2","C2","L2","N3","A3","C3","L3","N4","A4","C4","L4","N5","A5","C5","L5")
designy2 <- designy1[,c(1:2,5,7,9,11:13,15:18)]

The contrast matrix then looked like this

contr.matrixy <- makeContrasts(NC= (N1+N2+N3+N4+N5)/5-(C2+C3+C4)/3, 
                               nVSa= (N1+N2+N3+N4+N5)/5-(A1+A5)/2, 
                               AVSC= (A1+A5)/2-(C2+C3+C4)/3,
                               NVL= (N1+N2+N3+N4+N5)/5-(L3+L4)/2,
                               CVL= (C2+C3+C4)/3-(L3+L4)/2,

I then as standard used voom normalisation, linear model fitting, etc to get to DEGs.

vy <- voom(countsbindlist4I,design = designy2, plot=TRUE)
fity <- lmFit(vy, designy2)
fit2y <-, contr.matrixy)
efity <- eBayes(fit2y)
dt<- decideTests(efity)

Is any of the solutions I used okay, or did I miss the mark with the second one? Thank you all for your help and insights.

Best regards,

ADD COMMENTlink modified 4 days ago by swbarnes29.4k • written 4 days ago by kris2750
gravatar for swbarnes2
4 days ago by
United States
swbarnes29.4k wrote:

Illumina lanes generally do not add technical variance, so you shouldn't include them in the design.

ADD COMMENTlink written 4 days ago by swbarnes29.4k

Dear swbarnes2,

thank you for your reply. In this case the experiment was done in different batches, the batches were included as "lanes". I could have clarified this better with another variable name, but the batch to batch variability is definitely there- visible with dendrograms and mds.

All help still wanted :)

Thank you all.

ADD REPLYlink written 3 days ago by kris2750
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2178 users visited in the last hour