Question: Design and contrast matrices for including batch effect
0
gravatar for kris275
4 days ago by
kris2750
Slovenia
kris2750 wrote:

Hello everyone,

I am working on a RNA-seq project that includes samples from normal tissue, adenomas, adenocarcinomas, liver metastases and was done in five batches. As I have previously read, it is best if batch effect is included in the design matrix for linear model fitting- shown in limma guide and (http://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html). However, in the latter article, when they write contrasts, the lanes are not included, not accounted for if i understand correctly. A similar story is shown in limma guide chapter 17.1.5, where the DEGs are for the final coefficient.

I performed the analysis in two different ways, the first as such:

group2<- factor(c("N","A","N","A","N","A","N","A","N","A",
                 "C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N",
                 "C","L","N","C","L","N","C","L","N","C","C","C",        "N","L","N","C","L","N","C","L","N","C","L","N","C","L","N","L","N","L","N","L","N","C","L","N","C","L","N","C","L","N","L","N","C","L","N","C","L","N","C","L","N","L","N","C","N","C","L", 
  "N","A","N","A","N","A","N","A","N","A","N","A","N","A","N","A"),levels = c("N","A","C","L"))

lane2 <- factor(c(1,1,1,1,1,1,1,1,1,1,
                 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
                 3,3,3,3,3,3,3,3,3,3,3,3,
                 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
                 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5))

design <- model.matrix(~0+group2+lane2) 

cont.matrix <- makeContrasts(NvsA=N-A, 
                             NvsC=N-C,
                             NvsL=N-L,
                             AvsC=A-C,
                             AvsL=A-L,
                             CvsL=C-L,
                             levels=colnames(design))

The contrast matrix is then

Contrasts
Levels NvsA NvsC NvsL AvsC AvsL CvsL
    N     1    1    1    0    0    0
    A    -1    0    0    1    1    0
    C     0   -1    0   -1    0    1
    L     0    0   -1    0   -1   -1
    L2    0    0    0    0    0    0
    L3    0    0    0    0    0    0
    L4    0    0    0    0    0    0
    L5    0    0    0    0    0    0

and the results include DEGs for NvsA NvsC NvsL AvsC AvsL CvsL. If I understand correctly, this does not account for batch effect and is similar to what the rnaseq-123 article from the link above did?

The other option is similar to the answer to this question https://support.bioconductor.org/p/69374/. I used an interaction type design and averaged the cases from which batch each sample came from and only included the interactions that had a sample inside (otherwise linear model did not work as coefficients were not calculable):

designy1 <- model.matrix(~0+group2:lane2)
colnames(designy1)
colnames(designy1) <- c("N1","A1","C1","L1","N2","A2","C2","L2","N3","A3","C3","L3","N4","A4","C4","L4","N5","A5","C5","L5")
designy2 <- designy1[,c(1:2,5,7,9,11:13,15:18)]

The contrast matrix then looked like this

contr.matrixy <- makeContrasts(NC= (N1+N2+N3+N4+N5)/5-(C2+C3+C4)/3, 
                               nVSa= (N1+N2+N3+N4+N5)/5-(A1+A5)/2, 
                               AVSC= (A1+A5)/2-(C2+C3+C4)/3,
                               NVL= (N1+N2+N3+N4+N5)/5-(L3+L4)/2,
                               CVL= (C2+C3+C4)/3-(L3+L4)/2,
                               AVL=(A1+A5)/2-(L3+L4)/2,
                               levels=colnames(designy2))

I then as standard used voom normalisation, linear model fitting, etc to get to DEGs.

vy <- voom(countsbindlist4I,design = designy2, plot=TRUE)
fity <- lmFit(vy, designy2)
fit2y <- contrasts.fit(fity, contr.matrixy)
efity <- eBayes(fit2y)
dt<- decideTests(efity)
summary(dt)

Is any of the solutions I used okay, or did I miss the mark with the second one? Thank you all for your help and insights.

Best regards,

ADD COMMENTlink modified 4 days ago by swbarnes29.4k • written 4 days ago by kris2750
1
gravatar for swbarnes2
4 days ago by
swbarnes29.4k
United States
swbarnes29.4k wrote:

Illumina lanes generally do not add technical variance, so you shouldn't include them in the design.

ADD COMMENTlink written 4 days ago by swbarnes29.4k

Dear swbarnes2,

thank you for your reply. In this case the experiment was done in different batches, the batches were included as "lanes". I could have clarified this better with another variable name, but the batch to batch variability is definitely there- visible with dendrograms and mds.

All help still wanted :)

Thank you all.

ADD REPLYlink written 3 days ago by kris2750
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2178 users visited in the last hour
_