Design and contrast matrices for including batch effect
1
0
Entering edit mode
3.3 years ago
kris275 ▴ 10

Hello everyone,

I am working on a RNA-seq project that includes samples from normal tissue, adenomas, adenocarcinomas, liver metastases and was done in five batches. As I have previously read, it is best if batch effect is included in the design matrix for linear model fitting- shown in limma guide and (http://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html).

However, in the latter article, when they write contrasts, the lanes are not included, not accounted for if i understand correctly. A similar story is shown in limma guide chapter 17.1.5, where the DEGs are for the final coefficient.

I performed the analysis in two different ways, the first as such:

group2<- factor(c("N","A","N","A","N","A","N","A","N","A",
                 "C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N","C","N",
                 "C","L","N","C","L","N","C","L","N","C","C","C",        "N","L","N","C","L","N","C","L","N","C","L","N","C","L","N","L","N","L","N","L","N","C","L","N","C","L","N","C","L","N","L","N","C","L","N","C","L","N","C","L","N","L","N","C","N","C","L", 
  "N","A","N","A","N","A","N","A","N","A","N","A","N","A","N","A"),levels = c("N","A","C","L"))

lane2 <- factor(c(1,1,1,1,1,1,1,1,1,1,
                 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
                 3,3,3,3,3,3,3,3,3,3,3,3,
                 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
                 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5))

design <- model.matrix(~0+group2+lane2) 

cont.matrix <- makeContrasts(NvsA=N-A, 
                             NvsC=N-C,
                             NvsL=N-L,
                             AvsC=A-C,
                             AvsL=A-L,
                             CvsL=C-L,
                             levels=colnames(design))

The contrast matrix is then

Contrasts
Levels NvsA NvsC NvsL AvsC AvsL CvsL
    N     1    1    1    0    0    0
    A    -1    0    0    1    1    0
    C     0   -1    0   -1    0    1
    L     0    0   -1    0   -1   -1
    L2    0    0    0    0    0    0
    L3    0    0    0    0    0    0
    L4    0    0    0    0    0    0
    L5    0    0    0    0    0    0

and the results include DEGs for NvsA NvsC NvsL AvsC AvsL CvsL. If I understand correctly, this does not account for batch effect and is similar to what the rnaseq-123 article from the link above did?

The other option is similar to the answer to this question https://support.bioconductor.org/p/69374/. I used an interaction type design and averaged the cases from which batch each sample came from and only included the interactions that had a sample inside (otherwise linear model did not work as coefficients were not calculable):

designy1 <- model.matrix(~0+group2:lane2)
colnames(designy1)
colnames(designy1) <- c("N1","A1","C1","L1","N2","A2","C2","L2","N3","A3","C3","L3","N4","A4","C4","L4","N5","A5","C5","L5")
designy2 <- designy1[,c(1:2,5,7,9,11:13,15:18)]

The contrast matrix then looked like this

contr.matrixy <- makeContrasts(NC= (N1+N2+N3+N4+N5)/5-(C2+C3+C4)/3, 
                               nVSa= (N1+N2+N3+N4+N5)/5-(A1+A5)/2, 
                               AVSC= (A1+A5)/2-(C2+C3+C4)/3,
                               NVL= (N1+N2+N3+N4+N5)/5-(L3+L4)/2,
                               CVL= (C2+C3+C4)/3-(L3+L4)/2,
                               AVL=(A1+A5)/2-(L3+L4)/2,
                               levels=colnames(designy2))

I then as standard used voom normalisation, linear model fitting, etc to get to DEGs.

vy <- voom(countsbindlist4I,design = designy2, plot=TRUE)
fity <- lmFit(vy, designy2)
fit2y <- contrasts.fit(fity, contr.matrixy)
efity <- eBayes(fit2y)
dt<- decideTests(efity)
summary(dt)

Is any of the solutions I used okay, or did I miss the mark with the second one? Thank you all for your help and insights.

Best regards

R contrasts RNA-Seq design batch-effect • 1.1k views
ADD COMMENT
1
Entering edit mode
3.3 years ago

Illumina lanes generally do not add technical variance, so you shouldn't include them in the design.

ADD COMMENT
0
Entering edit mode

Dear swbarnes2,

thank you for your reply. In this case the experiment was done in different batches, the batches were included as "lanes". I could have clarified this better with another variable name, but the batch to batch variability is definitely there- visible with dendrograms and mds.

All help still wanted :)

Thank you all.

ADD REPLY

Login before adding your answer.

Traffic: 2183 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6