Limma for batch effect correction
2
0
Entering edit mode
7 months ago
esauras • 0

Hi everyone,

I am analyzing data from microarrays with batch effect. To correct batch effect I am including this variable in the formula for the linear model:

formula <- paste("~0 ", "Group ", "Batch", sep = "+ ")


The group variable includes two categories: M_GO and NM_GO, whereas the batch variable includes three categories: Batch1, Batch2 and Batch3.

After that, I have created the design matrix, where the name of the columns are the following:

M_GO NM_GO Batch1 Batch2


Both categories for group appear because there is not an intercept. But, regarding the batch variable, only two of three categories are present. Samples with Batch1=1 & Batch2=0 will be Batch1, samples with Batch1=0 & Batch2=1 will be Batch2, and samples with Batch1=0 & Batch2=0 will be Batch3.

However, my question is regarding the contrastsmatrix. As Batch3 does not appear in the design matrix, the only comparison that can be perfomed for batch is "Batcheffect = Batch1 - Batch2".

contrasts <- c("GO_MvsNM = M_GO - NM_GO", "Batcheffect = Batch1 - Batch2")


Is this contrasts matrix taking into account the 3 categories for batch effect? Am I doing the analysis to correct batch effect properly?

Thank you very much in advance!!

microarray batch-effect limma • 1.3k views
0
Entering edit mode

I thought I knew the answer to this but did some digging only to find out I do not...but I have a few comments. First, this question is not about batch effect at all, but rather why your design matrix has one less column than the product of the number of groups and batches in your experiment (=2*3). Try troubleshooting this more general question--i suspect that the fact that this appears to relate most directly to batch effect was a bit of a red herring in your problem solving. Check out https://stackoverflow.com/a/17284028 and the comments here https://stats.stackexchange.com/q/174976

Realistically, why are you trying to find genes DE with respect to batches? Your code for making contrasts between groups looks perfect. If you really wish to compare batches 1 vs. 2, 1 vs. 3, and 2 vs. 3, simply change the order of your variables in model.matrix(), or for you, formula:

formula <- paste("~0 ", "Batch ", "Group", sep = "+ ")


And you will find columns for all 3 batches in the model matrix.

0
Entering edit mode
7 months ago
Gordon Smyth ★ 6.4k

If you show the code you used to created the design matrix, then someone will likely be able to help you.

4 days later

Thanks for providing code for the design matrix. I see now that your code is ok but you were confused about the use of intercepts in linear models, as suggested by bkleiboeker. Your code is more complicated than it needs to be. All you actually need is:

Group <- c("M_GO","M_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","M_GO","M_GO","M_GO",
"NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO",
"M_GO","M_GO","M_GO","NM_GO","NM_GO","NM_GO","NM_GO","M_GO","M_GO")
Batch <- c(3,3,1,1,1,1,1,1,3,2,2,1,1,3,3,2,2,1,1,1,1,3,2,2,2,2,2,2,2,2)
Group <- factor(Group)
Batch <- factor(Batch)
design <- model.matrix(~Group+Batch)


and later on in the analysis there is no need to form contrasts. This simple batch correction situation is covered (briefly) by Section 9.4.2 of the limma User's Guide.

0
Entering edit mode

Dear Prof. Gordon Smyth, About batch effects, adding batch info to the design may just acceptable for lmFit. If I plan to use lmscFit, it didnt allow me to add the info of batch. How can I solve this problem? Thank you. Best wishes, Di

0
Entering edit mode

lmscFit() does allow batch effects, but you must expand the batch factor to the single-channel level, for example by using targetsA2C().

0
Entering edit mode
7 months ago
esauras • 0

Thank you very much for your responses bkleiboeker and Gordon Smyth!

I read the comments on the links above and I found them very clarifying. The concept of the intercept is explained in detail and I think part of my problem was solved.

I add here my code for the design matrix:

df <- data.frame(cbind(c("M_GO","M_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","M_GO","M_GO","M_GO",
"NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO","NM_GO",
"M_GO","M_GO","M_GO","NM_GO","NM_GO","NM_GO","NM_GO","M_GO","M_GO")),c(3,3,1,1,1,1,1,
1,3,2,2,1,1,3,
3,2,2,1,1,1,1,
3,2,2,2,2,2,2,
2,2))
rownames(df) <- c(1:30)
colnames(df) <- c("Group","Batch")

df$Group <- factor(df$Group,levels=unique(df[,1]))
df$Batch <- factor(df$Batch,levels=unique(df[,2]))

formula <- paste("~0 ", "Group ", "Batch", sep = "+ ")
designmatrix <- model.matrix(as.formula(fmla), data = df)


If I am not wrong, creating this design matrix I am taking into account the batch effect. But my main objective is to look for the differentially expressed genes between groups. Therefore, in the contrast matrix I should only specify the comparison regarding the group and not the batch effect (code below). Is that correct?

contrasts <- c("GO_MvsNM = GroupM_GO - GroupNM_GO")
contrastsMatrix <- makeContrasts(contrasts = contrasts, levels = designmatrix)
print(contrastsMatrix)

0
Entering edit mode

Yes, you are correct--your contrast will give you genes DE between groups while "correcting for" batch effect. Your code looks fine, though in your model.matrix() function you pass in a variable called fmla when I think you mean formula. You might wanna make sure you don't have an old formula variable named fmla saved in your environment.