In my RNAseq data, I have 6 groups (A, B, C, D, E, F). I'd like to compare between A and D. I did DE analysis with limma. I found including all groups in the model and including only the 2 comparison groups gave me significantly different numbers of DEGs (P < 0.05). For example, the first method below detected 6000 DEGs while the 2nd method only detected 900 DEGs. Why is the huge difference between the two methods and which one should I use?
Here are 2 ways to do the DE analysis:
1. Include all the groups and comparisons in the design matrix and contrast matrix:
design <- model.matrix(~ 0 + group, data) contrast <- makeContrasts(A.vs.B = A-B, A.vs.C = A-C, A.vs.D = A-D, A.vs.E = A-E, A.vs.F = A-F, B.vs.C = B-C, B.vs.D = B-D, B.vs.E = B-E, B.vs.F = B-F, C.vs.D = C-D, C.vs.E = C-E, C.vs.F = C-F, D.vs.E = D-E, D.vs.F = D-F, E.vs.F = E-F) fit1 <- lmFit(object = data, design = design) fit2 <- contrasts.fit(fit1, contrast) fit2 <- eBayes(fit2) topTable(fit2, coef = A.vs.D)
2. Subset group A and D, and focus only on these 2 groups:
design <- model.matrix(~ 0 + group_sub, data_sub) contrast <- makeContrasts(A.vs.D = A-D)
I wrote some code specifically to compare the above 2 methods. Here are the code and data.