I'm performing a differential expression (DE) analysis with edgeR and I have a question about the correct use of estimateDisp(). I'm comparing several conditions, with the following samples:
Sample  condition
A1      Milieu1
A2      Milieu1
B1      Milieu2
B2      Milieu2
B3      Milieu2
C1      Milieu3
C2      Milieu3
D1      Milieu4
D2      Milieu4
E1      Milieu5
E2      Milieu5
I want to compare Milieu1 with Milieu2 and with Milieu3, and Milieu4 with Milieu5, in two separate analyses, because they come from two unrelated experiments. If I run my DE script on the whole dataset:
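library(edgeR)

# counts: genes x samples matrix; condition: medium of each sample (same order as the columns)
design <- model.matrix(~0 + condition)
dge <- DGEList(counts = counts, group = condition)
dge <- calcNormFactors(dge)
dge <- estimateDisp(dge, design = design)
fit <- glmQLFit(dge, design = design)
# Unquoted argument names cannot start with a digit, hence M1vsM2 rather than 1v2
my.contrasts <- makeContrasts(M1vsM2 = conditionMilieu1 - conditionMilieu2,
                              M1vsM3 = conditionMilieu1 - conditionMilieu3,
                              M4vsM5 = conditionMilieu4 - conditionMilieu5,
                              levels = design)
qlf <- glmQLFTest(fit, contrast = my.contrasts[, "M1vsM2"])
tt <- topTags(qlf, n = Inf)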
But if I subset my count matrix before running the script, keeping Milieu1, Milieu2 and Milieu3 in one dataset and Milieu4 and Milieu5 in another, I get slightly different results.
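For reference, the subsetting step can be sketched in R as below. This is a minimal sketch assuming the columns of `counts` follow the sample order listed above; the toy `counts` matrix and the helper `subset_experiment()` are hypothetical, only there so the sketch runs. The detail to watch is droplevels(): without it the subset factor keeps the absent media as levels, model.matrix() adds all-zero columns for them, and the design is no longer of full rank.

```r
# Sample sheet from the question: 11 samples, 5 media (Milieu1..Milieu5)
samples <- data.frame(
  sample = c("A1", "A2", "B1", "B2", "B3", "C1", "C2", "D1", "D2", "E1", "E2"),
  condition = factor(c("Milieu1", "Milieu1", "Milieu2", "Milieu2", "Milieu2",
                       "Milieu3", "Milieu3", "Milieu4", "Milieu4",
                       "Milieu5", "Milieu5"))
)

# Hypothetical toy count matrix (genes x samples), only so the sketch runs
set.seed(1)
counts <- matrix(rpois(100 * nrow(samples), lambda = 50),
                 ncol = nrow(samples),
                 dimnames = list(paste0("gene", 1:100), samples$sample))

# Hypothetical helper: keep the samples of one experiment and drop the
# now-unused factor levels so the design has one column per medium present
subset_experiment <- function(counts, samples, media) {
  keep <- samples$condition %in% media
  condition <- droplevels(samples$condition[keep])
  list(counts = counts[, keep, drop = FALSE],
       condition = condition,
       design = model.matrix(~0 + condition))
}

exp1 <- subset_experiment(counts, samples, c("Milieu1", "Milieu2", "Milieu3"))
exp2 <- subset_experiment(counts, samples, c("Milieu4", "Milieu5"))
```

Each subset would then go through the same edgeR steps as in the script above, e.g. DGEList(counts = exp1$counts, group = exp1$condition) followed by calcNormFactors(), estimateDisp() and glmQLFit() with exp1$design. Because estimateDisp() then only sees the samples of one experiment, the common and trended dispersion estimates (and hence the test results) can differ slightly from the whole-dataset fit.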
What would be the best way to proceed in this case? Should I subset my count matrix before estimating the dispersion, or proceed with the whole dataset?
Thank you for enlightening me on this subject.