Hi everyone,
I'm performing a DE analysis using edgeR and I have a question about the correct use of estimateDisp(). I'm comparing different conditions, laid out as follows:
Sample condition
A1 Milieu1
A2 Milieu1
B1 Milieu2
B2 Milieu2
B3 Milieu2
C1 Milieu3
C2 Milieu3
D1 Milieu4
D2 Milieu4
E1 Milieu5
E2 Milieu5
I want to compare Milieu1 with Milieu2 and Milieu3, and Milieu4 with Milieu5, in two separate analyses because they are two unrelated experiments. Here is my DE script, run on the whole dataset:
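# condition is the factor from the sample table above (one level per Milieu)
design <- model.matrix(~0+condition)
dge <- DGEList(counts=counts, group=condition)
dge <- calcNormFactors(dge)
# dispersions are estimated from all 11 samples together
dge <- estimateDisp(dge, design = design)
fit <- glmQLFit(dge, design = design)
# contrast names must be syntactically valid R names, so they cannot start with a digit
my.contrasts <- makeContrasts(M1vsM2=conditionMilieu1-conditionMilieu2, M1vsM3=conditionMilieu1-conditionMilieu3, M4vsM5=conditionMilieu4-conditionMilieu5, levels=design)
qlf <- glmQLFTest(fit, contrast=my.contrasts[,"M1vsM2"])
tt <- topTags(qlf, n = Inf)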
But if I subset my count matrix before running the script, separating Milieu1, Milieu2 and Milieu3 on the one hand from Milieu4 and Milieu5 on the other, I get slightly different results.
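For concreteness, here is a minimal sketch of what I mean by the subset version, shown for the first experiment only. The counts are simulated with rnbinom() purely so the example runs on its own (they are not my real data), and the names keep, cond1, design1, dge1, fit1, con1, qlf1 and tt1 are just illustrative. The only difference from the script above is that the Milieu4 and Milieu5 samples are removed before estimateDisp() is called, so the dispersions are estimated from 7 samples instead of 11.

```r
library(edgeR)

# Simulated counts for illustration only (assumption, not real data):
# 1000 genes x 11 samples, same layout as the sample table above
set.seed(1)
condition <- factor(c("Milieu1", "Milieu1", "Milieu2", "Milieu2", "Milieu2",
                      "Milieu3", "Milieu3", "Milieu4", "Milieu4",
                      "Milieu5", "Milieu5"))
counts <- matrix(rnbinom(1000 * 11, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000),
                                 c("A1", "A2", "B1", "B2", "B3", "C1", "C2",
                                   "D1", "D2", "E1", "E2")))

# Keep only the samples of the first experiment and drop unused factor levels
keep <- condition %in% c("Milieu1", "Milieu2", "Milieu3")
cond1 <- droplevels(condition[keep])

# Design matrix with one column per remaining condition
design1 <- model.matrix(~0 + cond1)
colnames(design1) <- levels(cond1)

# Same pipeline as before, but dispersion now comes from the subset only
dge1 <- DGEList(counts = counts[, keep], group = cond1)
dge1 <- calcNormFactors(dge1)
dge1 <- estimateDisp(dge1, design = design1)
fit1 <- glmQLFit(dge1, design = design1)

con1 <- makeContrasts(M1vsM2 = Milieu1 - Milieu2, levels = design1)
qlf1 <- glmQLFTest(fit1, contrast = con1[, "M1vsM2"])
tt1 <- topTags(qlf1, n = Inf)
```

The second experiment would be handled the same way, keeping only Milieu4 and Milieu5.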
What would be the best way to proceed in this case? Should I subset my count matrix before estimating dispersion, or proceed with the whole dataset?
Thank you for enlightening me on this subject.
That's very clear, thanks.