Question

Should I subset my data before running estimateDisp() in RNA-seq analysis?

4.1 years ago

Basti ★ 2.1k

Hi everyone,

I'm performing DE analysis using edgeR and I have a question regarding the correct use of estimateDisp(). I'm currently making comparisons between different conditions as follows:

Sample      condition
A1          Milieu1
A2          Milieu1
B1          Milieu2
B2          Milieu2
B3          Milieu2
C1          Milieu3
C2          Milieu3
D1          Milieu4
D2          Milieu4
E1          Milieu5
E2          Milieu5

I want to compare Milieu1 with Milieu2 and Milieu3, and Milieu4 with Milieu5, in two separate analyses because they are two unrelated experiments. Currently I run my DE script on the whole dataset:

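library(edgeR)
# `counts` (genes x samples matrix) and `condition` (factor of Milieu labels) are defined beforehand
design <- model.matrix(~0 + condition)
dge <- DGEList(counts = counts, group = condition)
dge <- calcNormFactors(dge)
dge <- estimateDisp(dge, design = design)
fit <- glmQLFit(dge, design = design)
# Contrast names must be syntactically valid R names, so they cannot start with a digit
my.contrasts <- makeContrasts(
  M1vsM2 = conditionMilieu1 - conditionMilieu2,
  M1vsM3 = conditionMilieu1 - conditionMilieu3,
  M4vsM5 = conditionMilieu4 - conditionMilieu5,
  levels = design)
qlf <- glmQLFTest(fit, contrast = my.contrasts[, "M1vsM2"])
tt <- topTags(qlf, n = Inf)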

However, if I subset my count matrix before running the script, separating Milieu1, Milieu2 and Milieu3 on one hand from Milieu4 and Milieu5 on the other, I get slightly different results.

What would be the best way to proceed in this case? Should I subset my count matrix before estimating dispersion or proceed with the whole dataset?

Thank you for enlightening me on this subject.

RNAseq edgeR EstimateDisp • 1.5k views

score 2 · Accepted Answer · 2021-06-22

4.1 years ago

ATpoint 88k

If these are indeed two separate experiments, I would subset at the very beginning and create two different DGEList objects, then run filterByExpr() on each as instructed in the edgeR manual. The reason is that normalization and parameter estimation are influenced by all samples in the object, so if Milieu4/5 come from a different experiment, I find it hard to justify including them in the same edgeR analysis as Milieu1/2/3.
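
A minimal sketch of what subsetting at the very beginning could look like. It is an illustration under assumptions, not the poster's actual script: the counts are simulated so the example runs on its own, `fit_experiment` is a hypothetical helper name, and the contrast names (`M1vsM2`, etc.) are made up for readability. The edgeR calls themselves are the same ones used in the question, plus filterByExpr().

```r
library(edgeR)

# Simulated counts purely for illustration; replace with the real count matrix.
set.seed(1)
samples   <- c("A1", "A2", "B1", "B2", "B3", "C1", "C2", "D1", "D2", "E1", "E2")
condition <- factor(c("Milieu1", "Milieu1", "Milieu2", "Milieu2", "Milieu2",
                      "Milieu3", "Milieu3", "Milieu4", "Milieu4",
                      "Milieu5", "Milieu5"))
counts <- matrix(rnbinom(2000 * length(samples), mu = 50, size = 10),
                 ncol = length(samples),
                 dimnames = list(paste0("gene", 1:2000), samples))

# Hypothetical helper: run the quasi-likelihood pipeline on one experiment only,
# so filtering, normalization and dispersion estimates see just those samples.
fit_experiment <- function(counts, condition, keep_levels) {
  sel   <- condition %in% keep_levels
  group <- droplevels(condition[sel])          # drop the unused Milieu levels
  dge   <- DGEList(counts = counts[, sel], group = group)
  keep  <- filterByExpr(dge)                   # uses the group stored in dge
  dge   <- dge[keep, , keep.lib.sizes = FALSE]
  dge   <- calcNormFactors(dge)
  design <- model.matrix(~0 + group)
  colnames(design) <- levels(group)            # "Milieu1" instead of "groupMilieu1"
  dge   <- estimateDisp(dge, design = design)
  list(fit = glmQLFit(dge, design = design), design = design)
}

exp1 <- fit_experiment(counts, condition, c("Milieu1", "Milieu2", "Milieu3"))
exp2 <- fit_experiment(counts, condition, c("Milieu4", "Milieu5"))

con1 <- makeContrasts(M1vsM2 = Milieu1 - Milieu2,
                      M1vsM3 = Milieu1 - Milieu3,
                      levels = exp1$design)
con2 <- makeContrasts(M4vsM5 = Milieu4 - Milieu5, levels = exp2$design)

tt_1v2 <- topTags(glmQLFTest(exp1$fit, contrast = con1[, "M1vsM2"]), n = Inf)
tt_4v5 <- topTags(glmQLFTest(exp2$fit, contrast = con2[, "M4vsM5"]), n = Inf)
```

Because each DGEList only contains its own experiment, the normalization factors and dispersion estimates for Milieu1/2/3 no longer depend on the Milieu4/5 samples, which is why the results differ slightly from the whole-dataset run.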