Question

Which model to choose in edgeR analysis?

0

Entering edit mode

7.8 years ago

biolab ★ 1.4k

Dear all,

Our RNA-seq data has two samples, each of which has three biological replicates. I have two edgeR scripts as follows. I found that they produced different results. My question is: which model (classical linear or GLM) to choose in edgeR analysis?

I wish biostars experts on R analysis could provide me some comments or suggestions. Thank you very much .

- R script 1

d <- read.delim("input", sep = ' ', header=F, row.names=1)
colnames(d) <- c("w1", "w2", "w3", "m1", "m2", "m3")
g <- factor(c(1,1,1,2,2,2))
dge <- DGEList(counts=d,group=g)
dge <- calcNormFactors(dge)
design <- model.matrix(~g)
dge <- estimateDisp(dge, design)
fit <- glmQLFit(dge, design)
qlf <- glmQLFTest(fit, coef=2)
topTags(qlf, n=100000)

R script 2

d <- read.delim("input", sep = ' ', header=F, row.names=1)
colnames(d) <- c("w1", "w2", "w3", "m1", "m2", "m3")
g <- factor(c(1,1,1,2,2,2))
dge <- DGEList(counts=d,group=g)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge)
topTags(et, n=100000)

edgeR • 8.8k views

ADD COMMENT • link updated 7.8 years ago by Benn 8.3k • written 7.8 years ago by biolab ★ 1.4k

score 1 · Answer 1 · 2016-07-07

1

Entering edit mode

7.8 years ago

Benn 8.3k

This was once asked on the bioconductor support site:

https://support.bioconductor.org/p/64807/

ADD COMMENT • link 7.8 years ago by Benn 8.3k

1

Entering edit mode

And the difference between glmFit and glmQLFit is described here:

https://support.bioconductor.org/p/76790/

ADD REPLY • link 7.8 years ago by Benn 8.3k

0

Entering edit mode

Hi, b.nota,

Thank you very much for your reply.

I just performed a variety of tests, and found that if changing the code qlf <- glmQLFTest(fit, coef=2) in the first R script to qlf <- glmQLFTest(fit), then the two scripts produce similar results. Would you please indicate to me the difference of using coef=2 or not?

I really appreciate your help.

ADD REPLY • link 7.8 years ago by biolab ★ 1.4k

1

Entering edit mode

Hi,

The coef=2 argument means that your second group will be subtracted from the intercept of your design. Why you get similar results without, I don't know. If you want an explanation from the developers of this package you should ask this question on the bioconductor support site.

https://support.bioconductor.org/t/Latest/

ADD REPLY • link 7.8 years ago by Benn 8.3k

0

Entering edit mode

Hi, b.nota, Thank you very much for your reply. I missed your reply these days.

Maybe you misunderstood me. I mean with coef=2 and without coef=2 produce very different results. I still cannot figure out "the intercept of your design" in your comment. Does coef=2 mean group2 vs group1? What does qlf <- glmQLFTest(fit) mean if omitting coef=2?

I really appreciate your answer. Thanks!

ADD REPLY • link 7.8 years ago by biolab ★ 1.4k

1

Entering edit mode

You're question is a pure statistical question, concerning linear models and intercept, etc.

I can try to explain it to you, but I am a biologist with programming skills...

Like I suggested before, try the bioconductor support site. Since edgeR is a bioconductor package, support is given on that site. Usually one of the developers or maintainers can explain you how to use the package correctly, and they respond within a day.