Question

Problems in differential expression analysis with edgeR

1

Entering edit mode

3.6 years ago

thancy ▴ 10

I used glmQLF for differential expression analysis, and the result is almost all-down or all-up. Then I try glmQLF analysis with different data and glmLRT with the same data, but get the same result. Here is my code:

group <- factor(c(1,2,1,2))
y <- DGEList(counts = counts_data,genes = rownames(counts_data),group = group)
isexpr <- rowSums(cpm(y) > 1) >=2
y <- y[isexpr,keep.lib.sizes=FALSE]
design <- model.matrix(~0+group)
y <- estimateDisp(y,design)
fit <- glmQLFit(y, design, robust=TRUE)
qlf = glmQLFTest(fit)
summary(decideTests(qlf))

result:

       group2
Down    12572
NotSig     13
Up          0

I think it may be something wrong with the design matrix because when I used exactTest which do not need a design for input, the problem seems to be solved. But I cant not find anything wrong with the design matrix. exactTest code:

group = factor(c(1,2,1,2))
y <- DGEList(counts = counts_data,genes = rownames(counts_data),group = group)
isexpr =rowSums(cpm(y) > 1) >=2
y = y[isexpr,keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y)
et <- exactTest(y)
summary(decideTests(et))

exactTest result:

         1+2
Down      31
NotSig 12512
Up        42

design:

> design
  group1 group2
1      1      0
2      0      1
3      1      0
4      0      1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$`group`
[1] "contr.treatment"

R RNA-Seq edgeR • 1.8k views

ADD COMMENT • link updated 2.5 years ago by Gordon Smyth ★ 7.0k • written 3.6 years ago by thancy ▴ 10

0

Entering edit mode

You need to also run calcNormFactors if you are using the glm method.

ADD REPLY • link 3.6 years ago by rpolicastro 13k

0

Entering edit mode

Thanks for your suggestion. I have run calcNormFactors and it didnt work.

ADD REPLY • link 3.6 years ago by thancy ▴ 10

0

Entering edit mode

What did not work? Indeed lack of proper normalization could explain these 12000 "down" genes. You should make an MA-plot to explore results, see edgeR::maPlot(). Please run:

group <- factor(c(1,2,1,2))

y <- DGEList(counts = counts_data,genes = rownames(counts_data),group = group)

isexpr <- filterByExpr(y, group=group)

y <- y[isexpr, ,keep.lib.sizes=FALSE]

y <- calcNormFactors(y)

design <- model.matrix(~group)

y <- estimateDisp(y,design)

fit <- glmQLFit(y, design, robust=TRUE)

qlf = glmQLFTest(fit)

tt <- topTags(qlf, n=Inf)$table

maPlot(tt$logCPM, tt$logFC)

Please show the plot and show how many DEGs you get with this code. You should also perform PCA to see how replicates cluster, see Basic normalization, batch correction and visualization of RNA-seq data

ADD REPLY • link 3.6 years ago by ATpoint 82k

0

Entering edit mode

Thank you for your reply ATpoint. MA-plot: MA_plot

The code is as same as you provided except making MA_plot code was changed to as follows:

plotMA(cbind(tt$logCPM,tt$logFC),array = 1)

DEGs result:

       group2
Down        2
NotSig  12376
Up          5

ADD REPLY • link 3.6 years ago by thancy ▴ 10

0

Entering edit mode

Plot makes no sense, which is expected because you altered the code without checking what the function actually does. Anyway, if these DE results are what the pipeline I posted produces then it is what it is. If you want further help then please either post the maPlot() as I posted it (or as in the linked tutorial) and show PCA results, given you wish to explore whether these results make sense. It also makes sense to learn how these plots work and why these are a good diagnostic, explained in the linked tutorial so that you can judge for yourself rather then just following suggestions on the internet.

ADD REPLY • link 3.6 years ago by ATpoint 82k

score 0 · Answer 1 · 2021-10-16

You've made an error in specifying the comparison to be done. To compare group 2 vs group 1 you need either:

design <- model.matrix(~group)
qlf = glmQLFTest(fit)

or

design <- model.matrix(~0+group)
qlf = glmQLFTest(fit, contrast=c(-1,1))

You don't need to remove the intercept from the linear model with 0+ but, if you do, then you must specify a contrast. See the User's Guide Section 3.2. At the moment you are simply testing the group 2 baseline against nothing, which explains the crazy DE result,