Problems in differential expression analysis with edgeR
0
1
Entering edit mode
6 months ago
thancy ▴ 10

I used glmQLF for differential expression analysis, and the result is almost all-down or all-up. Then I try glmQLF analysis with different data and glmLRT with the same data, but get the same result. Here is my code:

group <- factor(c(1,2,1,2))
y <- DGEList(counts = counts_data,genes = rownames(counts_data),group = group)
isexpr <- rowSums(cpm(y) > 1) >=2
y <- y[isexpr,keep.lib.sizes=FALSE]
design <- model.matrix(~0+group)
y <- estimateDisp(y,design)
fit <- glmQLFit(y, design, robust=TRUE)
qlf = glmQLFTest(fit)
summary(decideTests(qlf))


result:

       group2
Down    12572
NotSig     13
Up          0


I think it may be something wrong with the design matrix because when I used exactTest which do not need a design for input, the problem seems to be solved. But I cant not find anything wrong with the design matrix. exactTest code:

group = factor(c(1,2,1,2))
y <- DGEList(counts = counts_data,genes = rownames(counts_data),group = group)
isexpr =rowSums(cpm(y) > 1) >=2
y = y[isexpr,keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y)
et <- exactTest(y)
summary(decideTests(et))


exactTest result:

         1+2
Down      31
NotSig 12512
Up        42


design:

> design
group1 group2
1      1      0
2      0      1
3      1      0
4      0      1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$group [1] "contr.treatment"  R RNA-Seq edgeR • 355 views ADD COMMENT 0 Entering edit mode You need to also run calcNormFactors if you are using the glm method. ADD REPLY 0 Entering edit mode Thanks for your suggestion. I have run calcNormFactors and it didnt work. ADD REPLY 0 Entering edit mode What did not work? Indeed lack of proper normalization could explain these 12000 "down" genes. You should make an MA-plot to explore results, see edgeR::maPlot(). Please run: group <- factor(c(1,2,1,2)) y <- DGEList(counts = counts_data,genes = rownames(counts_data),group = group) isexpr <- filterByExpr(y, group=group) y <- y[isexpr, ,keep.lib.sizes=FALSE] y <- calcNormFactors(y) design <- model.matrix(~group) y <- estimateDisp(y,design) fit <- glmQLFit(y, design, robust=TRUE) qlf = glmQLFTest(fit) tt <- topTags(qlf, n=Inf)$table

maPlot(tt$logCPM, tt$logFC)


Please show the plot and show how many DEGs you get with this code. You should also perform PCA to see how replicates cluster, see Basic normalization, batch correction and visualization of RNA-seq data

0
Entering edit mode

The code is as same as you provided except making MA_plot code was changed to as follows:

plotMA(cbind(tt$logCPM,tt$logFC),array = 1)


DEGs result:

       group2
Down        2
NotSig  12376
Up          5

0
Entering edit mode

Plot makes no sense, which is expected because you altered the code without checking what the function actually does. Anyway, if these DE results are what the pipeline I posted produces then it is what it is. If you want further help then please either post the maPlot() as I posted it (or as in the linked tutorial) and show PCA results, given you wish to explore whether these results make sense. It also makes sense to learn how these plots work and why these are a good diagnostic, explained in the linked tutorial so that you can judge for yourself rather then just following suggestions on the internet.