Question: Too few differentially expressed genes identified by edgeR
0
gravatar for afli
4 weeks ago by
afli140
China, Beijing, IGDB
afli140 wrote:

Dear all,

I'm new in using edgeR. I get no diffrentially expressed genes and I don't know why, could anyone give me some advice? Thank you very much! I pasted the data link and my scripts as follows.

data link: https://de.cyverse.org/dl/d/5F987820-8F84-4772-8561-416202A11580/allh.counts_test.txt

scripts:

library(edgeR)
data<-read.table("allh.counts_test.txt",header=T)
data<-data[,3:6]
Treat <- factor(substring(colnames(data),1,4))
Time <- factor(substring(colnames(data),7,7))
y <- DGEList(counts=data, group=Treat)
keep <- rowSums(cpm(y)>2) >= 3
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
design <- model.matrix(~Time+Treat)
rownames(design) <- colnames(y)
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design, robust=TRUE)
qlf <- glmQLFTest(fit)
summary(decideTests(qlf))
#       Treatxieq
Down           0
NotSig     25016
Up             0

But when you look into the CPM value, it show much difference actually.

top <- rownames(topTags(qlf))
cpm(y)[top,]
                    rgou_r1    rgou_r2   xieq_r1     xieq_r2
LOC_Os11g10520.1 374.356938 229.981895 23.064987  14.8218599
LOC_Os11g10520.3 374.356938 229.981895 23.064987  14.8218599
LOC_Os08g04350.1   4.482846   2.699318 88.060054  50.4744420
LOC_Os11g10520.2 375.851220 230.791691 23.133838  15.5658143
LOC_Os11g11770.1   5.033371   5.992486 87.509248 151.5377805
LOC_Os09g13440.2  53.636855  16.141922  2.134372   0.5150453
LOC_Os06g29730.1 152.259460  80.709609 14.320947   8.2407252
LOC_Os09g13440.1  81.320394  26.669262  2.891730   0.5722726
LOC_Os12g18729.5 320.877375 376.338918 15.560260  11.0448609

In DESeq2 I successfully get expected differentially expressed gene numbers(nearly 1000). I just wonder why edgeR could not identify those genes with fdr value <0.05?

edger • 153 views
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by afli140
4

Did you check your design? You select 4 samples but use two factors so that you don’t have any replicated factor values. I think you should ignore factor Time.

ADD REPLYlink written 4 weeks ago by Michael Dondrup45k

The reason why I recommended to ignore one of the factors is that naming of the samples _r1 _r2 might translate to 'replicate', but this is guesswork.

ADD REPLYlink written 4 weeks ago by Michael Dondrup45k

Thank you for your suggestion, samples _r1 _r2 are two biological replicates. I changed the design but still get no differential genes...

library(edgeR)
data<-read.table("allh.counts_test.txt",header=T)
data<-data[,3:6]
Treat <- factor(substring(colnames(data),1,4))
y <- DGEList(counts=data, group=Treat)
keep <- rowSums(cpm(y)>2) >= 3
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
design <- model.matrix(~Treat)
rownames(design) <- colnames(y)
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design, robust=TRUE)
qlf <- glmQLFTest(fit)
summary(decideTests(qlf))
       Treatxieq
Down           0
NotSig     25016
Up             0
ADD REPLYlink written 4 weeks ago by afli140

Instead of sharing the whole dataset (which kinda conveys that you wish for others to do the analysis for you, which might not be what you intend), you could share an excerpt (50-200 lines) in a GitHub gist. See this post for the how-to on gist: A: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists

ADD REPLYlink written 4 weeks ago by RamRS19k

Thank you for your suggestion RamRS, I give the data link and paste the scripts so that everyone can download this data (small size), and run this in edgeR, totally need about 2 minutes. And I edit my post with more detail for those who do not have much time to run and check my data. Hope this is OK.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by afli140
4
gravatar for h.mon
4 weeks ago by
h.mon22k
Brazil
h.mon22k wrote:

You have no biological replicates: you have one sample to xieq time1, one sample to xieq time2, one sample to rgou time1, and one sample to rgou time 2. Thus your experiment is severely under-powered, and this is a likely reason for not finding any differentially expressed transcript. See section "2.9 What to do if you have no replicates" from the edgeR User Guide.

ADD COMMENTlink written 4 weeks ago by h.mon22k

Thank you for your contribution. r1 and r2 mean two biological replicates.

ADD REPLYlink written 4 weeks ago by afli140
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1461 users visited in the last hour