Question: Edger Logcpm Calculation
1
gravatar for koen.vdberge
6.6 years ago by
koen.vdberge10
Belgium
koen.vdberge10 wrote:

Dear all,

I am currently researching the differences in RNA-Seq data analysis, comparing the two well known EdgeR and Voom methods. However, there is one thing I can not manage to reproduce, namely the logCPM value in the output of the LRT table of EdgeR, after analyzing a certain contrast or coefficient. I understand from the manual and helpfile, that this logCPM value is a log2 counts per million, normalized for library sizes. I also understand that it is not a simple mean that is being used, but rather the mglmOneGroup() function. However, when I try to recalculate this myself, I can not obtain the same value. My workflow in doing so looks like this:

#Calculate through the table
counts <- read.delim("file.txt")
Treat <- factor(rep(c(rep(c("Cont","DPN","OHT"),4)),each=2))[-19] #Delete removed sample 19 by authors

y.s <- DGEList(counts=counts.filtered.smart,group=Treat)
y.s <- calcNormFactors(y.s)
y.common <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y.trended <- estimateGLMTrendedDisp(y.common, design)
y.tagwise <- estimateGLMTagwiseDisp(y.trended, design,prior.df=22)
lrt <- glmLRT(fit,coef= c(5,6,9,8))
lrt$table$logCPM

#Calculate manually
cpmstest <- cpm(y, normalized.lib.sizes = TRUE)
LogCpmstest <- log(cpmstest + 0.5, base = 2)
mglmOneGroup(LogCpmstest) #not identical

mglmOneGroup(y.tagwise) #also non-identical

Has anyone got any recommendations in what should be changed in this manual calculation workflow? What am I doing wrong? Any help would be greatly appreciated.

Sincerely, Koen.

R edger • 8.9k views
ADD COMMENTlink modified 18 months ago by Gordon Smyth1.9k • written 6.6 years ago by koen.vdberge10
1
gravatar for Charles Warden
6.4 years ago by
Charles Warden7.8k
Duarte, CA
Charles Warden7.8k wrote:

Yeah, I've noticed that the log ratio values in the result file can be quite different (including sign changes that would go beyond changes due to rounding values, using RPKM instead of CPM, etc). In general, I've noticed that edgeR can give funky results sometimes, and I suspect it is doing some over-fitting at some step. This is why I would prefer DESeq over edgeR (but I've also heard that limma-zoom is also a good option).

If it helps, here are some benchmarks comparing some popular differential expression algorithms:

http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html

ADD COMMENTlink modified 8 months ago by RamRS30k • written 6.4 years ago by Charles Warden7.8k
1
gravatar for Gordon Smyth
18 months ago by
Gordon Smyth1.9k
Australia
Gordon Smyth1.9k wrote:

https://hypatia.math.ethz.ch/pipermail/bioconductor/2014-February/057449.html

ADD COMMENTlink written 18 months ago by Gordon Smyth1.9k

Thank you for adding this response to the user's question!

ADD REPLYlink written 18 months ago by Charles Warden7.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 745 users visited in the last hour