Question: Edger Logcpm Calculation
1
gravatar for koen.vdberge
7.1 years ago by
koen.vdberge30
Belgium
koen.vdberge30 wrote:

Dear all,

I am currently researching the differences in RNA-Seq data analysis, comparing the two well known EdgeR and Voom methods. However, there is one thing I can not manage to reproduce, namely the logCPM value in the output of the LRT table of EdgeR, after analyzing a certain contrast or coefficient. I understand from the manual and helpfile, that this logCPM value is a log2 counts per million, normalized for library sizes. I also understand that it is not a simple mean that is being used, but rather the mglmOneGroup() function. However, when I try to recalculate this myself, I can not obtain the same value. My workflow in doing so looks like this:

#Calculate through the table
counts <- read.delim("file.txt")
Treat <- factor(rep(c(rep(c("Cont","DPN","OHT"),4)),each=2))[-19] #Delete removed sample 19 by authors

y.s <- DGEList(counts=counts.filtered.smart,group=Treat)
y.s <- calcNormFactors(y.s)
y.common <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y.trended <- estimateGLMTrendedDisp(y.common, design)
y.tagwise <- estimateGLMTagwiseDisp(y.trended, design,prior.df=22)
lrt <- glmLRT(fit,coef= c(5,6,9,8))
lrt$table$logCPM

#Calculate manually
cpmstest <- cpm(y, normalized.lib.sizes = TRUE)
LogCpmstest <- log(cpmstest + 0.5, base = 2)
mglmOneGroup(LogCpmstest) #not identical

mglmOneGroup(y.tagwise) #also non-identical

Has anyone got any recommendations in what should be changed in this manual calculation workflow? What am I doing wrong? Any help would be greatly appreciated.

Sincerely, Koen.

R edger • 9.5k views
ADD COMMENTlink modified 23 months ago by Gordon Smyth2.3k • written 7.1 years ago by koen.vdberge30
1
gravatar for Charles Warden
6.8 years ago by
Charles Warden8.0k
Duarte, CA
Charles Warden8.0k wrote:

Yeah, I've noticed that the log ratio values in the result file can be quite different (including sign changes that would go beyond changes due to rounding values, using RPKM instead of CPM, etc). In general, I've noticed that edgeR can give funky results sometimes, and I suspect it is doing some over-fitting at some step. This is why I would prefer DESeq over edgeR (but I've also heard that limma-zoom is also a good option).

If it helps, here are some benchmarks comparing some popular differential expression algorithms:

http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html

ADD COMMENTlink modified 14 months ago by Ram32k • written 6.8 years ago by Charles Warden8.0k
1
gravatar for Gordon Smyth
23 months ago by
Gordon Smyth2.3k
Australia
Gordon Smyth2.3k wrote:

https://hypatia.math.ethz.ch/pipermail/bioconductor/2014-February/057449.html

ADD COMMENTlink written 23 months ago by Gordon Smyth2.3k

Thank you for adding this response to the user's question!

ADD REPLYlink written 23 months ago by Charles Warden8.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1145 users visited in the last hour
_