I am trying perform differential analysis with edgeR on sequencing data across two conditions, each of which has two replicates. I first try the following scripts learned from edgeR.
d0 <- DGEList(count_mat)
d <- calcNormFactors(d0)
mm <- model.matrix(~0 + group)
y <- voom(d, mm, plot = TRUE)
fit <- lmFit(y, mm)
contr <- makeContrasts(grouptreated - groupuntreated, levels = colnames(coef(fit)))
tmp <- contrasts.fit(fit, contr)
tmp <- eBayes(tmp)
top.table <- topTable(tmp, sort.by = "P", n = Inf)
This works well but I need to customize normalization methods because we are not working on RNA-seq data but other sequencing data and we intend to explore different normalization methods. So, I am wondering if I can prepare y by myselft but not with voom(), and using my customized y as the input of lmFit().
I make the following attempts: if lmFit just uses the normalized reads from d0 which I think should be y$E, then it should work if I perform fit_new <- lmFit(y$E, mm), and fit should be the same as fit_new. However, they are not equivalent. So, here are my questions:
- If I have a customized normalized reads matrix -- each row is a gene, and each column is a replicate under a condition -- how I can wrap it up as a
lmFitaccepted object. - Why
fit_new <- lmFit(y$E, mm)andfit <- lmFit(y, mm)are not equivalent?
Besides, I am also wondering if I use the default settings of voom() as the script I showed above, what exactly normalization voom() has performed? At the beginning, I believed that it is log2 CPM with prior count=0.5. However, I found that y$E (from y <- voom(d, mm, plot = TRUE) is different from cpm_d <- cpm(d, log=TRUE, prior.count=0.5) (both d here are from d <- calcNormFactors(d0)), so I am wondering if it implies that the default settings of voom() is different from log2 CPM with prior count=0.5.
Finally, could anyone reveal the math equations of the normalization methods of the default settings of voom() and cpm(d, log=TRUE, prior.count=0.5) so that I can understand more about it. I checked the source code of cpm() but it just show out <- .Call(.cxx_calculate_cpm_log, y, lib.size, prior.count) and I cannot find .cxx_calculate_cpm_log.
Thank you very much!