Normalisation before edgeR for RNA-Seq
27 days ago
ZheFrench ▴ 330

I am more of a DESeq2 user and switched to edgeR recently. I received scripts from another developer. With DESeq2 I used to feed in the raw counts directly. Here the author pre-normalises the counts with the code below. Is that OK?

Is this double normalization? I think edgeR normalizes the reads internally, right? So I was wondering whether I should remove this part of the code before the edgeR call. What do you think?

Roughly :

###### Useless section ? ######
q <- apply(counts,2,function(x) quantile(x[x>0],prob=0.75))
ncounts <- sweep(counts,2,q/median(q),"/")
################################# Should I just use counts ?

dge <- DGEList(ncounts,genes=rownames(ncounts))
design <- model.matrix(~0+dge$group) # no intercept (x0 = 1), forces the model through the origin
colnames(design) <- gsub("^dge\\$group","",colnames(design)) # "$" must be escaped in the regex
cm <- makeContrasts(contrasts=comp,levels=dge$group)

y     <- estimateDisp(dge,design,robust=TRUE)
fit.y <- glmFit(y,design)
lrt   <- glmLRT(fit.y,contrast=cm)

27 days ago
ATpoint 49k

The manual instructs you to use the raw counts: https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Normalization only happens if you call calcNormFactors. Otherwise edgeR applies a plain library-size (per-million) scaling, which does not correct for library composition. If in doubt, I would stick strictly to the manual. Your colleague's custom code on top should probably be dropped. The manual has a "quick start" section you can use for a simple analysis. Be sure to call calcNormFactors and do not feed in pre-normalized counts.
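
For reference, here is a minimal sketch of what that could look like, starting from raw counts. The toy count matrix, the group labels "A"/"B" and the filtering step are my own assumptions for illustration, not taken from your script, so adapt them to your data.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 1000 genes x 6 samples of raw integer counts
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

# Raw counts go straight into DGEList, with no manual pre-scaling
dge <- DGEList(counts = counts, group = group)
default_nf <- dge$samples$norm.factors  # all 1 until calcNormFactors is called

# Drop low-expressed genes, then compute normalisation factors (TMM by default)
keep <- filterByExpr(dge)
dge  <- dge[keep, , keep.lib.sizes = FALSE]
dge  <- calcNormFactors(dge)

# No-intercept design with clean column names, and one contrast B vs A
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
cm <- makeContrasts(B - A, levels = design)

y   <- estimateDisp(dge, design, robust = TRUE)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, contrast = cm)
topTags(lrt)
```

If you specifically want an upper-quartile style normalisation like the one in the custom code, calcNormFactors also accepts method = "upperquartile". It is not computed identically to the snippet in the question, but it keeps the counts raw and stores the scaling as factors used by the model.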