I'm using edgeR to test for DE genes in an expression matrix. For each sample if have a condition and a batch. I want to use the glm-functionality in edgeR to test for DE between conditions, while taking into account batches.
Some example data to show my problem. Say if have a count matrix, EM, of 10 sample with the following labels:
EM = EM # Imagine matrix of counts here... conditions = c("con1", "con1", "con1", "con2", "con2", "con2", "con3", "con3", "con3", "con3") batches = c("batch1", "batch2", "batch3", "batch1", "batch2", "batch3", "batch1", "con2", "con3", "con3")
The pipeline could look like this:
dge = DGEList(counts=EM) # Create object dge = calcNormFactors(dge, method='TMM') # Normalize library sizes using TMM design = model.matrix(~0+conditions+batches) # Create design matrix for glm colnames(design) = c(levels(conditions), levels(batches)[2:length(levels(batches))]) # Set prettier column names dge = estimateGLMCommonDisp(dge, design) # Estimate common dispersion dge = estimateGLMTagwiseDisp(dge, design) # Estimate tagwise dispersion fit = glmFit(dge,design) # Fit glm pair_vector = sprintf("%s-%s", "con1", "con3") # Samples to be compared pair_contrast = makeContrasts(contrasts=pair_vector, levels=design) # Make contrast lrt = glmLRT(fit, contrast=pair_contrast) # Likelihood ratio test
1) The design matrix: There is no baseline conditions, so I remove the intersect with the 0+. Is this necessary to do for batches as well, even though they are not directly used as contrasts?
2) Does the glmFit take into account norm-factors for library sizes?