Hi there, I'm new to working with EdgeR's generalized linear models, but I have a data set where I'm trying to get complex comparisons between groups, times, and treatments, so here I am...
I've figured out that the best approach for me is to use the cell means model. Right now I have three time points, two groups and two treatments at each time point.
Here is my code:
#countsDGE_d3 is my count matrix >key_3d sample group condition F.3dayInjury.IP1 F.3dayInjury forelimb injury F.3dayInjury.IP2 F.3dayInjury forelimb injury F.3dayInjury.IP3 F.3dayInjury forelimb injury F.Intact.IP1 F.Intact forelimb no_lesion F.Intact.IP2 F.Intact forelimb no_lesion F.Intact.IP3 F.Intact forelimb no_lesion H.3dayInjury.IP1 H.3dayInjury hindlimb injury H.3dayInjury.IP2 H.3dayInjury hindlimb injury H.3dayInjury.IP3 H.3dayInjury hindlimb injury H.Intact.IP1 H.Intact hindlimb no_lesion H.Intact.IP2 H.Intact hindlimb no_lesion H.Intact.IP3 H.Intact hindlimb no_lesion H.Intact.IP4 H.Intact hindlimb no_lesion design3d <- model.matrix(~0+key_3d$sample) colnames(design3d) <- levels(key_3d$sample) countsDGE_d3 <- calcNormFactors(countsDGE_d3) countsDGE_d3 <- estimateDisp(countsDGE_d3, design3d, robust = TRUE) countsDGE_d3$common.dispersion fit <- glmQLFit(countsDGE_d3, design3d) cont.matrix <- makeContrasts(NvsINJinFL=F.3dayInjury-F.Intact, NvsINJinHL=H.3dayInjury-H.Intact, Diff=(F.3dayInjury-F.Intact)-(H.3dayInjury-H.Intact), levels=design3d) qlf <- glmQLFTest(fit, contrast=cont.matrix[,"NvsINJinFL"]) topTags(qlf, n = 20) table <- topTags(qlf, n = Inf) write.csv( as.data.frame(table), file="All_FLDay3_InjuredvsIntact_results.csv")
I then repeat running glmQLFTest on all three contrasts and export the topTags table.
My question is, due to the data being pretty noisy, I used the RUVseq package to calculate the factors of unwanted variation for my samples. I would like to incorporate these into my design matrix and into my contrasts. How do I construct this model matrix and contrast matrix?
Does this make sense? But then what should my contrasts be?
design3d <- model.matrix(~0+key_3d$sample+W_1, data=pData_3d)
Where pData_3d is a matrix of the factors of unwanted variation from:
seq_set <- newSeqExpressionSet(as.matrix(dge_counts, phenoData = key$sample)) design <- model.matrix(~0 + key$sample) colnames(design) <- levels(key$sample) dge <- DGEList(counts = dge_counts, group = key$sample) dge <- calcNormFactors(dge, method="upperquartile", rubust = TRUE) dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) fit <- glmFit(dge, design) res <- residuals(fit, type = "deviance") #use all the genes to estimate the factors of unwanted variation. seqUQ <- betweenLaneNormalization(seq_set, which="upper") controls <- rownames(seq_set) seqRUVr <- RUVr(seqUQ, controls, k=1, res)