I am very new to RNAseq analysis and I am learning how to use edgeR to perform differential gene expression. I am quite confused about how contrasts work. According to the manual:
design <- model.matrix(~0+group, data=y$samples) colnames(design) <- levels(y$samples$group) design A B C sample.1 1 0 0 sample.2 1 0 0 sample.3 0 1 0 sample.4 0 1 0 sample.5 0 0 1
One can compare any of the treatment groups using the contrast argument of the glmQLFTest or glmLRT function. For example,
fit <- glmQLFit(y, design) qlf <- glmQLFTest(fit, contrast=c(-1,1,0)) topTags(qlf)
will compare B to A. The meaning of the contrast is to make the comparison -1A + 1B + 0*C, which is of course is simply B-A.
I have to admit I don't understand this. In other words, I don't understand the "maths" behind it: is B - A the same as A -B? Does the order of the columns matter in the design? And I'm guessing you shouldn't have an intercept in your design matrix if you need to make contrasts?
I've been trying to practise on some real data given to me and apply this:
I have 9 samples, split into 3 groups: wildtype (WT), knockout1 (KOA), knockout2 (KOB). The aim (given to me) was "we need to compare each KO with WT". Therefore, I made the following design matrix:
# Quick overview of my DGEList $counts
head (y$counts)
# KOA1 KOA2 KOA3 KOB1 KOB2 KOB3 WT1 WT2 WT3
# gene1 12 24 45 34 45 26 16 20 24
# gene2 8007 14031 13076 12305 14154 8805 15194 18854 12634
# gene3 12 43 37 104 44 44 28 59 18
# gene4 553 759 955 736 631 517 829 894 997
# gene5 1232 2139 2417 2402 2148 1652 1899 2409 2959
# gene6 643 766 826 239 623 532 314 318 787
# Make model matrix
Geno <- factor(substring(colnames(y),1,nchar(colnames(y))-1))
design <- model.matrix(~0+Geno)
design
GenoKOA GenoKOB GenoWT
1 1 0 0
2 1 0 0
3 1 0 0
4 0 1 0
5 0 1 0
6 0 1 0
7 0 0 1
8 0 0 1
9 0 0 1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Geno
[1] "contr.treatment"
# Estimate dispersion and fit glm
y <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)
# Make contrast and test
my.contrasts <- makeContrasts(WTvsKOA=GenoKOA-GenoWT, WTvsKOB=GenoKOB-GenoWT, levels = design)
qlf.WTvsKOA <- glmQLFTest(fit, contrast = my.contrasts[,"WTvsKOA"])
qlf.WTvsKOB <- glmQLFTest(fit, contrast = my.contrasts[,"WTvsKOB"])
Both my tests find 0 over-expressed and under-expressed and I am unsure as to why that is. It may be unrelated to my commands (replicates from KOA do not cluster together on my plotMDS graph). But either way, my questions remain: am I making the contrasts correctly?
Any help would be deeply appreciated.