Question

edgeR: how to make contrasts with makeContrasts

3

Entering edit mode

6.7 years ago

m98 ▴ 440

I am very new to RNAseq analysis and I am learning how to use edgeR to perform differential gene expression. I am quite confused about how contrasts work. According to the manual:

design <- model.matrix(~0+group, data=y$samples)
colnames(design) <- levels(y$samples$group)
design
         A B C
sample.1 1 0 0 
sample.2 1 0 0 
sample.3 0 1 0 
sample.4 0 1 0 
sample.5 0 0 1
  
One can compare any of the treatment groups using the contrast argument of the glmQLFTest or glmLRT function. For example,
 fit <- glmQLFit(y, design)
 qlf <- glmQLFTest(fit, contrast=c(-1,1,0))
 topTags(qlf)
  
will compare B to A. The meaning of the contrast is to make the comparison -1A + 1B + 0*C, which is of course is simply B-A.

I have to admit I don't understand this. In other words, I don't understand the "maths" behind it: is B - A the same as A -B? Does the order of the columns matter in the design? And I'm guessing you shouldn't have an intercept in your design matrix if you need to make contrasts?

I've been trying to practise on some real data given to me and apply this:

I have 9 samples, split into 3 groups: wildtype (WT), knockout1 (KOA), knockout2 (KOB). The aim (given to me) was "we need to compare each KO with WT". Therefore, I made the following design matrix:

# Quick overview of my DGEList $counts
head (y$counts)
#       KOA1  KOA2  KOA3   KOB1   KOB2   KOB3   WT1   WT2   WT3
# gene1   12    24    45     34     45     26    16    20    24
# gene2 8007 14031 13076  12305  14154   8805 15194 18854 12634
# gene3   12    43    37    104     44     44    28    59    18
# gene4  553   759   955    736    631    517   829   894   997
# gene5 1232  2139  2417   2402   2148   1652  1899  2409  2959
# gene6  643   766   826    239    623    532   314   318   787

# Make model matrix
Geno <- factor(substring(colnames(y),1,nchar(colnames(y))-1))
design <- model.matrix(~0+Geno)
design
  GenoKOA   GenoKOB GenoWT
1       1         0      0
2       1         0      0
3       1         0      0
4       0         1      0
5       0         1      0
6       0         1      0
7       0         0      1
8       0         0      1
9       0         0      1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Geno
[1] "contr.treatment"

# Estimate dispersion and fit glm
y <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)

# Make contrast and test
my.contrasts <- makeContrasts(WTvsKOA=GenoKOA-GenoWT, WTvsKOB=GenoKOB-GenoWT, levels = design)

qlf.WTvsKOA <- glmQLFTest(fit, contrast = my.contrasts[,"WTvsKOA"])
qlf.WTvsKOB <- glmQLFTest(fit, contrast = my.contrasts[,"WTvsKOB"])

Both my tests find 0 over-expressed and under-expressed and I am unsure as to why that is. It may be unrelated to my commands (replicates from KOA do not cluster together on my plotMDS graph). But either way, my questions remain: am I making the contrasts correctly?

Any help would be deeply appreciated.

ngs edgeR makeContrasts glm dge • 10k views

ADD COMMENT • link 6.7 years ago by m98 ▴ 440