I am trying to figure out how to construct a design matrix for a multifactor experiment.
I am using edgeR to analyse the gene expression data of 32 samples: 16 gene knock-outs (KO) and 16 wild-type controls (WT). The experiment has two time points (1h and 24h) and two treatments (TGF-beta and control). The research question is which genes are DE between KO and WT under each condition, so I will compare KO with WT separately for the TGF-treated and the control samples at each time point. For this I first constructed a design matrix without an intercept, since I didn't have any reasonable baseline to compare with. For example, I want to compare KO and WT at 24 hours, both with and without TGF. The design matrix:
my.design <- model.matrix(~ 0 + design, data=matrix)
designKO1hControl designKO1hTGF designKO24hControl designKO24hTGF designWT1hControl designWT1hTGF designWT24hControl designWT24hTGF
1 0 0 0 0 0 1 0 0
2 0 0 0 0 0 0 0 1
3 0 0 0 0 1 0 0 0
4 0 0 0 0 0 0 1 0
5 0 1 0 0 0 0 0 0
6 0 0 0 1 0 0 0 0
7 1 0 0 0 0 0 0 0
8 0 0 1 0 0 0 0 0
9 0 0 0 0 0 1 0 0
10 0 0 0 0 0 0 0 1
11 0 0 0 0 1 0 0 0
12 0 0 0 0 0 0 1 0
13 0 1 0 0 0 0 0 0
14 0 0 0 1 0 0 0 0
15 1 0 0 0 0 0 0 0
16 0 0 1 0 0 0 0 0
17 0 0 0 0 0 1 0 0
18 0 0 0 0 0 0 0 1
19 0 0 0 0 1 0 0 0
20 0 0 0 0 0 0 1 0
21 0 1 0 0 0 0 0 0
22 0 0 0 1 0 0 0 0
23 1 0 0 0 0 0 0 0
24 0 0 1 0 0 0 0 0
25 0 0 0 0 0 1 0 0
26 0 0 0 0 0 0 0 1
27 0 0 0 0 1 0 0 0
28 0 0 0 0 0 0 1 0
29 0 1 0 0 0 0 0 0
30 0 0 0 1 0 0 0 0
31 1 0 0 0 0 0 0 0
32 0 0 1 0 0 0 0 0
So the matrix has one column per condition, which lets me compare any two conditions directly.
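For reference, the one-column-per-condition design and the four KO-versus-WT comparisons can be sketched in base R as below. This is a minimal sketch, assuming a hypothetical data frame `samples` whose factor `design` follows the same row order as the matrix above (the original data frame, called `matrix`, is not shown); the contrast matrix `ko.vs.wt` and its column names are also illustrative.

```r
# Hypothetical sample table reproducing the row order of the matrix above:
# an 8-sample block (every genotype/time/treatment combination once),
# repeated 4 times for 32 samples in total.
block <- c("WT1hTGF", "WT24hTGF", "WT1hControl", "WT24hControl",
           "KO1hTGF", "KO24hTGF", "KO1hControl", "KO24hControl")
samples <- data.frame(design = factor(rep(block, times = 4)))

# No intercept: one indicator column per condition, named "design<level>"
my.design <- model.matrix(~ 0 + design, data = samples)

# Contrast vectors: KO minus WT within each time/treatment combination
conditions <- c("1hControl", "1hTGF", "24hControl", "24hTGF")
ko.vs.wt <- sapply(conditions, function(cond) {
  v <- setNames(numeric(ncol(my.design)), colnames(my.design))
  v[paste0("designKO", cond)] <- 1
  v[paste0("designWT", cond)] <- -1
  v
})
colnames(ko.vs.wt) <- paste0("KOvsWT.", conditions)
ko.vs.wt
```

Each column could then be used as the `contrast` argument in edgeR, e.g. `glmQLFTest(fit, contrast = ko.vs.wt[, "KOvsWT.24hTGF"])` after `fit <- glmQLFit(y, my.design)`. limma's `makeContrasts(designKO24hTGF - designWT24hTGF, levels = my.design)` should give the same vector.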
My problem is that I have 4 biological replicates of the KO and WT. Each was knocked out individually, and the MDS plot shows that samples from the same replicate group together. I would like to do as in section 3.4.3 of the edgeR User's Guide, where the batch effect between replicates is removed. So I constructed a similar design matrix:
my.design <- model.matrix(~ batcheffect + design, data=matrix)
As I understand it, this would use one factor level as the intercept, so that all other samples are compared with it, which I would like to avoid. It would be a great help if someone could help me solve this, or suggest a better way to organize the design matrix.
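One common way to keep one coefficient per condition while still adjusting for replicate is to drop the intercept and list the condition factor before the batch factor. The sketch below assumes that `batcheffect` is a four-level factor in which samples 1-8, 9-16, 17-24 and 25-32 each form one replicate containing both a KO and a WT sample under all four conditions; that grouping is a guess from the row order, not something stated above. If instead each KO and WT line is its own batch (batch nested within genotype), this design would not be of full rank.

```r
# Hypothetical layout: the same 8-sample block as in the matrix above,
# repeated once per biological replicate.
block <- c("WT1hTGF", "WT24hTGF", "WT1hControl", "WT24hControl",
           "KO1hTGF", "KO24hTGF", "KO1hControl", "KO24hControl")
samples <- data.frame(
  design = factor(rep(block, times = 4)),
  # ASSUMPTION: samples 1-8, 9-16, 17-24 and 25-32 are replicates 1-4
  batcheffect = factor(rep(1:4, each = 8))
)

# Intercept version (as in the question): "(Intercept)" stands for
# replicate 1 under the first condition level (KO1hControl).
with.intercept <- model.matrix(~ batcheffect + design, data = samples)

# No intercept, condition factor first: one column per condition
# (8 columns) plus batcheffect2..batcheffect4 for the replicate shifts.
my.design <- model.matrix(~ 0 + design + batcheffect, data = samples)
colnames(my.design)

# Both parameterizations span the same column space, so they describe
# the same fitted model.
qr(my.design)$rank                          # 11
qr(cbind(my.design, with.intercept))$rank   # 11
```

Because the batch columns only absorb replicate-to-replicate shifts, a KO-versus-WT contrast is still the difference of two `design` columns, with zeros for `batcheffect2` to `batcheffect4`. The intercept version is not wrong; it fits the same model, just in a parameterization that is less convenient for writing these contrasts.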