Model Matrix Design - Replicates Within and Between Batches
5.8 years ago

Hello all,

I'm trying to combine two datasets for analysis and have some questions about setting up the design matrix. I need to account for differences between two batches and two conditions, as well as technical replicates. The target list looks like this:

Sample     Condition     Batch     Individual
1           1            1           1
2           1            1           2
3           1            1           3
4           1            1           4
5           1            1           5
6           1            1           6
7           1            1           7
8           1            1           8
9           1            2           1
10          1            2           1
11          1            2           2
12          1            2           2
13          2            2           9
14          2            2           9
15          2            2           10
16          2            2           10
17          2            2           11
18          2            2           11


As you can see, there are both technical replicates within batches (all individuals in batch 2 have replicates), as well as replicates between batches (individuals 1 and 2 are present in both batches).

I can't create a model matrix as is, presumably because the variables are not linearly independent. Is there any way to develop a consistent model matrix to account for all these variables, given that not all individuals have replicates?

5.8 years ago
russhh 5.6k

With targets as defined:

targets.df <- data.frame(
  Sample = 1:18,
  Condition = c(rep(1, 12), rep(2, 6)),
  Batch = c(rep(1, 8), rep(2, 10)),
  Individual = c(1:8, rep(c(1, 2, 9, 10, 11), each = 2))
)
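
To see why the straightforward formula fails, here is a minimal base-R sketch. It assumes the three variables are treated as factors; the object names `naive`, `n_cols` and `naive_rank` are mine, not from the question. Condition is nested within Individual (individuals 9, 10 and 11 are exactly the condition-2 samples), so the Condition column adds nothing once every individual has its own column.

```r
# Targets as posted in the question.
targets.df <- data.frame(
  Sample = 1:18,
  Condition = c(rep(1, 12), rep(2, 6)),
  Batch = c(rep(1, 8), rep(2, 10)),
  Individual = c(1:8, rep(c(1, 2, 9, 10, 11), each = 2))
)

# Naive design: one term per variable, all treated as factors.
naive <- model.matrix(
  ~ factor(Condition) + factor(Batch) + factor(Individual),
  data = targets.df
)

n_cols <- ncol(naive)          # number of coefficients requested
naive_rank <- qr(naive)$rank   # number that can actually be estimated

# Rank below the column count means the design is not full rank.
print(c(columns = n_cols, rank = naive_rank))
```

The rank comes out one short of the column count, which is the kind of deficiency that makes model-fitting tools reject the design.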


It should be doable (I'm assuming that no individual is exposed to both conditions and that no individual is replicated within batch 1), but it needs a hacked-together design matrix:

You need binary columns for: i) Intercept; ii) Condition2; iii) Batch1. Then you need a binary column for each individual who is present in both batches, and a binary column for all but one of the remaining individuals in batch 2.

So for the targets data.frame you've posted (apologies for the ugly code),

design <- with(targets.df,
  data.frame(
    intercept = 1,
    cond2 = ifelse(Condition == 2, 1, 0),
    batch1 = ifelse(Batch == 1, 1, 0),
    match1 = ifelse(Individual == 1, 1, 0),
    match2 = ifelse(Individual == 2, 1, 0),
    match9 = ifelse(Individual == 9, 1, 0),
    match10 = ifelse(Individual == 10, 1, 0)
  ))

Matrix::rankMatrix(design) # 7, i.e. full column rank

> design
   intercept cond2 batch1 match1 match2 match9 match10
1          1     0      1      1      0      0       0
2          1     0      1      0      1      0       0
3          1     0      1      0      0      0       0
4          1     0      1      0      0      0       0
5          1     0      1      0      0      0       0
6          1     0      1      0      0      0       0
7          1     0      1      0      0      0       0
8          1     0      1      0      0      0       0
9          1     0      0      1      0      0       0
10         1     0      0      1      0      0       0
11         1     0      0      0      1      0       0
12         1     0      0      0      1      0       0
13         1     1      0      0      0      1       0
14         1     1      0      0      0      1       0
15         1     1      0      0      0      0       1
16         1     1      0      0      0      0       1
17         1     1      0      0      0      0       0
18         1     1      0      0      0      0       0


I'd strongly urge you to test this design though.
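
One way to test it, as a rough sketch in base R only: check that the design has full column rank, count the residual degrees of freedom, and confirm that known coefficients can be recovered from a noise-free response. The `beta_true` values are made up purely for illustration and are not real expression data; `design_rank`, `resid_df` and `recovered` are names I've introduced here.

```r
# Targets and hand-built design, as above.
targets.df <- data.frame(
  Sample = 1:18,
  Condition = c(rep(1, 12), rep(2, 6)),
  Batch = c(rep(1, 8), rep(2, 10)),
  Individual = c(1:8, rep(c(1, 2, 9, 10, 11), each = 2))
)

design <- with(targets.df, cbind(
  intercept = 1,
  cond2   = as.numeric(Condition == 2),
  batch1  = as.numeric(Batch == 1),
  match1  = as.numeric(Individual == 1),
  match2  = as.numeric(Individual == 2),
  match9  = as.numeric(Individual == 9),
  match10 = as.numeric(Individual == 10)
))

# 1. Full column rank: every coefficient is estimable.
design_rank <- qr(design)$rank
full_rank <- design_rank == ncol(design)

# 2. Residual degrees of freedom left for estimating variance.
resid_df <- nrow(design) - design_rank

# 3. Noise-free recovery: build y from made-up (hypothetical) coefficients
#    and check that least squares returns the same values.
beta_true <- c(intercept = 5, cond2 = 2, batch1 = -1,
               match1 = 0.5, match2 = -0.5, match9 = 0.3, match10 = -0.3)
y <- drop(design %*% beta_true)
beta_hat <- lm.fit(design, y)$coefficients
recovered <- isTRUE(all.equal(unname(beta_hat), unname(beta_true)))

print(c(rank = design_rank, columns = ncol(design), resid_df = resid_df))
print(recovered)
```

This only shows that the design is mathematically well-posed. Whether the match columns model the replicate structure the way you want is a separate question, so it is still worth checking the fitted coefficients on your real data.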