I'm running into a problem where my coefficients are not estimable when using Bioconductor/limma on a two-color factorial design for a microarray analysis.
I have microarray data that I downloaded from ArrayExpress using the ArrayExpress function in the ArrayExpress package. I managed to convert the NChannelSet object into an RGList object that I can use in limma, using a tip I found in the archives (https://stat.ethz.ch/pipermail/bioconductor/2009-September/029705.html). After background subtraction and normalization, I have an MAList object with 32 arrays (red and green) by 34,944 probes.
The experiment consists of two cell types (WT or KO), four treatments (control, CD70, CD80, CD70+CD80), and four time points (2, 4, 8, 14 hours), with dye swaps. I'm mostly interested in genes that are differentially expressed when WT cells are hit with CD70+CD80 versus CD80 alone.
I'm trying to create a targets file so I can use modelMatrix to create my design matrix. But this is a multifactorial experiment run as two-color dye swaps, and I'm not sure how to specify that in the targets file. The limma manual has information about factorial designs, but no examples for two-color experiments. I have the following factors:
1: Celltype: WT or KO
2: Treatment: control (x), CD70, CD80, or CD70+CD80 (CD7080)
3: Timepoints: 2, 4, 8, and 14 hours
I tried collapsing all of the factors and levels into a single string per channel in my targets file:
> targets
array Cy3 Cy5
1 array3329 WT.CD70.2 WT.x.2
2 array2675 KO.CD7080.8 WT.CD7080.8
3 array2242 WT.CD7080.2 KO.CD7080.2
4 array3328 WT.x.2 WT.CD70.2
5 array3310 WT.x.8 WT.CD70.8
6 array2246 KO.CD7080.2 WT.CD7080.2
7 array3337 WT.CD7080.4 WT.CD80.4
8 array3323 WT.CD7080.14 WT.CD80.14
9 array2673 KO.CD70.8 WT.CD70.8
10 array1938 WT.CD7080.14 KO.CD7080.14
11 array2240 WT.CD70.2 KO.CD70.2
12 array3336 WT.CD80.4 WT.CD7080.4
13 array3322 WT.CD80.14 WT.CD7080.14.2
14 array2674 WT.CD7080.8 KO.CD7080.8
15 array3321 WT.CD70.14 WT.x.14
16 array2241 KO.CD70.2 WT.CD70.2
17 array2597 KO.CD7080.4 WT.CD7080.4
18 array3313 WT.CD7080.8 WT.CD80.8
19 array1939 KO.CD7080.14 WT.CD7080.14.2
20 array3335 WT.CD70.4 WT.x.4
21 array2672 WT.CD70.8 KO.CD70.8
22 array3320 WT.x.14 WT.CD70.14
23 array3334 WT.x.4 WT.CD70.4
24 array3311 WT.CD70.8 WT.x.8
25 array3331 WT.CD7080.2 WT.CD80.2
26 array1941 KO.CD70.14 WT.CD70.14.2
27 array2588 WT.CD70.4 KO.CD70.4
28 array2596 WT.CD7080.4 KO.CD7080.4
29 array3330 WT.CD80.2 WT.CD7080.2
30 array3312 WT.CD80.8 WT.CD7080.8
31 array1940 WT.CD70.14.2 KO.CD70.14
32 array2593 KO.CD70.4 WT.CD70.4
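For reference, collapsed labels like the ones above can be built from the separate factors with paste(), and it is worth checking that every label splits back into exactly three fields. This is only a minimal base-R sketch: the data frame `cy3` holds the Cy3 channel of the first three arrays, and the variable names are made up for illustration.

```r
# Illustrative factor columns for the Cy3 channel of arrays 1-3 only.
cy3 <- data.frame(celltype  = c("WT", "KO", "WT"),
                  treatment = c("CD70", "CD7080", "CD7080"),
                  time      = c(2, 8, 2),
                  stringsAsFactors = FALSE)

# Collapse celltype, treatment and time into one label per channel.
cy3_label <- with(cy3, paste(celltype, treatment, time, sep = "."))

# Sanity check: flag labels that do not have exactly three dot-separated
# fields (a few labels in the table above carry a fourth field).
labels   <- c("WT.CD70.2", "WT.CD7080.14.2", "KO.CD70.14")
n_fields <- lengths(strsplit(labels, ".", fixed = TRUE))
odd      <- labels[n_fields != 3]
print(cy3_label)
print(odd)
```

Any label flagged this way is treated by modelMatrix as its own separate level, whether or not that was intended.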
I then created a design matrix, where the reference group is WT cells, untreated, 2-hour timepoint.
design <- modelMatrix(targets, ref="WT.x.2")
This creates a 32x25 design matrix, but when I fit the model, I get a "Coefficients not estimable" error/warning:
> fit <- lmFit(d, design)
Coefficients not estimable: WT.CD70.14.2 WT.CD80.14 WT.CD80.2 WT.CD80.4 WT.CD80.8 WT.x.14 WT.x.4 WT.x.8
Warning message:
Partial NA coefficients for 34944 probe(s)
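One way to see where the non-estimable coefficients might come from is to treat the targets table as a graph: each label is a node, and each array is an edge between its Cy3 and Cy5 labels. A coefficient relative to WT.x.2 can only be estimated if its label is linked to WT.x.2 through some chain of arrays. The sketch below uses base R only and retypes the targets table above. It assumes that modelMatrix(targets, ref=...) amounts to +1 for the Cy5 label and -1 for the Cy3 label with the reference column dropped; the variable names are illustrative.

```r
# Cy3 and Cy5 labels for arrays 1-32, retyped from the targets table.
cy3 <- c(
  "WT.CD70.2", "KO.CD7080.8", "WT.CD7080.2", "WT.x.2",
  "WT.x.8", "KO.CD7080.2", "WT.CD7080.4", "WT.CD7080.14",
  "KO.CD70.8", "WT.CD7080.14", "WT.CD70.2", "WT.CD80.4",
  "WT.CD80.14", "WT.CD7080.8", "WT.CD70.14", "KO.CD70.2",
  "KO.CD7080.4", "WT.CD7080.8", "KO.CD7080.14", "WT.CD70.4",
  "WT.CD70.8", "WT.x.14", "WT.x.4", "WT.CD70.8",
  "WT.CD7080.2", "KO.CD70.14", "WT.CD70.4", "WT.CD7080.4",
  "WT.CD80.2", "WT.CD80.8", "WT.CD70.14.2", "KO.CD70.4")
cy5 <- c(
  "WT.x.2", "WT.CD7080.8", "KO.CD7080.2", "WT.CD70.2",
  "WT.CD70.8", "WT.CD7080.2", "WT.CD80.4", "WT.CD80.14",
  "WT.CD70.8", "KO.CD7080.14", "KO.CD70.2", "WT.CD7080.4",
  "WT.CD7080.14.2", "KO.CD7080.8", "WT.x.14", "WT.CD70.2",
  "WT.CD7080.4", "WT.CD80.8", "WT.CD7080.14.2", "WT.x.4",
  "KO.CD70.8", "WT.CD70.14", "WT.CD70.4", "WT.x.8",
  "WT.CD80.2", "WT.CD70.14.2", "KO.CD70.4", "KO.CD7080.4",
  "WT.CD7080.2", "WT.CD7080.8", "KO.CD70.14", "WT.CD70.4")

labels <- sort(unique(c(cy3, cy5)))

# Connected components by label propagation: both ends of every array
# repeatedly take the smaller component id until nothing changes.
comp <- setNames(seq_along(labels), labels)
repeat {
  changed <- FALSE
  for (i in seq_along(cy3)) {
    m <- min(comp[cy3[i]], comp[cy5[i]])
    if (comp[cy3[i]] != m || comp[cy5[i]] != m) {
      comp[c(cy3[i], cy5[i])] <- m
      changed <- TRUE
    }
  }
  if (!changed) break
}
n_components <- length(unique(comp))

# Signed design (assumed equivalent to modelMatrix with ref = "WT.x.2").
X <- matrix(0, length(cy3), length(labels), dimnames = list(NULL, labels))
X[cbind(seq_along(cy3), match(cy5, labels))] <- 1
X[cbind(seq_along(cy3), match(cy3, labels))] <- -1
X <- X[, labels != "WT.x.2"]
design_rank <- qr(X)$rank

# Labels that share a component with the reference are the estimable ones.
linked_to_ref <- names(comp)[comp == comp["WT.x.2"]]
print(c(components = n_components, columns = ncol(X), rank = design_rank))
print(linked_to_ref)
```

With the table as typed, this gives 9 separate groups of labels and a 25-column design of rank 17, so 8 coefficients cannot be estimated, which matches the number of coefficients listed in the message above. Only WT.CD70.2 and KO.CD70.2 are linked to WT.x.2 by any chain of arrays.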
Any help with how to specify this multifactorial time-course design in a two-channel dye-swap experiment would be greatly appreciated. As I said, I'm most interested in the conditions on arrays 7-8, 12-13, 18, 25, and 29-30, where I'm looking at WT cells treated with CD70+CD80 versus CD80 alone.
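One possible way to get at the WT CD70+CD80 versus CD80 comparison directly is to fit only the arrays that pair those two conditions, with one coefficient per time point and the sign flipped for the dye-swapped array. Going by the labels in the targets table, those are rows 7, 8, 12, 13, 18, 25, 29 and 30. The sketch below is a base-R illustration under assumptions, not a recommended analysis: it treats the label WT.CD7080.14.2 as a 14-hour sample, ignores any dye effect, and uses made-up variable and column names.

```r
# Rows of the targets table that pair WT.CD7080 with WT.CD80.
sub <- data.frame(
  array = c(7, 8, 12, 13, 18, 25, 29, 30),
  Cy3 = c("WT.CD7080.4", "WT.CD7080.14", "WT.CD80.4", "WT.CD80.14",
          "WT.CD7080.8", "WT.CD7080.2", "WT.CD80.2", "WT.CD80.8"),
  Cy5 = c("WT.CD80.4", "WT.CD80.14", "WT.CD7080.4", "WT.CD7080.14.2",
          "WT.CD80.8", "WT.CD80.2", "WT.CD7080.2", "WT.CD7080.8"),
  stringsAsFactors = FALSE
)

# The third dot-separated field is the time point
# (assumption: ".14.2" is a 14-hour sample).
time_of <- function(x) sapply(strsplit(x, ".", fixed = TRUE), `[`, 3)
tp <- factor(time_of(sub$Cy5), levels = c("2", "4", "8", "14"))

# M = log2(Cy5/Cy3): +1 when CD70+CD80 is on Cy5, -1 for the dye swap.
dye_sign <- ifelse(grepl("CD7080", sub$Cy5), 1, -1)

# One column per time point, each estimating CD7080 minus CD80.
design_sub <- model.matrix(~ 0 + tp) * dye_sign
colnames(design_sub) <- paste0("CD7080vsCD80.", levels(tp), "h")
print(design_sub)

# With limma loaded, and assuming the columns of d follow the row order of
# the targets table, the fit would then be something like:
# fit <- lmFit(d[, sub$array], design_sub)
```

Each time point is covered by exactly one dye-swap pair here, so every column contains one +1 and one -1 and the design has full rank, but only four residual degrees of freedom remain in total.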
Thanks in advance for any help!