This question is related to a question I asked here earlier. Since then, I have gotten the raw data and processed it into normalized MA values. I'll repeat the background and motivation below.
The study GSE13465 contains experiments of toxic effects of acetaminophen on two organisms:
I only care about data from the rat cultures. (The human data is present due to submitter error.)
The rat cultures took place in two media types:
- rat hepatocytes in normal media
- rat hepatocytes in modified media
Each media condition was paired with an acetaminophen (APAP) treatment condition for each biological replicate:
- no exposure (0 mM APAP), the control
- exposure to 5 mM APAP
- exposure to 10 mM APAP
Expression levels were measured for each biological replicate-medium combination using the two-channel Agilent 011868 Rat Oligo Microarray G4130A ([GPL890]). All gene expression measurements were performed with the labels as such:
- Channel 1 (Cy5): 0 mM APAP (control) or 5 or 10 mM APAP (treatment)
- Channel 2 (Cy3): 0 mM APAP (control)
This experiment design was repeated for three biological replicates (rats 2, 3, and 4).
I would like to perform the following contrasts:
- 5 mM APAP vs. control, standard media
- 10 mM APAP vs. control, standard media
- 5 mM APAP vs. control, modified media
- 10 mM APAP vs. control, modified media
My challenge is the following: there was no single mRNA sample common across all the microarray experiments within a medium.
From what I understand, ideally the researchers should have pooled the mRNA from each 0 mM APAP standard medium culture, then labeled this pool with Cy3 and compared the pooled mRNA to each individual culture's mRNA labeled with Cy5. (They would have repeated this design for the modified medium, as well.) Unfortunately, this was not what the researchers did.
Instead, within each biological replicate, the researchers used the mRNA from the untreated sample from the same rat as a reference on the Cy3 channel for that rat's cultures in that medium, and only that rat's cultures. For example, in experiment GSM339438, Rat 2's cells exposed to 5 mM APAP in standard medium were compared to Rat 2's cells exposed to no APAP in standard medium.
A look at a subset of my
targets may be helpful here:
> targets[targets$Medium == "standard",] FileNameCy3 FileNameCy5 Std_APAP0.vs.Std_APAP0.2 GSM339437_Cy3_36569.txt GSM339437_Cy5_36569.txt Std_APAP5.vs.Std_APAP0.2 GSM339438_Cy3_36570.txt GSM339438_Cy5_36570.txt Std_APAP10.vs.Std_APAP0.2 GSM339439_Cy3_36571.txt GSM339439_Cy5_36571.txt Std_APAP0.vs.Std_APAP0.3 GSM339443_Cy3_36630.txt GSM339443_Cy5_36630.txt Std_APAP5.vs.Std_APAP0.3 GSM339444_Cy3_36631.txt GSM339444_Cy5_36631.txt Std_APAP10.vs.Std_APAP0.3 GSM339445_Cy3_36632.txt GSM339445_Cy5_36632.txt Std_APAP0.vs.Std_APAP0.4 GSM339449_Cy3_37372.txt GSM339449_Cy5_37372.txt Std_APAP5.vs.Std_APAP0.4 GSM339450_Cy3_37373.txt GSM339450_Cy5_37373.txt Std_APAP10.vs.Std_APAP0.4 GSM339451_Cy3_37374.txt GSM339451_Cy5_37374.txt GEOSample Organism Donor Medium Cy3APAP Std_APAP0.vs.Std_APAP0.2 GSM339437 Rattus norvegicus 2 standard 0 Std_APAP5.vs.Std_APAP0.2 GSM339438 Rattus norvegicus 2 standard 0 Std_APAP10.vs.Std_APAP0.2 GSM339439 Rattus norvegicus 2 standard 0 Std_APAP0.vs.Std_APAP0.3 GSM339443 Rattus norvegicus 3 standard 0 Std_APAP5.vs.Std_APAP0.3 GSM339444 Rattus norvegicus 3 standard 0 Std_APAP10.vs.Std_APAP0.3 GSM339445 Rattus norvegicus 3 standard 0 Std_APAP0.vs.Std_APAP0.4 GSM339449 Rattus norvegicus 4 standard 0 Std_APAP5.vs.Std_APAP0.4 GSM339450 Rattus norvegicus 4 standard 0 Std_APAP10.vs.Std_APAP0.4 GSM339451 Rattus norvegicus 4 standard 0 Cy5APAP Cy3 Cy5 Std_APAP0.vs.Std_APAP0.2 0 Std_APAP0 Std_APAP0 Std_APAP5.vs.Std_APAP0.2 5 Std_APAP0 Std_APAP5 Std_APAP10.vs.Std_APAP0.2 10 Std_APAP0 Std_APAP10 Std_APAP0.vs.Std_APAP0.3 0 Std_APAP0 Std_APAP0 Std_APAP5.vs.Std_APAP0.3 5 Std_APAP0 Std_APAP5 Std_APAP10.vs.Std_APAP0.3 10 Std_APAP0 Std_APAP10 Std_APAP0.vs.Std_APAP0.4 0 Std_APAP0 Std_APAP0 Std_APAP5.vs.Std_APAP0.4 5 Std_APAP0 Std_APAP5 Std_APAP10.vs.Std_APAP0.4 10 Std_APAP0 Std_APAP10
Can I still consider the 0 mM APAP conditions as one common reference, even though each 0 mM APAP sample is actually only a common mRNA reference within the same rat ("
Donor")? In other words, does the code below build a limma design model that is valid, or is the model invalid?
> std.targets <- targets[targets$Medium == "standard",] > std.design <- modelMatrix(std.targets, ref="Std_APAP0") > std.design Std_APAP10 Std_APAP5 Std_APAP0.vs.Std_APAP0.2 0 0 Std_APAP5.vs.Std_APAP0.2 0 1 Std_APAP10.vs.Std_APAP0.2 1 0 Std_APAP0.vs.Std_APAP0.3 0 0 Std_APAP5.vs.Std_APAP0.3 0 1 Std_APAP10.vs.Std_APAP0.3 1 0 Std_APAP0.vs.Std_APAP0.4 0 0 Std_APAP5.vs.Std_APAP0.4 0 1 Std_APAP10.vs.Std_APAP0.4 1 0 > # Line below includes a dye effect in the model. > # NOTE: Apparently we can't check for a dye effect, even though the > # self-self hybridization of APAP0s should tell this AFAIK. Uncomment > # to see this code break. > #std.design <- cbind(DyeEffect=1, std.design) > std.fit <- lmFit(MA.std.genes.only, std.design) > std.fit <- eBayes(std.fit) > # I'm going to assume the Std_APAP5 and Std_APAP10 are my contrasts > # against Std_APAP0.
The control mRNA samples are from biological replicates, but not identical across arrays. Is this sufficient? If not, what sort of design and contrast matrices should I develop?
For example, should I consider this to be like a paired-samples type study (Section 8.3 in the limma User's Guide, page 43). In this case, I would create a design matrix using the factors of rat ("
Donor") and treatment ("
Cy5APAP"), such as follows:
> std.targets <- targets[targets$Medium == "standard",] > std.rats <- factor(std.targets$Donor) > std.rats  2 2 2 3 3 3 4 4 4 Levels: 2 3 4 > std.treat <- factor(std.targets$Cy5APAP) > std.treat  0 5 10 0 5 10 0 5 10 Levels: 0 5 10 > std.design <- model.matrix(~std.rats+std.treat) > rownames(std.design) <- rownames(std.targets) > std.design (Intercept) std.rats3 std.rats4 std.treat5 Std_APAP0.vs.Std_APAP0.2 1 0 0 0 Std_APAP5.vs.Std_APAP0.2 1 0 0 1 Std_APAP10.vs.Std_APAP0.2 1 0 0 0 Std_APAP0.vs.Std_APAP0.3 1 1 0 0 Std_APAP5.vs.Std_APAP0.3 1 1 0 1 Std_APAP10.vs.Std_APAP0.3 1 1 0 0 Std_APAP0.vs.Std_APAP0.4 1 0 1 0 Std_APAP5.vs.Std_APAP0.4 1 0 1 1 Std_APAP10.vs.Std_APAP0.4 1 0 1 0 std.treat10 Std_APAP0.vs.Std_APAP0.2 0 Std_APAP5.vs.Std_APAP0.2 0 Std_APAP10.vs.Std_APAP0.2 1 Std_APAP0.vs.Std_APAP0.3 0 Std_APAP5.vs.Std_APAP0.3 0 Std_APAP10.vs.Std_APAP0.3 1 Std_APAP0.vs.Std_APAP0.4 0 Std_APAP5.vs.Std_APAP0.4 0 Std_APAP10.vs.Std_APAP0.4 1 attr(,"assign")  0 1 1 2 2 attr(,"contrasts") attr(,"contrasts")$std.rats  "contr.treatment" attr(,"contrasts")$std.treat  "contr.treatment" > std.fit <- lmFit(MA.std.genes.only, std.design) > std.fit <- eBayes(std.fit) > # Is this correct that I don't need a contrasts matrix; that > # std.treat5 is representing the contrast > # "Std_APAP5 vs. Std_APAP0?"
I take it in this design, the particular rat is being considered part of the model, with Rat 2 as the baseline, and then the treatment with APAP is considered an additional factor on top of the particular rat. It's not obvious to me that this is a correct approach either, though.
Any thoughts and suggestions on the appropriate design and contrast matrices would be extremely helpful.
EDIT 2011-06-27: Added "Paired Samples" design.