EDIT 2011-06-21: In the original version of this question, I misstated that the Cy3 channel contained a measure of a common reference from pooled controls. Instead, the Cy3 channels are actually a measure of the non-exposed culture from the same animal.
This question is very similar to an unanswered question asked on the Bioconductor mailing list. (The only response I found was unhelpful.) I will re-cast it in the particular problem I'm trying to solve.
GSE13465 contains experiments on two organisms:
I only care about data from the rat cultures. (The human data is present due to submitter error.)
Three biological replicates were constructed and measured.
For each biological replicate, the rat cultures took place in two media types:
- rat hepatocytes in normal media
- rat hepatocytes in modified media
Each media condition was paired with an acetaminophen (APAP) treatment condition:
- no exposure (0 mM APAP), the control
- exposure to 5 mM APAP
- exposure to 10 mM APAP
Expression levels were measured using the two-channel Agilent 011868 Rat Oligo Microarray G4130A (GPL890). All gene expression measurements were performed with the labels as such:
- Channel 1 (Cy5): 0 mM APAP (control) or 5 or 10 mM APAP (treatment)
- Channel 2 (Cy3): 0 mM APAP (control)
I would like to perform the following contrasts:
- 5 mM APAP vs. control, standard media
- 10 mM APAP vs. control, standard media
- 5 mM APAP vs. control, modified media
- 10 mM APAP vs. control, modified media
Using GEOquery, I obtained the gene expression data for the GEO Series
> library(GEOquery) > library(limma) > gse13465 <- getGEO("GSE13465", GSEMatrix = TRUE) Found 2 file(s) GSE13465-GPL887_series_matrix.txt.gz % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 1254k 100 1254k 0 0 302k 0 0:00:04 0:00:04 --:--:-- 315k File stored at: /tmp/Rtmp2VCw8p/GPL887.soft GSE13465-GPL890_series_matrix.txt.gz % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 1604k 100 1604k 0 0 452k 0 0:00:03 0:00:03 --:--:-- 474k File stored at: /tmp/Rtmp2VCw8p/GPL890.soft > ratdata <- gse13465[] # The rat data is in the second experiment > ratdata ExpressionSet (storageMode: lockedEnvironment) assayData: 20501 features, 18 samples element names: exprs protocolData: none phenoData sampleNames: GSM339437 GSM339438 ... GSM339454 (18 total) varLabels: title geo_accession ... data_row_count (42 total) varMetadata: labelDescription featureData featureNames: 3 4 ... 22572 (20501 total) fvarLabels: ID COL ... SEQUENCE (20 total) fvarMetadata: Column Description labelDescription experimentData: use 'experimentData(object)' Annotation: GPL890
I'm uncertain how to procede from here, however. The expression values that are in the ExpressionSet for the rat data are described by the submitters as the "Lowess [sic] normalized log 2 ratio (Cy5/Cy3); signal calculated using Imagene 5.0 and GeneSight 4.1 software (Biodiscovery, Palo Alto, CA)." I do not really understand how this relates to an MAList, which is the usual input for tha
lmFit function. I tried the obvious, but it did not work:
> MA <- as(ratdata, "MAList") Error in as(ratdata, "MAList") : no method or default for coercing "ExpressionSet" to "MAList"
I'm also not completely certain for how to set up the fit and contrast matrices; in this case there are not measurements from each channel but rather a single value per probeset representing the log-2 ratio.
I would really appreciate any help in performing this contrast, both from the standpoint of helping my research but also just wrapping my head around R/Bioconductor, and limma and GEOquery in particular.