I am analyzing publicly available microarray data, using the log2 fold-change values already uploaded to GEO for GSE9776. These are two-channel experiments with 17 isolates and 6 conditions, of which I'm interested in the first 6 isolates under 2 conditions: antibiotic at 2 hours ("INH 2hr") and at 6 hours ("INH 6hr"). All arrays have the same control (water) in the Cy3 channel.
I created a design matrix with INH2hr as the intercept and INH6hr as the first coefficient, with the remaining coefficients assigned to conditions I'm not interested in. My understanding is that I should leave these other conditions in the model, as the variance in those samples is important in the calculation.
    group <- factor(GSE9776@phenoData@data$source_name_ch1)
    design <- model.matrix(~group)
    design
    colnames(design) <- c("INH2hr", "INH6hr", "KatG_ko", "INH_nutrient", "INH_O2", "hollow_fbr")
    fit_GSE9776 <- lmFit(GSE9776_filtered, design)
Based on my design, INH2hr is going to be the intercept, and each of the other conditions will be assigned a coefficient.
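To make the layout concrete, here is a minimal toy sketch of how I understand `model.matrix(~group)` to assign columns. The group labels are made up for illustration and are not the actual `source_name_ch1` values from GSE9776; it uses base R only.

```r
# Toy example: hypothetical labels, NOT the real GSE9776 source names
group <- factor(c("INH 2hr", "INH 2hr",
                  "INH 6hr", "INH 6hr",
                  "KatG ko", "KatG ko"))

# Levels are sorted alphabetically unless set explicitly via levels=
print(levels(group))

# With ~group, the first column is the intercept (tied to the first level)
# and each remaining level gets its own indicator column
design <- model.matrix(~group)
print(colnames(design))
print(design)
```

Since I rename the columns by hand afterwards, my labels are only right if they follow the same order as `levels(group)`, which is an assumption I have not verified against the real data.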
I'd like to look at the differences in gene expression across the following:
1. INH 2hr vs control
2. INH 6hr vs control
3. INH 6hr vs INH 2hr
Based on my reading of the vignette, to obtain INH6hr vs control I should use coefficient = "INH6hr". How do I obtain INH6hr vs INH2hr and INH2hr vs control? Should I be using a contrast matrix? If so, how?
Thanks in advance,