I'm new to microarray experiments, have no experience, and am trying to get a grip. I am using GDS596 (the well-known Su et al. 2004 PNAS data) and trying to get a single expression value for each gene. Essentially, I am looking to replicate the analysis done in Tartaglia et al., "Life on the Edge...", TRENDS Biochem Sci 32(5), 2007.
I have obtained the raw human *.CEL files and would like some clarification on the steps taken. A few questions come up below.
1.) MAS5 normalization (for background correction, via the R affy package), then take log10 of these values, and then average across genes and across experiments. Fine (rma and gcrma could also be used).
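A minimal Python sketch of step 1, assuming the MAS5 signals have already been exported (e.g. from affy) as a probe-set x array matrix of positive values. `probe_to_gene` is a hypothetical probe-set-to-gene mapping (e.g. built from the platform annotation), and reading "average across genes" as "average the probe sets that map to the same gene" is an assumption:

```python
import math
from collections import defaultdict


def log10_values(matrix):
    """Element-wise log10 of a probe-set x array matrix.

    MAS5 signals must be strictly positive; math.log10 raises on 0.
    """
    return [[math.log10(v) for v in row] for row in matrix]


def collapse_to_genes(matrix, probe_ids, probe_to_gene):
    """Average the rows (probe sets) that map to the same gene.

    probe_to_gene is a hypothetical dict {probe_id: gene_id};
    probe sets without a mapping are dropped.
    """
    groups = defaultdict(list)
    for pid, row in zip(probe_ids, matrix):
        gene = probe_to_gene.get(pid)
        if gene is not None:
            groups[gene].append(row)
    # Column-wise mean over all probe sets assigned to each gene.
    return {
        gene: [sum(col) / len(col) for col in zip(*rows)]
        for gene, rows in groups.items()
    }
```

For example, two probe sets with log10 signals `[2, 3]` and `[1, 2]` mapped to the same gene collapse to `[1.5, 2.5]`.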
2.) The authors then "median scale followed by quantile normalization". Scaling across experiments (i.e. the GSM columns) allows comparisons between experiments. Fine. However, I don't scale row-wise as some other papers do (I'm not sure why you would do this).
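Column-wise median scaling on log-scale data can be sketched like this. It is a minimal sketch, not necessarily the paper's exact procedure: on a log scale, multiplying an array's linear values by a factor is the same as adding a constant, so each column is shifted until all columns share one target median (defaulting to the median of the column medians, which is an assumption):

```python
from statistics import median


def median_scale(matrix, target=None):
    """Shift each column (array) of a log-scale genes x arrays matrix
    so that every column has the same median.

    target defaults to the median of the per-column medians (assumption).
    """
    columns = list(zip(*matrix))
    col_medians = [median(col) for col in columns]
    if target is None:
        target = median(col_medians)
    # Additive offset per column = multiplicative scaling on the linear scale.
    offsets = [target - m for m in col_medians]
    return [[v + off for v, off in zip(row, offsets)] for row in matrix]
```

Row-wise scaling would instead apply the same idea to each gene across arrays; it is not needed for the column comparison described above.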
3.) Then quantile normalization? Why is this step taken? I had thought that this was done at the probe level. If the intensities are already normalized (MAS5) and corrected across experiments (median scaling), why normalize again?
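For comparison, textbook quantile normalization can be sketched in pure Python as below (illustrative only; library implementations may differ in details such as tie handling). Median scaling aligns a single statistic per array, whereas quantile normalization makes every array's entire distribution identical, so it also changes the spread and shape of each column:

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a rows x columns matrix.

    Each column is ranked, the mean across columns is taken at every
    rank, and that mean is written back to each column at the same rank.
    Ties are broken by row order (a simplification).
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # For each column, the row indices sorted by that column's values.
    orders = [
        sorted(range(n_rows), key=lambda r: matrix[r][c]) for c in range(n_cols)
    ]
    # Mean value at each rank, taken across all columns.
    rank_means = [
        sum(matrix[orders[c][k]][c] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    # Write the rank means back in each column's original row order.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(orders[c]):
            result[r][c] = rank_means[k]
    return result
```

After this step every column contains the same set of values, only in different row orders.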
I get an 'ok' correlation (Pearson's r ~0.77) with the paper's expression values after the first two steps, but quantile normalization then screws everything up. Are there obvious things I'm doing wrong here?
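To check agreement with the published values after each step, Pearson's r can be computed over the genes present in both tables. A minimal sketch, assuming both sets of values are hypothetical dicts mapping a gene identifier to a log expression value for one tissue:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def compare_to_published(mine, published):
    """Return (r, number of shared genes) over genes present in both dicts."""
    shared = sorted(set(mine) & set(published))
    r = pearson_r([mine[g] for g in shared], [published[g] for g in shared])
    return r, len(shared)
```

Pearson's r is unchanged by adding a constant to, or positively rescaling, either vector, so shifting a single array's values (as median scaling does) cannot by itself change that array's correlation with the published column; a nonlinear step such as quantile normalization can.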