Hi guys, I have a question about microarray gene expression normalization techniques and clustering. I have a gene expression microarray matrix of around 13,000 genes (rows) and 200 samples (columns). I normalized the matrix using RMA (which returns values on the log2 scale) and then clustered both the samples and the genes with hierarchical clustering (HCL), using Pearson correlation and average linkage. The genes and the samples cluster very well! If I repeat the normalization using MAS5 instead (and then log2-transform the data) and cluster with the same criteria, the genes and the samples do not cluster anymore!

I also tried centering the genes and the samples, that is, for each row (gene) I subtracted its median value across the samples, after both RMA and MAS5 normalization. Again, the genes and the samples cluster very well with RMA but not with MAS5. Then, for each gene (row) I computed the median across all samples: after RMA normalization the distribution of these gene medians is normal (according to a Shapiro test), while after MAS5 it is not. Can this affect the quality of the clustering? Why is there such a big difference between the two methods?
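The pipeline described in the question can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: it assumes the normalized values are already in a NumPy array of genes x samples on the log2 scale (the RMA or MAS5 step itself is not shown), and the toy data and the helper names `median_center_rows` and `pearson_average_hcl` are hypothetical. SciPy's `"correlation"` metric is 1 minus the Pearson correlation, and `shapiro` is the Shapiro-Wilk test.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from scipy.stats import shapiro

def median_center_rows(log_expr):
    """Hypothetical helper: subtract each gene's (row's) median across samples."""
    return log_expr - np.median(log_expr, axis=1, keepdims=True)

def pearson_average_hcl(matrix):
    """Hypothetical helper: average-linkage HCL on rows, distance = 1 - Pearson r."""
    dist = pdist(matrix, metric="correlation")  # condensed 1 - r distance matrix
    return linkage(dist, method="average")

# Hypothetical toy data: 50 genes x 12 samples on the linear scale
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=7, sigma=0.5, size=(50, 12))
raw[:25, :6] *= 4  # first 25 genes are 4-fold higher in the first 6 samples
log_expr = np.log2(raw)

centered = median_center_rows(log_expr)
gene_tree = pearson_average_hcl(centered)      # cluster genes (rows)
sample_tree = pearson_average_hcl(centered.T)  # cluster samples (columns)
sample_groups = fcluster(sample_tree, t=2, criterion="maxclust")

# Normality check on the per-gene medians, as described in the question
gene_medians = np.median(log_expr, axis=1)
stat, p_value = shapiro(gene_medians)
```

Swapping in a MAS5-normalized matrix for `log_expr` would let you compare the two trees under identical clustering settings.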
Clustering is probably not really the best method to evaluate normalization. You might take a look at boxplots, density plots, and some MA plots, if necessary. RMA is probably the better normalization for most situations, though. Finally, be sure that you are comfortable with the quality of the data before proceeding too far.
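For the MA plots, M is the log2 ratio of one array to a reference and A is their average log2 intensity. Below is a minimal sketch of computing the numbers behind MA plots and boxplots with NumPy. It assumes a genes x arrays matrix of log2 values and uses the gene-wise median of all arrays as a pseudo-reference, which is one common choice but an assumption here; the helper name `ma_values` and the toy data are hypothetical, and the plotting itself is left out.

```python
import numpy as np

def ma_values(log_expr, array_idx):
    """Hypothetical helper: (M, A) for one array against a pseudo-reference.

    log_expr is a genes x arrays matrix of log2 intensities; the reference is
    the gene-wise median over all arrays (an assumed choice).
    """
    reference = np.median(log_expr, axis=1)
    sample = log_expr[:, array_idx]
    m = sample - reference          # M: log2 ratio to the reference
    a = 0.5 * (sample + reference)  # A: average log2 intensity
    return m, a

# Hypothetical toy data: 1000 genes x 6 arrays; array 5 carries a +1 log2 offset
rng = np.random.default_rng(1)
log_expr = rng.normal(loc=8, scale=2, size=(1000, 6))
log_expr[:, 5] += 1.0

median_m = []
for j in range(log_expr.shape[1]):
    m, a = ma_values(log_expr, j)
    median_m.append(np.median(m))
    print(f"array {j}: median M = {median_m[-1]:+.2f}")

# Per-array quartiles: the numbers a boxplot of the arrays would show
quartiles = np.percentile(log_expr, [25, 50, 75], axis=0)
```

After a reasonable normalization the M values of every array should be centered near 0 and the boxplots should line up; in this toy example array 5 stands out. Scatter-plotting `m` against `a` gives the MA plot itself.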
Agree with Sean that clustering is not a good measure of normalization.
As for why RMA and MAS5 give different clusters: it is not so surprising when you consider that RMA values come out already log2-transformed, whereas MAS5 values are on the linear scale unless you transform them yourself. Log transformation has the effect of "squashing" values closer together, which, for magnitude-based distances such as Euclidean distance, results in smaller values in the distance matrix and hence "tighter" clusters.
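To make the "squashing" effect concrete, here is a small sketch with two made-up expression profiles (the values are hypothetical, not real MAS5 output). It shows that the effect depends on the distance: Euclidean distance shrinks directly after log2, while the Pearson-based distance (1 - r, as used in the question) ignores a constant rescaling of a profile but still changes under log2, because the log is nonlinear and damps the influence of a single very high value.

```python
import numpy as np
from scipy.spatial.distance import correlation, euclidean

# Two hypothetical gene profiles on the linear (MAS5-like) scale
gene_a = np.array([100.0, 200.0, 400.0, 3200.0, 150.0])
gene_b = np.array([120.0, 180.0, 500.0, 400.0, 160.0])

# Euclidean distance shrinks a lot once the values are squashed by log2
d_euc_linear = euclidean(gene_a, gene_b)
d_euc_log = euclidean(np.log2(gene_a), np.log2(gene_b))

# Pearson distance (1 - r) is unchanged by multiplying a profile by a constant...
d_pear_linear = correlation(gene_a, gene_b)
d_pear_scaled = correlation(10 * gene_a, gene_b)

# ...but log2 is nonlinear and reduces the pull of the single 3200 value
d_pear_log = correlation(np.log2(gene_a), np.log2(gene_b))

print(f"Euclidean: linear {d_euc_linear:.1f}, log2 {d_euc_log:.2f}")
print(f"1 - r: linear {d_pear_linear:.3f}, scaled {d_pear_scaled:.3f}, log2 {d_pear_log:.3f}")
```

In this toy case the log2 profiles end up more correlated than the linear ones, so the same clustering settings can group genes differently depending on the scale the values are on.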