Question

Impact Of Microarray Gene Expression Normalization On Clustering

12.2 years ago

elb83 ▴ 80

Hi all, I have a question about microarray gene expression normalization techniques and clustering. I have a gene expression matrix of around 13,000 genes (the rows) and 200 samples (the columns). I normalized the matrix using RMA (which gives values on the log2 scale) and then clustered both the samples and the genes by hierarchical clustering (HCL) with Pearson correlation and average linkage. The genes and the samples cluster very well.

If I repeat the normalization using MAS5 (followed by a log2 transform) and cluster again with the same criteria, the genes and the samples no longer cluster. I tried centering the genes and the samples, that is, for each row (gene) I subtracted the median value across the samples, after both RMA and MAS5 normalization, but again the genes and the samples cluster very well with RMA and not with MAS5.

Then, for each gene (row) I computed the median across all samples. After RMA normalization the distribution of these per-gene medians is normal (according to the Shapiro test), while after MAS5 it is not. Can this affect the quality of the clustering? Why is there such a large difference between the two methods?
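For anyone wanting to reproduce the setup described above on a small scale, here is a minimal sketch of the two ingredients: per-gene median centering and the 1 - Pearson correlation distance that feeds the hierarchical clustering. The matrix `expr` and the helper names are hypothetical and only for illustration; the real analysis would run on the full 13,000 x 200 matrix with a proper clustering library.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def center_rows(matrix):
    """Subtract each row's (gene's) median across samples from that row."""
    centered_rows = []
    for row in matrix:
        row_median = median(row)
        centered_rows.append([value - row_median for value in row])
    return centered_rows


def column(matrix, j):
    """Extract column j (one sample) from a row-major matrix."""
    return [row[j] for row in matrix]


# Hypothetical log2 expression values: 3 genes (rows) x 3 samples (columns).
expr = [
    [8.0, 9.0, 10.0],
    [5.0, 7.0, 6.0],
    [12.0, 11.0, 13.0],
]
centered = center_rows(expr)

# Distance used for clustering: 1 - Pearson correlation.
gene_dist_before = 1 - pearson(expr[0], expr[1])
gene_dist_after = 1 - pearson(centered[0], centered[1])
sample_dist_before = 1 - pearson(column(expr, 0), column(expr, 1))
sample_dist_after = 1 - pearson(column(centered, 0), column(centered, 1))
```

Pearson correlation is unchanged when a constant is subtracted from a vector, so centering each gene leaves the gene-to-gene distances as they were. The sample-to-sample distances, on the other hand, can change a lot, because each sample vector has a different constant removed from every entry.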

microarray gene-expression clustering normalization • 5.5k views

ADD COMMENT • link updated 12.2 years ago by Neilfws 49k • written 12.2 years ago by elb83 ▴ 80

What do you mean by cluster well? Are you getting more clusters with one versus the other? Are you getting better cluster densities? Are you getting better cluster separation? Do the clusters make more sense biologically?

ADD REPLY • link 12.2 years ago by Damian Kao 16k

Hi Damian! The genes and samples group well together. In other words with RMA I get better cluster separation!

ADD REPLY • link 12.2 years ago by elb83 ▴ 80

score 3 · Answer 1 · 2013-04-09

12.2 years ago

Sean Davis 27k

Clustering is probably not really the best method to evaluate normalization. You might take a look at boxplots, density plots, and some MA plots, if necessary. RMA is probably the better normalization for most situations, though. Finally, be sure that you are comfortable with the quality of the data before proceeding too far.
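As a rough illustration of the MA plot mentioned above, the sketch below computes the M (log ratio) and A (mean log intensity) values for a pair of arrays. It assumes the inputs are already on the log2 scale, as RMA output is; the two sample vectors and the function name `ma_values` are made up for the example.

```python
from statistics import median


def ma_values(log2_x, log2_y):
    """Return (M, A) for two arrays of log2 intensities over the same probesets.

    M is the log2 ratio (difference of log2 values); A is the mean log2 intensity.
    """
    m = [x - y for x, y in zip(log2_x, log2_y)]
    a = [(x + y) / 2 for x, y in zip(log2_x, log2_y)]
    return m, a


# Hypothetical log2 intensities for four probesets on two arrays.
sample_1 = [8.0, 10.0, 6.5, 12.0]
sample_2 = [7.0, 10.0, 6.0, 12.5]

m_vals, a_vals = ma_values(sample_1, sample_2)

# For a well-normalized pair, M is expected to scatter around zero
# without a trend along A; the median of M is a quick first check.
median_m = median(m_vals)
```

Plotting `m_vals` against `a_vals` for a few pairs of arrays, alongside per-sample boxplots and density plots, gives a more direct view of the normalization than a dendrogram does.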

ADD COMMENT • link 12.2 years ago by Sean Davis 27k

Hi Sean! Thanks a lot for the suggestions!

ADD REPLY • link 12.2 years ago by elb83 ▴ 80

score 1 · Answer 2 · 2013-04-09

Agree with Sean that clustering is not a good measure of normalization.

As for why RMA and MAS5 give different clusters: it is not so surprising when you consider that RMA values are log2-transformed whereas MAS5 values, by default, are not. Log transformation has the effect of "squashing" values closer together, which results in smaller values in the distance matrix and hence "tighter" clusters.
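The "squashing" effect can be seen in a toy sketch like the one below, which compares two made-up sample profiles on the raw scale and after a log2 transform. The numbers are invented for illustration (one gene is an extreme outlier on the raw scale), and the helper functions are not from any microarray package.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical raw-scale intensities for five genes in two samples.
# The last gene is very highly expressed in sample A only.
raw_a = [100.0, 200.0, 400.0, 800.0, 25600.0]
raw_b = [120.0, 180.0, 500.0, 700.0, 400.0]

log_a = [math.log2(v) for v in raw_a]
log_b = [math.log2(v) for v in raw_b]

# Distances on each scale.
raw_euclid = euclidean(raw_a, raw_b)
log_euclid = euclidean(log_a, log_b)
raw_corr_dist = 1 - pearson(raw_a, raw_b)
log_corr_dist = 1 - pearson(log_a, log_b)
```

On the raw scale the single outlier gene dominates both the Euclidean distance and the correlation, while after the log2 transform the remaining genes contribute as well and both distances shrink. This is only a toy case, but it shows how the scale of the values alone can change what the clustering sees.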