Question: Impact Of Microarray Gene Expression Normalization On Clustering
Written 7.0 years ago by elb83:

Hi guys, I have a question about microarray gene expression normalization techniques and clustering. I have a gene expression microarray matrix of around 13,000 genes (the rows) and 200 samples (the columns). I normalized the matrix using RMA (which gives the values on the log2 scale) and then clustered both the samples and the genes by hierarchical clustering (HCL), using Pearson correlation and average linkage. The genes and the samples cluster very well!

If I repeat the normalization using MAS5 instead (and then log2-transform the data) and cluster again with the same criteria as above, the genes and the samples no longer cluster!

I tried centering the genes and the samples, that is, for each row (gene) I subtracted the median value across the samples, both after RMA and after MAS5 normalization. Again, the genes and the samples cluster very well using RMA but not using MAS5.

Then, for each gene (row), I computed the median across all samples. After RMA normalization the distribution of these per-gene medians is normal (according to the Shapiro test), while after MAS5 it is not. Can this affect the quality of the clustering? Why is there such a big difference between the two methods?
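One point about the centering step may be worth checking on a toy example. The sketch below uses made-up log2 values and hypothetical helper names (`pearson`, `median_center_rows`), not the actual data or software from this thread. It illustrates that subtracting a per-gene median cannot change gene-gene Pearson correlations, since correlation ignores a constant shift within each row, whereas sample-sample correlations can change a lot.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def median_center_rows(matrix):
    """Subtract each row's (gene's) median across samples."""
    return [[v - median(row) for v in row] for row in matrix]


# Toy log2 expression matrix: rows = genes, columns = samples.
# These numbers are invented purely for illustration.
expr = [
    [7.1, 7.4, 9.0, 9.3],
    [5.0, 5.2, 6.9, 7.1],
    [8.8, 8.5, 6.2, 6.0],
]
centered = median_center_rows(expr)

# Gene-gene correlation (rows): unchanged by per-gene centering.
r_genes_before = pearson(expr[0], expr[1])
r_genes_after = pearson(centered[0], centered[1])

# Sample-sample correlation (columns): can change, because every gene
# is shifted by a different amount.
cols = list(zip(*expr))
cols_centered = list(zip(*centered))
r_samples_before = pearson(cols[0], cols[2])
r_samples_after = pearson(cols_centered[0], cols_centered[2])

print(r_genes_before, r_genes_after)
print(r_samples_before, r_samples_after)
```

If the clustering uses 1 minus the Pearson correlation as its distance, this suggests that median-centering the genes would leave the gene dendrogram as it was and could only affect the sample dendrogram.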


What do you mean by cluster well? Are you getting more clusters with one versus the other? Are you getting better cluster densities? Are you getting better cluster separation? Do the clusters make more sense biologically?

Comment written 7.0 years ago by Damian Kao

Hi Damian! The genes and samples group well together. In other words with RMA I get better cluster separation!

Reply written 7.0 years ago by elb83
Answer written 7.0 years ago by Sean Davis (National Institutes of Health, Bethesda, MD):

Clustering is probably not really the best method to evaluate normalization. You might take a look at boxplots, density plots, and some MA plots, if necessary. RMA is probably the better normalization for most situations, though. Finally, be sure that you are comfortable with the quality of the data before proceeding too far.
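As a rough illustration of the checks mentioned above, here is a minimal sketch that computes the numbers behind a boxplot (a five-number summary per sample) and behind an MA plot (M = difference of log2 values, A = their mean) for a pair of samples. The sample names and values are invented, and the helper functions are hypothetical; in practice you would plot these with whatever graphics tool you prefer.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Min, lower quartile, median, upper quartile, max: the boxplot numbers."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


def ma_values(sample_a, sample_b):
    """M (log ratio) and A (mean log intensity) for two log2-scale samples."""
    m = [a - b for a, b in zip(sample_a, sample_b)]
    a_mean = [(a + b) / 2 for a, b in zip(sample_a, sample_b)]
    return m, a_mean


# Toy log2 intensities for two samples (invented numbers);
# s2 is s1 shifted up by 0.5 to mimic a sample-wide offset.
samples = {
    "s1": [6.0, 7.0, 8.0, 9.0, 10.0],
    "s2": [6.5, 7.5, 8.5, 9.5, 10.5],
}

summaries = {name: five_number_summary(vals) for name, vals in samples.items()}
m, a = ma_values(samples["s1"], samples["s2"])
median_m = median(m)

for name, summary in summaries.items():
    print(name, summary)
print("median M:", median_m)
```

After a reasonable normalization one would expect the per-sample summaries to look broadly similar and the M values to be centered near zero across the range of A. A systematic offset like the one in this toy pair would suggest the samples are not yet comparable.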


Hi Sean! Thank you a lot for suggestions!

Reply written 7.0 years ago by elb83
Answer written 7.0 years ago by Neilfws (Sydney, Australia):

Agree with Sean that clustering is not a good measure of normalization.

To address why RMA and MAS5 give different clusters: it is not so surprising when you consider that RMA values are log2-transformed whereas MAS5 values, by default, are not. Log transformation has the effect of "squashing" values closer together, which will result in smaller values in the distance matrix and hence "tighter" clusters.
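To make the "squashing" effect concrete, here is a small sketch with invented intensities. It uses Euclidean distance purely for illustration (an assumption; the question used Pearson correlation) and compares two profiles that differ by exactly twofold at every gene, once on the raw scale and once after log2.

```python
from math import log2, sqrt


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two toy samples on the raw intensity scale (invented numbers);
# raw_b is exactly twofold higher than raw_a for every gene.
raw_a = [100.0, 200.0, 6400.0]
raw_b = [200.0, 400.0, 12800.0]

log_a = [log2(v) for v in raw_a]
log_b = [log2(v) for v in raw_b]

d_raw = euclidean(raw_a, raw_b)
d_log = euclidean(log_a, log_b)

# Fraction of the squared raw-scale distance that comes from the
# single high-intensity gene.
top_gene_share = (raw_a[2] - raw_b[2]) ** 2 / d_raw ** 2

print("raw distance:", d_raw)
print("log2 distance:", d_log)
print("share from brightest gene:", top_gene_share)
```

On the raw scale the distance is in the thousands and comes almost entirely from the brightest gene, while on the log2 scale each gene contributes the same one-unit difference. That is the sense in which untransformed values can give a much larger and less balanced distance matrix.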
