HOPACH clustering is described in the section "Cluster Analysis" of the book Bioinformatics and Computational Biology Solutions Using R and Bioconductor. HOPACH stands for Hierarchical Ordered Partitioning and Collapsing Hybrid. As its name suggests, it is a hybrid of partitioning and hierarchical cluster analysis, which recursively alternates splitting and collapsing steps based on a criterion called the median split silhouette (MSS). MSS aims to answer the question: should I split this (sub-)cluster again, or is it already homogeneous enough? The silhouette of a gene measures how well that gene fits into its own cluster compared to the other clusters. Optimizing the MSS criterion (the hopach documentation describes this as minimizing MSS, since a low value indicates homogeneous clusters) tells the algorithm when to stop splitting, and the clusters that remain unsplit are "what you are looking for".
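To make the silhouette idea concrete, here is a minimal sketch in plain Python of the classical silhouette, s(i) = (b - a) / max(a, b), where a is the mean distance from gene i to the other genes in its own cluster and b is the smallest mean distance to any other cluster. This is only an illustration of the building block behind MSS, not the hopach implementation: the Euclidean distance, the function names and the toy data are all assumptions of mine, and hopach's own distance and silhouette computations may differ.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouettes(points, labels, dist=euclidean):
    """Return the classical silhouette of every point.

    Requires at least two distinct cluster labels.
    Values near 1 mean a good fit, negative values a likely misassignment.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    result = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a singleton cluster gets silhouette 0.
            result.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the closest other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        result.append((b - a) / denom if denom > 0 else 0.0)
    return result


# Hypothetical toy "genes" measured in two conditions.
genes = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouettes(genes, [0, 0, 1, 1])  # sensible clustering
bad = silhouettes(genes, [0, 1, 0, 1])   # deliberately wrong clustering
```

With the sensible labels every gene gets a silhouette close to 1, while the deliberately wrong labels give negative values, which is the kind of signal a split/no-split criterion can build on.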
There is an R (Bioconductor) package available: hopach.
Along with the package, you will also find a detailed manuscript (PDF) describing the methodology.
The package provides a bootstrap resampling function that estimates the membership of each gene in each cluster.
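As a rough illustration of what bootstrap membership estimates mean, the sketch below resamples the arrays (columns) with replacement, keeps a set of medoid genes fixed, reassigns every gene to its nearest medoid in each resampled data set, and reports the proportion of resamples in which a gene lands in each cluster. This is a simplified sketch under my own assumptions (Euclidean distance, hypothetical function name and toy data), not the package's actual code.

```python
import random
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bootstrap_membership(profiles, medoid_ids, n_boot=200, seed=0):
    """Estimate, for each gene, the fraction of bootstrap resamples in which
    it is closest to each medoid. Rows are genes, columns are clusters."""
    rng = random.Random(seed)
    n_arrays = len(profiles[0])
    counts = [[0] * len(medoid_ids) for _ in profiles]

    for _ in range(n_boot):
        # Resample arrays (columns) with replacement.
        cols = [rng.randrange(n_arrays) for _ in range(n_arrays)]
        resampled = [[p[c] for c in cols] for p in profiles]
        # Reassign every gene to the nearest fixed medoid.
        for g, profile in enumerate(resampled):
            dists = [euclidean(profile, resampled[m]) for m in medoid_ids]
            counts[g][dists.index(min(dists))] += 1  # ties go to the first medoid

    return [[c / n_boot for c in row] for row in counts]


# Hypothetical toy data: 4 genes measured on 6 arrays.
profiles = [
    [1, 2, 3, 4, 5, 6],              # gene 0: medoid of cluster 0
    [1.1, 2.2, 2.9, 4.1, 5.2, 5.8],  # gene 1: very close to gene 0
    [6, 5, 4, 3, 2, 1],              # gene 2: medoid of cluster 1
    [1, 2, 3, 3, 2, 1],              # gene 3: ambiguous between both
]
membership = bootstrap_membership(profiles, medoid_ids=[0, 2])
```

Genes that sit firmly in one cluster get a membership of 1 there, whereas an ambiguous gene such as the last one ends up with its membership spread over both clusters.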
Here are some useful references for this package and method:
M. van der Laan and K. Pollard. Hybrid clustering of gene expression data with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003.
K. Pollard and M. van der Laan. A method to identify significant clusters in gene expression data. In SCI2002 Proceedings, volume II, pages 318-325, Orlando, 2002a. International Institute of Informatics and Systemics.
K. Pollard and M. van der Laan. Statistical inference for simultaneous clustering of gene expression data. Mathematical Biosciences, 176(1):99-121, 2002b.
This may be a good starting point for finding what you are after, namely highly correlated or significantly coexpressed genes.
Hope it helps.
I thought this package might be of interest.
(I do not personally have much experience with it.)
10.8 years ago by
toni • 2.2k