Question: Comparing hierarchical clusterings
nimrodrap wrote (2.4 years ago):

Hi, I am doing hierarchical clustering on TCGA data from several tissues. I run the hierarchical clustering several times, each time on a different omics data type, and would like to choose the best of the resulting trees. I am looking for a hierarchical equivalent of the cluster homogeneity, separation, or silhouette scores that are used to evaluate non-hierarchical (flat) clustering solutions. Is anyone familiar with such a generalization of homogeneity / separation / silhouette score to tree structures?

Tags: cancer, clustering, tcga
(question written 2.4 years ago by nimrodrap; later modified by Kevin Blighe)

Try apcluster. I can almost promise you that, no matter the data, it will give you the best results. I'm quite surprised it hasn't become the de facto standard in all things bioinfo.

(reply written 2.4 years ago by 5heikki)

Thanks, but this is a clustering algorithm. I am looking for ways to evaluate a given hierarchical clustering. I don't see that apcluster has any such evaluation.

(reply written 2.4 years ago by nimrodrap)

How about computing the Rand index for each clustering (against a common reference partition) and then comparing the scores? This is implemented in the clusteval R package.
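A minimal, dependency-free sketch of the plain Rand index in Python, written here for illustration (it is not the clusteval package). It assumes each dendrogram has already been cut into a flat clustering (for example with R's `cutree`) and that a reference labelling, such as the tissue of origin, is available; all sample labels below are hypothetical. The plain Rand index is not corrected for chance, which is why the adjusted Rand index is often preferred.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two flat clusterings agree.

    A pair "agrees" if both clusterings put the two samples in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Hypothetical example: tissue of origin as the reference partition, and two
# flat clusterings obtained by cutting two dendrograms into 3 groups each.
tissue = ["breast", "breast", "lung", "lung", "colon", "colon"]
cut_rna = [1, 1, 2, 2, 3, 3]
cut_methylation = [1, 1, 1, 2, 3, 3]

print(rand_index(tissue, cut_rna))          # 1.0
print(rand_index(tissue, cut_methylation))  # 0.8
```

Only the grouping matters, not the label names, so `[1, 1, 2, 2]` and `["a", "a", "b", "b"]` score 1.0 against each other.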

(reply written 2.4 years ago by cpad0112, later modified)
Kevin Blighe wrote (2.4 years ago):


The best approach would be to bootstrap the clustering step, as pvclust does. Through bootstrapping, pvclust derives probabilities for each branch point in your dendrogram. I also posted some short code showing how you can do that (though for an unrelated topic): A: how to make bootstrapped tree in PVCLUST package with SNP genotyping data?
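To make the bootstrapping idea concrete, here is a minimal pure-Python sketch of a plain bootstrap probability per branch point, similar in spirit to pvclust's BP values. It is not pvclust's code and does not compute pvclust's multiscale AU values. Assumptions: samples are rows and features are columns (pvclust itself clusters the columns of its input), Euclidean distance with average linkage is an arbitrary choice, and the toy data is hypothetical. Each cluster of the original tree gets the fraction of bootstrap trees that reproduce exactly that set of samples.

```python
import math
import random


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_clusters(samples):
    """Naive average-linkage (UPGMA) agglomerative clustering.

    Returns every merged cluster as a frozenset of sample indices; each
    frozenset corresponds to one branch point (internal node) of the tree.
    """
    n = len(samples)
    dist = [[euclidean(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    active = [frozenset([i]) for i in range(n)]
    merged = set()
    while len(active) > 1:
        best = None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                ca, cb = active[a], active[b]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        new_cluster = active[a] | active[b]
        merged.add(new_cluster)
        active = [c for k, c in enumerate(active) if k not in (a, b)] + [new_cluster]
    return merged


def bootstrap_support(data, n_boot=200, seed=0):
    """Fraction of bootstrap trees that contain each cluster of the original tree.

    `data` is a list of samples (rows), each a list of feature values.
    Features are resampled with replacement and the samples are re-clustered.
    """
    rng = random.Random(seed)
    n_features = len(data[0])
    reference = average_linkage_clusters(data)
    counts = {cluster: 0 for cluster in reference}
    for _ in range(n_boot):
        cols = [rng.randrange(n_features) for _ in range(n_features)]
        resampled = [[row[c] for c in cols] for row in data]
        for cluster in average_linkage_clusters(resampled):
            if cluster in counts:
                counts[cluster] += 1
    return {cluster: counts[cluster] / n_boot for cluster in reference}


if __name__ == "__main__":
    # Hypothetical toy data: 6 samples x 4 features forming two clear groups.
    data = [
        [0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.1, 0.0], [0.0, 0.1, 0.0, 0.1],
        [5.0, 5.0, 5.0, 5.0], [5.1, 5.0, 5.1, 5.0], [5.0, 5.1, 5.0, 5.1],
    ]
    support = bootstrap_support(data, n_boot=100)
    for cluster, p in sorted(support.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(cluster), round(p, 2))
```

The root (all samples) trivially gets support 1.0; the interesting values are those of the smaller branch points.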


For other types of comparison, take a look at dendextend: Comparing two dendrograms. In particular, look at the very simple entanglement metric.
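A rough Python sketch of an entanglement-style score, assuming the L-norm formulation we understand dendextend to use (with L = 1.5, which we believe is its default): compare each leaf's left-to-right position in the two dendrograms and normalise by the worst case, where one order is the exact reverse of the other. This is an illustration of the idea, not dendextend's code, and details may differ. Because leaf order changes when branches are rotated, entanglement measures how well two drawings line up rather than how different the tree topologies are.

```python
def entanglement(order1, order2, L=1.5):
    """Entanglement-style score between two leaf orders.

    0 means the leaves line up perfectly; 1 means one order is the reverse
    of the other.
    """
    n = len(order1)
    if len(order2) != n or len(set(order1)) != n or set(order1) != set(order2):
        raise ValueError("both dendrograms must have the same unique leaf labels")
    if n < 2:
        raise ValueError("need at least two leaves")
    pos2 = {label: i for i, label in enumerate(order2)}
    # Observed: how far each leaf moves between the two left-to-right orders.
    observed = sum(abs(i - pos2[label]) ** L for i, label in enumerate(order1))
    # Worst case used for normalisation: order2 is order1 reversed.
    worst = sum(abs(i - (n - 1 - i)) ** L for i in range(n))
    return observed / worst


# Hypothetical leaf orders read off two dendrograms of the same samples.
leaves = ["s1", "s2", "s3", "s4"]
print(entanglement(leaves, ["s1", "s2", "s3", "s4"]))  # 0.0
print(entanglement(leaves, ["s4", "s3", "s2", "s1"]))  # 1.0
print(entanglement(leaves, ["s2", "s1", "s3", "s4"]))
```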

You have not indicated that you're specifically looking for ways to determine the ideal cluster solution for a dataset, for which there are many other methods, some of which you have already mentioned.
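As one example of such a method, since the question mentions the silhouette score: cut each dendrogram into k clusters (for example with `cutree`) and compute the mean silhouette width of the resulting flat labels, repeating for several values of k. Below is a minimal pure-Python sketch, assuming Euclidean distance, hypothetical toy points, and the common convention that singleton clusters score 0. Silhouettes computed in different omics feature spaces are not necessarily comparable with one another.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width of a flat clustering (Euclidean distance)."""
    n = len(points)
    clusters = set(labels)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("need between 2 and n-1 clusters")
    total = 0.0
    for i in range(n):
        # Sum and count of distances from point i to each cluster (excluding i).
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j != i:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: silhouette taken as 0
        a = sums[own] / counts[own]  # cohesion: mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # separation
        m = max(a, b)
        if m > 0:
            total += (b - a) / m
    return total / n


# Hypothetical 1-D example: two well-separated groups.
points = [[0.0], [1.0], [10.0], [11.0]]
print(silhouette_score(points, [0, 0, 1, 1]))  # close to 0.9 (good split)
print(silhouette_score(points, [0, 1, 0, 1]))  # negative (poor split)
```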

(answer written 2.4 years ago by Kevin Blighe, later modified)

Thanks. dendextend compares two dendrograms, so it will tell me how far apart the different solutions are, but not which one is better. pvclust does indeed give a score for each subtree, but it is not clear how to combine all these scores into a single score for the whole tree. Still, it is helpful.

(reply written 2.4 years ago by nimrodrap)
Powered by Biostar version 2.3.0