Comparing hierarchical clusterings
6.3 years ago
nimrodrap • 0

Hi, I am running hierarchical clustering on TCGA data from several tissues. I run the clustering several times, each time on a different omics data type, and would like to choose the best of the resulting trees. I am looking for a hierarchical equivalent of the cluster homogeneity / separation / silhouette scores used to evaluate non-hierarchical clustering solutions. Is anyone familiar with a generalization of homogeneity / separation / silhouette scores to tree structures?

clustering cancer TCGA • 2.3k views

Try apcluster. I can almost promise you that, no matter the data, it will give you the best results. I'm quite surprised that it hasn't become the de facto standard in all things bioinformatics.


Thanks, but this is a clustering algorithm. I am looking for ways to evaluate a given hierarchical clustering. I don't see that apcluster has any such evaluation.


How about computing the Rand index for each clustering and then comparing the scores? It is implemented in the clusteval R package.
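To make the Rand index concrete, here is a minimal pure-Python sketch. It is not the clusteval API; the function name `rand_index` and the two label vectors are made up for illustration. It assumes each tree has already been cut into flat clusters, so each sample carries one cluster label per tree. Note that the index measures agreement between two partitions, so it says how similar two solutions are (or how close one is to a known reference labelling), not which one is better on its own.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two partitions agree.

    A pair "agrees" if both partitions put the two samples in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label vectors must cover the same samples")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Hypothetical flat cuts of two trees built from different omics layers.
cut_expression = [0, 0, 1, 1, 2, 2]
cut_methylation = [1, 1, 0, 0, 0, 2]

print(rand_index(cut_expression, cut_methylation))  # 0.8
```

Cluster label values are arbitrary, so relabelling a partition (for example swapping 0 and 1 everywhere) does not change the score.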

6.3 years ago

pvclust

The best approach for this would be to bootstrap the clustering step, as pvclust does. Bootstrapping derives a probability (p-value) for each branch point in your dendrogram. I also put some short code here showing how to do that (for an unrelated topic): A: how to make bootstrapped tree in PVCLUST package with SNP genotyping data?
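The idea behind bootstrapped branch support can be sketched in plain Python. This is only an illustration of the principle, not pvclust itself: it computes a plain bootstrap proportion per branch (the share of resampled trees in which the same group of samples reappears), whereas pvclust additionally reports approximately unbiased p-values from multiscale bootstrapping. The function names, the average-linkage choice, the Euclidean distance, and the toy data are all assumptions made for the example. Samples are rows here and features are columns.

```python
import random
from itertools import combinations


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def clades(samples):
    """Average-linkage agglomerative clustering.

    Returns the set of non-trivial clades (frozensets of sample indices)
    found in the tree, excluding the root that contains every sample.
    """
    n = len(samples)
    dist = {(i, j): euclidean(samples[i], samples[j])
            for i, j in combinations(range(n), 2)}

    def average_distance(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([i]) for i in range(n)]
    found = set()
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        if len(merged) < n:
            found.add(merged)
    return found


def bootstrap_support(samples, n_boot=100, seed=0):
    """Share of bootstrap trees that contain each clade of the original tree."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    reference = clades(samples)
    counts = dict.fromkeys(reference, 0)
    for _ in range(n_boot):
        # Resample features (columns) with replacement, then re-cluster.
        cols = [rng.randrange(n_features) for _ in range(n_features)]
        resampled = [[row[c] for c in cols] for row in samples]
        for clade in clades(resampled) & reference:
            counts[clade] += 1
    return {clade: counts[clade] / n_boot for clade in reference}


# Toy data: two well-separated groups of three samples, four features each.
samples = [
    [0.0, 0.1, 0.2, 0.0],
    [0.1, 0.0, 0.1, 0.2],
    [0.2, 0.2, 0.0, 0.1],
    [10.0, 10.1, 10.2, 10.0],
    [10.1, 10.0, 10.1, 10.2],
    [10.2, 10.2, 10.0, 10.1],
]

for clade, support in sorted(bootstrap_support(samples).items(),
                             key=lambda item: sorted(item[0])):
    print(sorted(clade), support)
```

In this toy case the two main groups are recovered in every replicate, while the pairings inside each group are less stable. Real omics data will need far more replicates and an efficient clustering implementation.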

dendextend

For other types of comparisons, take a look at dendextend: Comparing two dendrograms. In particular, look at the very simplistic entanglement metric.
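As a rough illustration of what entanglement measures, here is a small Python sketch based on my reading of the dendextend documentation: number the leaves of the first tree 1..n in plotting order, look up those numbers in the second tree's leaf order, take the sum of absolute differences raised to a power L, and divide by the worst case, where the second order is the exact reverse of the first. The function name, the default `L=1.5`, and the sample labels are assumptions for the example. dendextend itself works on dendrogram objects in R, not on plain label lists.

```python
def entanglement(order_left, order_right, L=1.5):
    """Entanglement of two leaf orderings, from 0 (aligned) to 1 (reversed)."""
    if sorted(order_left) != sorted(order_right):
        raise ValueError("both trees must contain the same leaf labels")

    n = len(order_left)
    # Rank of each label in the left tree, starting at 1.
    rank_left = {label: i + 1 for i, label in enumerate(order_left)}
    # The same ranks, read off in the right tree's leaf order.
    matched = [rank_left[label] for label in order_right]

    positions = range(1, n + 1)
    observed = sum(abs(p - m) ** L for p, m in zip(positions, matched))
    # Worst case: the right tree lists the leaves in fully reversed order.
    worst = sum(abs(p - (n + 1 - p)) ** L for p in positions)
    return observed / worst if worst else 0.0


# Hypothetical leaf orders of two dendrograms over the same five samples.
tree_expression = ["s1", "s2", "s3", "s4", "s5"]
tree_methylation = ["s2", "s1", "s3", "s5", "s4"]

print(entanglement(tree_expression, tree_methylation))
print(entanglement(tree_expression, tree_expression))        # 0.0
print(entanglement(tree_expression, tree_expression[::-1]))  # 1.0
```

Because the value depends on leaf order, and branches of a dendrogram can be rotated without changing the clustering, a low score after rotation is more informative than a raw one.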


You have not indicated that you're specifically looking for ways to determine the ideal cluster solution for a dataset, for which there are many other methods, some of which you have already mentioned.


Thanks. dendextend is for comparing two dendrograms, which will give me the distances between the different solutions, but won't indicate which one is better. pvclust does indeed give a score for each subtree, but it is not clear how to combine all these scores to give a final score for the tree. But it is helpful nonetheless.
