How to select the number of clusters once I have the HCL (average linkage) tree
18 months ago


I have just clustered a list of 926 genes (differentially expressed between two groups) with Genesis 1.8.1, using the weighted pair-group average linkage (WPGMA) clustering algorithm. The program automatically produces a heatmap with the tree (dendrogram). From there I can select clusters manually, but how do I decide which node to cut at? Should I try to make the clusters similar in size, or cut at a similar node height?

I have seen questions about choosing the number of clusters before classification, but since this is an unsupervised method, I get the tree first and choose the clusters afterwards. Is there any rule that would let me select the clusters in a more or less objective way?

In addition, for a correct clustering, should I analyze the two sample groups separately? Each group has different expression levels (that is why these are proteins with differential expression), and this could make it difficult for the algorithm to build clusters, couldn't it?

I attach the programme manuals:

Thank you very much for your help

hierarchical clustering gene co-expression • 477 views

I cannot comment on this rather old program, and I would recommend using something more recent: results produced with such old software may be hard to reproduce, and it may not even run on modern PCs. Ideally, use R or another programming language to script your analysis in a reproducible fashion. In R, the function you are looking for is cutree, which you run on your hclust object. Packages such as pheatmap and ComplexHeatmap have good documentation if you want to get started.
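As a rough illustration of the cutree approach, here is a minimal sketch in base R. The expression matrix is simulated toy data (not the 926 genes from the question), and the choices of `k = 3` and a cut height of 70% of the maximum merge height are arbitrary assumptions for demonstration only. In `hclust`, `method = "mcquitty"` corresponds to WPGMA, while `method = "average"` is UPGMA.

```r
# Toy expression matrix: 30 hypothetical genes (rows) x 8 samples (columns).
set.seed(1)
expr <- matrix(rnorm(30 * 8), nrow = 30,
               dimnames = list(paste0("gene", 1:30), paste0("sample", 1:8)))

# Shift some genes in the second group of samples so there is structure to find.
expr[1:10, 5:8]  <- expr[1:10, 5:8] + 3
expr[11:20, 5:8] <- expr[11:20, 5:8] - 3

# Scale each gene (row) so clustering reflects the expression pattern
# rather than the absolute expression level.
expr_scaled <- t(scale(t(expr)))

# Euclidean distance between genes, then WPGMA linkage ("mcquitty").
hc <- hclust(dist(expr_scaled), method = "mcquitty")

# Option 1: ask for a fixed number of clusters.
clusters_k <- cutree(hc, k = 3)

# Option 2: cut the dendrogram at a fixed height (arbitrary threshold here).
cut_height <- 0.7 * max(hc$height)
clusters_h <- cutree(hc, h = cut_height)

# Cluster sizes under each cut.
print(table(clusters_k))
print(table(clusters_h))
```

Both calls return a named integer vector assigning each gene to a cluster, so the two cuts can be compared directly, for example with `table(clusters_k, clusters_h)`.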

