2.2 years ago
Xylanaser
Hey,
How can I do hierarchical clustering of columns in Python or Excel? I mean reordering the columns so that similar columns end up next to each other.
Example:
  1 2 3 4
a 0 0 1 2
b 1 1 1 1
c 1 2 0 1
d 3 1 0 1
    |
    v
  1 2 4 3
a 0 0 2 1
b 1 1 1 1
c 1 2 1 0
d 3 1 1 0
https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering
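For the small table in the question, a minimal sketch along the following lines might be enough. Note the assumptions: it uses SciPy's `linkage` and `leaves_list` rather than the scikit-learn estimators on the linked page, and the average linkage and Euclidean metric are illustrative choices, not something the question specifies. The names `df` and `df_ordered` are made up for this example.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage

# The table from the question: rows a-d, columns 1-4.
df = pd.DataFrame(
    {"1": [0, 1, 1, 3], "2": [0, 1, 2, 1], "3": [1, 1, 0, 0], "4": [2, 1, 1, 1]},
    index=list("abcd"),
)

# linkage() clusters rows, so transpose to treat each column as an observation.
# Assumption: average linkage on Euclidean distance; other choices may suit your data better.
Z = linkage(df.T.values, method="average", metric="euclidean")

# leaves_list() returns the left-to-right leaf order of the dendrogram,
# which places columns from the same cluster next to each other.
order = leaves_list(Z)
df_ordered = df.iloc[:, order]
print(df_ordered)
```

The exact left-to-right order depends on the linkage method and on how the dendrogram happens to be oriented, so it may not match the hand-made example exactly, but columns that merge early should end up adjacent.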
This post does not fit the theme of this forum.
Go to my cluster_analysis-binder repo and hit the 'launch binder' badge. A temporary active Jupyter session, served via MyBinder.org, will spin up. It's best to start at the top of the first notebook listed under the available notebooks, 'Hierarchical clustering', and work through to the section 'Demo of Hierarchical Clustering a Correlation matrix from the data (single-pass) using SciPy'. It looks like you'd be most interested in where I make df_clustered.
Looking back over it, I see I didn't leave good notes there; however, the header of the notebook explains that it is really just a rework of https://github.com/TheLoneNut/CorrelationMatrixClustering/blob/master/CorrelationMatrixClustering.ipynb with nicer visualization, using Seaborn and showing only the lower triangle. The impetus for that is outlined at the top of 'How To Make Lower Triangle Heatmap with Correlation Matrix in Python?'.
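To give a rough idea of what clustering a correlation matrix with SciPy can look like, here is a minimal sketch. It is not the code from either notebook: the synthetic data, the `1 - correlation` distance, and the average linkage are all assumptions made for illustration, and `df_clustered` is reused here only as a convenient name.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Hypothetical data: two groups of correlated columns, deliberately interleaved.
rng = np.random.default_rng(0)
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
df = pd.DataFrame({
    "a1": base_a + 0.1 * rng.normal(size=200),
    "b1": base_b + 0.1 * rng.normal(size=200),
    "a2": base_a + 0.1 * rng.normal(size=200),
    "b2": base_b + 0.1 * rng.normal(size=200),
})

corr = df.corr()

# Assumption: use 1 - correlation as the distance between columns.
dist = 1.0 - corr.values
np.fill_diagonal(dist, 0.0)  # remove any floating-point residue on the diagonal

# linkage() expects a condensed distance vector, hence squareform().
Z = linkage(squareform(dist, checks=False), method="average")

# Reorder both the data columns and the correlation matrix by dendrogram leaf order.
cols = corr.columns[leaves_list(Z)]
df_clustered = df[cols]
corr_clustered = corr.loc[cols, cols]
print(corr_clustered.round(2))
```

If a plot is all you need, passing `corr_clustered` to Seaborn's `heatmap` should show the correlated blocks along the diagonal.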
The other available notebooks explore the use of various modules and different algorithms to do the clustering. So depending on your data, you may want to dive further into those. I even have some code worked out for comparing the clustering result assignments from various approaches/implementations.
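As a rough illustration of comparing assignments from two implementations (not the code from my notebooks), one option is the adjusted Rand index from scikit-learn, which ignores how the cluster labels happen to be numbered. The synthetic data, the choice of two clusters, and the two linkage methods below are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Hypothetical data: six profiles stored as rows, three near base_a and three near base_b.
rng = np.random.default_rng(1)
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
X = np.vstack(
    [base_a + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [base_b + 0.1 * rng.normal(size=200) for _ in range(3)]
)

# Approach 1: SciPy average linkage, cut into at most two flat clusters.
labels_scipy = fcluster(linkage(X, method="average"), t=2, criterion="maxclust")

# Approach 2: scikit-learn agglomerative clustering with Ward linkage.
labels_sklearn = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

# A value of 1.0 means the two partitions agree exactly; values near 0 mean chance-level agreement.
ari = adjusted_rand_score(labels_scipy, labels_sklearn)
print(f"Adjusted Rand index: {ari:.2f}")
```

On real data the two approaches will often disagree somewhat, and a cross-tabulation of the two label vectors can help show where.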
Short/Quick Start version:
A static rendering of the notebook I first pointed you to is here.