Question

How to do hierarchical clustering of columns in Python or Excel?

2.2 years ago

Xylanaser ▴ 80

hey

How can I do hierarchical clustering of columns in Python or Excel? I mean reordering the columns so that similar columns end up next to each other.

example

   1 2 3 4
a  0 0 1 2
b  1 1 1 1
c  1 2 0 1
d  3 1 0 1

   |
   v

   1 2 4 3
a  0 0 2 1
b  1 1 1 1
c  1 2 1 0
d  3 1 1 0

clustering excel python offtopic • 737 views

written 2.2 years ago by Xylanaser ▴ 80 • updated 2.2 years ago by Wayne ★ 2.0k

https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering

2.2 years ago by Mensur Dlakic ★ 27k

This post does not fit the theme of this forum.

2.2 years ago by Ram 43k

Go to my cluster_analysis-binder repo and click the 'launch binder' badge. A temporary, active Jupyter session served via MyBinder.org will spin up.

It is best to start at the top of the first notebook listed under the available notebooks, 'Hierarchical clustering', and work through to the section 'Demo of Hierarchical Clustering a Correlation matrix from the data (single-pass) using SciPy'. You will probably be most interested in the part where I make df_clustered.
Looking back over it, I see I didn't leave good notes there. However, the header of the notebook explains that it is really just a reworking of https://github.com/TheLoneNut/CorrelationMatrixClustering/blob/master/CorrelationMatrixClustering.ipynb with nicer visualization: it uses Seaborn and shows only the lower triangle. The reason for that is outlined at the top of 'How To Make Lower Triangle Heatmap with Correlation Matrix in Python?':

"Since correlation matrix is symmetric, it is redundant to visualize the full correlation matrix as a heat map. Instead, visualizing just lower or upper triangular matrix of correlation matrix is more useful."
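For the column reordering asked about in the question, a minimal sketch with SciPy could look like the following. This is not the notebook's code: it clusters the columns directly by Euclidean distance with average linkage (both are assumptions chosen for illustration) rather than clustering a correlation matrix, and it reuses the name df_clustered only for the reordered table.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list

# Toy table from the question: rows a-d, columns 1-4.
df = pd.DataFrame(
    {"1": [0, 1, 1, 3], "2": [0, 1, 2, 1], "3": [1, 1, 0, 0], "4": [2, 1, 1, 1]},
    index=list("abcd"),
)

# Transpose so that each column becomes one observation to cluster.
# method="average" and metric="euclidean" are illustrative choices.
Z = linkage(df.T.values, method="average", metric="euclidean")

# leaves_list returns the column positions in dendrogram (left-to-right) order.
order = leaves_list(Z)

# Reorder the columns so that similar columns sit next to each other.
df_clustered = df.iloc[:, order]
print(df_clustered)
```

The leaf order of a dendrogram is not unique, because each branch can be flipped without changing the tree. The result may therefore differ from the ordering in the question while still keeping similar columns adjacent.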

The other available notebooks explore the use of various modules and different algorithms to do the clustering. So depending on your data, you may want to dive further into those. I even have some code worked out for comparing the clustering result assignments from various approaches/implementations.
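As a rough illustration of comparing assignments between implementations (a sketch, not the code from those notebooks), one option is to cut a SciPy tree into a fixed number of clusters, run scikit-learn's AgglomerativeClustering with the same linkage, and score the agreement with the adjusted Rand index. The toy data, the choice of two clusters, and average linkage are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# One row per column of the toy table from the question (columns 1-4).
X = np.array(
    [[0, 1, 1, 3], [0, 1, 2, 1], [1, 1, 0, 0], [2, 1, 1, 1]],
    dtype=float,
)

# SciPy: build the tree (Euclidean distance by default), then cut it into 2 clusters.
scipy_labels = fcluster(linkage(X, method="average"), t=2, criterion="maxclust")

# scikit-learn: agglomerative clustering with the same linkage and cluster count.
sklearn_labels = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(X)

# The adjusted Rand index ignores label names; 1.0 means identical groupings.
ari = adjusted_rand_score(scipy_labels, sklearn_labels)
print(scipy_labels, sklearn_labels, ari)
```

The label numbers themselves need not match between libraries, which is why a permutation-invariant score is used instead of comparing the label arrays element by element.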

Short/Quick Start version:
A static rendering of the notebook I first pointed you to is here.

2.2 years ago by Wayne ★ 2.0k