Hierarchical Clustering
Hierarchical clustering builds nested clusters by successively merging or splitting them. It is especially useful when the number of clusters is not known in advance, and it produces a tree-like structure (a dendrogram) that visually conveys the relationships among the data points. This is an advantage over K-means, which requires a value for k before it can run. There are two main approaches: agglomerative (bottom-up) and divisive (top-down). In practice, agglomerative clustering is more commonly used and is supported directly in scikit-learn, so this recipe uses that approach. A small sketch of the dendrogram idea follows.
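To see what the dendrogram conveys, here is a minimal, self-contained sketch. The toy points and the ward linkage choice are illustrative assumptions, not part of this recipe's dataset:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Toy 2D points (illustrative only)
X_toy = np.array([[1, 1], [1.2, 1.1], [5, 5], [5.1, 4.9], [9, 1]])

# Build the merge tree bottom-up; 'ward' merges the pair of clusters
# that yields the smallest increase in within-cluster variance
Z = linkage(X_toy, method='ward')

# Plot the dendrogram: the height of each merge reflects cluster distance
dendrogram(Z)
plt.title('Dendrogram of toy points')
plt.show()

Cutting the tree at a chosen height yields a flat clustering, which is how a specific number of clusters can be read off after the fact.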
Getting ready
As before, we can use the same dataset we created earlier and simply apply the new technique to it by importing the relevant scikit-learn class and SciPy helper functions.
Load the libraries:
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
Reuse the dataset from the Introduction to Clustering section.
#...
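If the earlier dataset is not still in memory, a hedged way to re-create something comparable is with make_blobs; the sample count, number of centers, and random seed below are assumptions, not the original values. Fitting AgglomerativeClustering then looks like this:

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Re-create a comparable dataset (parameters are assumed, not the originals)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Agglomerative clustering with ward linkage, asking for 3 flat clusters
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(X)
print(labels[:10])  # cluster assignments for the first ten points

Note that n_clusters here only controls where the merge tree is cut into flat clusters; the full hierarchy can still be inspected with the dendrogram as shown earlier.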