K-means Clustering
K-means is a centroid-based clustering algorithm that partitions data into a predefined number of clusters, which suits the blob-like data we saw in the Introduction to Clustering section. First, K-means randomly places centroids in our feature space. Next, it iteratively assigns each data point to the nearest cluster centroid and then recalculates each centroid, moving it to the mean of the data points currently assigned to it. This process continues until convergence, where the centroids barely move and data points are no longer being reassigned to other centroids. K-means is efficient and works best when clusters are convex, isotropic, and roughly equal in size, which can also be its greatest weakness: it tends to perform poorly on data that violates these assumptions. This recipe will walk you through this process.
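To make the assign-and-update loop described above concrete, here is a minimal sketch in plain Python. It is only an illustration, not the implementation any particular library uses. The function names (`kmeans`, `squared_distance`, `mean_point`) and the parameters `max_iter`, `tol`, and `seed` are hypothetical, and the sketch assumes the initial centroids are drawn from the data points themselves, which is one common way to place them at random.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Illustrative K-means loop; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize by picking k distinct data points at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            # An empty cluster keeps its previous centroid in this sketch.
            new_centroids.append(mean_point(members) if members else centroids[j])
        # Largest squared distance any centroid moved in this iteration.
        shift = max(
            squared_distance(old, new)
            for old, new in zip(centroids, new_centroids)
        )
        converged = new_labels == labels and shift <= tol
        centroids, labels = new_centroids, new_labels
        if converged:
            break
    return centroids, labels


# Usage on two small, well-separated groups of 2-D points.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

The loop stops once no point changes cluster and no centroid moves by more than `tol` (measured as squared distance), which mirrors the convergence condition described above.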
Getting ready
Here, we’ll use the previous dummy data...