K-means clustering
In this recipe, we’ll learn about another data science technique called clustering and revisit our breast cancer dataset.
K-means clustering is an example of an unsupervised algorithm. In these types of algorithms, we need a training dataset so that the algorithm is able to learn. After training the algorithm, it will be able to predict a certain outcome for new samples. In our case, we are hoping that we can predict the main classes in the population.
K-means comes from the idea of creating K centers. Points are assigned to centroids based on their Euclidean distance from the center. We then adjust the centers until we have more and more data points falling nearby with minimal distance. In this way, we can attempt to classify our data into approximately K groups.
Getting ready
We will be using the same data as in the previous recipe. The code for this recipe can be found in Ch04/Ch04-3-k-means.ipynb.
How to do it...
Here are the steps to...