Chapter 7. Clustering with Mahout
In this chapter, we will discuss clustering, one of the major application areas of machine learning. Cluster analysis is widely used for tasks such as customer segmentation, news grouping, and grouping users based on their behavior.
We will also look at the internals of a few important clustering algorithms and then discuss their implementations in Mahout. The topics covered in this chapter are as follows:
Data preprocessing
k-means
Canopy clustering
Fuzzy k-means
Streaming k-means