k-NEAREST NEIGHBORS
Our next supervised learning classification technique is k-nearest neighbors (k-NN), which classifies a new, unknown data point based on its proximity to known data points. The classification is determined by the “k” data points closest to the target data point. If we set k to 3, for example, k-NN examines the three nearest data points (neighbors) to the target and assigns the target the class held by the majority of those neighbors.
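The nearest-neighbor vote described above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the toy dataset, the `knn_predict` helper name, and the use of Euclidean distance are all assumptions for the example.

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbors.

    train is a list of (features, label) pairs; proximity is measured
    with Euclidean distance (an assumption for this sketch).
    """
    # Sort all known points by their distance to the target point
    distances = sorted(
        (math.dist(features, new_point), label) for features, label in train
    )
    # Keep the labels of the k closest neighbors and take the majority vote
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy dataset: two well-separated groups of known data points
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

# The three nearest neighbors of (2.0, 2.0) all belong to class "A"
print(knn_predict(train, (2.0, 2.0), k=3))  # prints "A"
```

With k set to 3, the target point is assigned class “A” because all three of its nearest neighbors carry that label.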
Figure 29: An example of k-NN classification used to predict the class of a new data point
The k-nearest neighbors technique is sometimes referred to as a “memory-based procedure” because the full training dataset is consulted each time a prediction is made.18 For this reason, k-NN is generally not recommended for large datasets, where measuring distances between many points in high-dimensional data becomes expensive. Reducing the number of dimensions with a dimensionality reduction algorithm such as principal component analysis (PCA) or...
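As a rough sketch of how PCA can shrink the distance computation before running k-NN, the snippet below centers a synthetic dataset and projects it onto its top two principal components using NumPy's singular value decomposition. The random data and the choice of two components are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in for high-dimensional data: 100 points in 10 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# Center the data, then use SVD to find the principal components
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project onto the top 2 components; k-NN distances are now measured in 2-D
X_reduced = X_centered @ Vt[:2].T
print(X_reduced.shape)  # prints (100, 2)
```

After the projection, each distance calculation in k-NN compares 2 coordinates instead of 10, which is the practical payoff of reducing dimensions first.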