Introduction to Distance Metrics
Distance metrics are fundamental components of many ML algorithms, particularly those that rely on the concept of similarity or dissimilarity between data points. Knowing how to measure the distance between points in a feature space is central to tasks such as clustering, classification, and regression. In this chapter, we will explore various distance metrics, including the Euclidean, Manhattan, and Minkowski distances. We will also work through hands-on examples that calculate these metrics with scikit-learn and apply them in the algorithms that rely on them.
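As a quick preview, the sketch below computes all three distances between two points using scikit-learn's `pairwise_distances`. The two points are made-up values chosen so the arithmetic is easy to follow. The `minkowski` helper is a hypothetical name introduced here only to show the formula by hand. It confirms that the Minkowski distance generalizes the other two: with `p=1` it equals the Manhattan distance, and with `p=2` it equals the Euclidean distance.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def minkowski(x, y, p):
    """Hypothetical helper: Minkowski distance (sum_i |x_i - y_i|^p)^(1/p)."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


# Two illustrative points in a 2-D feature space (made-up values)
X = np.array([[0.0, 0.0]])
Y = np.array([[3.0, 4.0]])

# pairwise_distances returns a (1, 1) distance matrix for one point vs. one point
euclidean = pairwise_distances(X, Y, metric="euclidean")[0, 0]
manhattan = pairwise_distances(X, Y, metric="manhattan")[0, 0]
# The extra keyword p is forwarded to SciPy's Minkowski implementation
minkowski_p3 = pairwise_distances(X, Y, metric="minkowski", p=3)[0, 0]

print(f"Euclidean:       {euclidean:.4f}")     # sqrt(3^2 + 4^2) = 5
print(f"Manhattan:       {manhattan:.4f}")     # |3| + |4| = 7
print(f"Minkowski (p=3): {minkowski_p3:.4f}")  # (3^3 + 4^3)^(1/3)

# Minkowski generalizes the other two: p=1 is Manhattan, p=2 is Euclidean
assert np.isclose(minkowski(X[0], Y[0], 1), manhattan)
assert np.isclose(minkowski(X[0], Y[0], 2), euclidean)
assert np.isclose(minkowski(X[0], Y[0], 3), minkowski_p3)
```

Note that the distances differ for the same pair of points: Manhattan sums the movement along each axis, while Euclidean measures the straight line. Which one is appropriate depends on the data and the algorithm.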
Important note: What is “distance” in ML?
In ML, we often want to know how similar (or dissimilar) the data points in our dataset are. This is especially true in classification problems, where we may assume that “if it looks like a duck, walks like a duck, and quacks like a duck,” well then it’s probably a duck…and not, say...