Scaling Techniques
When working with datasets, features can have vastly different scales. For instance, a feature representing age may range from 0 to 100, while another feature representing income could range from 0 to 100,000. Many ML algorithms, such as k-nearest neighbors (KNN) and gradient descent-based methods (e.g., linear regression), are sensitive to these differences in scale. Scaling therefore helps ensure that no single feature dominates the learning process. It is worth noting that the two terms below, standardization and normalization, are sometimes used interchangeably, but they are not the same and should not be implemented as such!
Key Concepts
- Standardization (Z-score Transformation) changes the data to have a mean of zero and a standard deviation of one
- Normalization changes the range of the data distribution so values fall between zero and one
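The two transformations above can be sketched directly with NumPy; the income values here are hypothetical, chosen only to illustrate a wide-ranging feature:

```python
import numpy as np

# Hypothetical income values with a wide range
income = np.array([20_000.0, 35_000.0, 50_000.0, 80_000.0, 100_000.0])

# Standardization (z-score): subtract the mean, divide by the standard deviation,
# yielding a distribution with mean 0 and standard deviation 1
standardized = (income - income.mean()) / income.std()

# Normalization (min-max): shift by the minimum and divide by the range,
# rescaling values to fall between 0 and 1
normalized = (income - income.min()) / (income.max() - income.min())

print(standardized.mean())              # ~0.0
print(standardized.std())               # ~1.0
print(normalized.min(), normalized.max())  # 0.0 1.0
```

In practice, the equivalent scikit-learn transformers (`StandardScaler` and `MinMaxScaler`) are typically used instead, since they can be fit on training data and reapplied to new data.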
Getting ready
We will use the previously defined `iterative_imputed_df` DataFrame for this recipe, so there is no need to redefine it.