Outlier Detection Using Unsupervised Machine Learning
In Chapter 7, you explored parametric and non-parametric statistical techniques to spot potential outliers. These methods are simple and interpretable, yet quite effective.
Outlier detection remains challenging, mainly due to the ambiguity surrounding what constitutes an outlier in your specific data or the problem that you are trying to solve. This ambiguity arises because what is considered an outlier can vary depending on the context, such as the presence of trends, seasonality, or domain-specific patterns. For example, though commonly used, the statistical thresholds from Chapter 7 are somewhat arbitrary rather than definitive rules. This is why domain knowledge or access to Subject Matter Experts (SMEs) is vital to making the proper judgment when identifying time series outliers.
The following table summarizes the outlier detection techniques for different anomaly types that were discussed in Chapter 7: