Detecting outliers using a modified z-score
In the previous recipe, you explored the z-score method, which is a simple and intuitive way to identify outliers. However, it comes with a major drawback: it assumes the data follows a normal distribution. Real-world data often deviates from normality, making the regular z-score less effective.

To address this, there is a modified version of the z-score designed to work with non-normal data. The main difference between the regular z-score and the modified z-score is that the mean and standard deviation are replaced with the median and the median absolute deviation (MAD):

$$\text{Modified } Z = \frac{0.6745\,(x_i - \tilde{x})}{\text{MAD}}$$

Here, $x_i$ is an individual observation, $\tilde{x}$ is the median of the dataset, and MAD is the median absolute deviation of the dataset:

$$\text{MAD} = \text{median}\left(\left|x_i - \tilde{x}\right|\right)$$

The constant 0.6745 is a normalization factor that corresponds to the 75th percentile (Q3) of a standard Gaussian distribution. It helps to approximate the standard deviation, allowing the modified z-score to be interpreted in standard deviation units, similar to how we...
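To make the definition concrete, here is a minimal sketch of the modified z-score using only Python's standard library. The function name `modified_zscores`, the sample values, and the cutoff of 3.5 are assumptions made for illustration and are not taken from the recipe. A cutoff of 3.5 is a commonly cited rule of thumb, so treat it as a starting point rather than a fixed rule.

```python
from statistics import NormalDist, median

# The 0.6745 constant is the 75th percentile of the standard normal distribution.
Q3_STANDARD_NORMAL = NormalDist().inv_cdf(0.75)  # approximately 0.6745


def modified_zscores(values):
    """Return the modified z-score of each value, based on the median and MAD."""
    med = median(values)
    # Median absolute deviation: median of the absolute distances from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical, so the score is undefined.
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical sample data with one obvious extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 95]
scores = modified_zscores(data)

# Assumed cutoff: flag observations whose absolute score exceeds 3.5.
threshold = 3.5
flagged = [v for v, s in zip(data, scores) if abs(s) > threshold]
print(flagged)
```

For this sample the median is 11.5 and the MAD is 1.0, so only the value 95 ends up far from zero in modified z-score terms. Because both the median and the MAD are barely affected by that single extreme value, the remaining observations keep small scores, which is the robustness the regular z-score lacks.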