and third quartiles, known as the Inter-Quartile Range (IQR). The first quartile is the value below which 25% of the observations lie (equivalent to the 25th percentile), while the third quartile is the value below which 75% of the observations lie (equivalent to the 75th percentile). The IQR is calculated as follows:
IQR = 3rd quartile - 1st quartile
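As a small illustration of this formula, the following sketch computes the quartiles and the IQR for a toy list of values using only Python's standard library. The data and the choice of the "inclusive" quantile method are assumptions made for the example; other quantile definitions can give slightly different quartiles on small samples.

```python
from statistics import quantiles

# Toy data, made up for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into quartiles; the three cut points returned are
# the 1st quartile, the median, and the 3rd quartile.
q1, median, q3 = quantiles(values, n=4, method="inclusive")

# IQR = 3rd quartile - 1st quartile
iqr = q3 - q1

print(f"1st quartile: {q1}, 3rd quartile: {q3}, IQR: {iqr}")
```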
Boxplots also display whiskers, which are lines that extend from each end of the box toward the minimum and maximum values, up to a limit. These limits are given by the minimum or maximum value of the distribution or, in the presence of extreme values, by the following equations:
upper limit = 3rd quartile + IQR × 1.5
lower limit = 1st quartile - IQR × 1.5
According to the IQR proximity rule, we can consider a value an outlier if it falls beyond the whisker limits determined by the previous equations. In boxplots, outliers are indicated as dots.
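The IQR proximity rule can be sketched in a few lines of plain Python. The helper name iqr_limits, its fold parameter, and the toy data are hypothetical and introduced only for this example; the quantile method is again assumed to be "inclusive".

```python
from statistics import quantiles


def iqr_limits(values, fold=1.5):
    """Return (lower limit, upper limit) from the IQR proximity rule.

    Hypothetical helper for illustration; `fold` is the multiplier
    applied to the IQR (1.5 in the equations above).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - iqr * fold, q3 + iqr * fold


# Toy data with one extreme value, made up for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 30]

lower, upper = iqr_limits(values)

# A value is flagged as an outlier if it falls beyond either limit.
outliers = [v for v in values if v < lower or v > upper]

print(f"lower limit: {lower}, upper limit: {upper}, outliers: {outliers}")
```

Here the quartiles are 3 and 7, so the IQR is 4 and the limits are -3 and 13; only the value 30 falls beyond them.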
Note
If the variable has a normal distribution, about 99% of the observations will be located within the interval delimited by the whiskers. Hence, we can treat values beyond the whiskers as outliers. Boxplots are, however, non-parametric, which is why we also use them to visualize outliers in skewed variables.
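The "about 99%" figure can be checked with a short calculation. This sketch assumes a standard normal distribution (mean 0, standard deviation 1) and uses NormalDist from Python's standard library to evaluate the whisker limits and the probability mass between them.

```python
from statistics import NormalDist

# Standard normal distribution; an assumption made for this check.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Theoretical quartiles of the distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Whisker limits from the IQR proximity rule.
lower = q1 - iqr * 1.5
upper = q3 + iqr * 1.5

# Fraction of observations expected to fall between the limits.
coverage = std_normal.cdf(upper) - std_normal.cdf(lower)

print(f"limits: [{lower:.3f}, {upper:.3f}], coverage: {coverage:.4f}")
```

The limits land at roughly ±2.7 standard deviations, which leaves a little over 99% of the distribution inside the whiskers.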
In this recipe, we’ll begin by visualizing the variable distribution with boxplots, and then we’ll calculate the whisker limits manually to identify the points beyond which we could consider a value an outlier.