Interpretable Machine Learning with Python - Second Edition
The mission

Picture yourself, a data science consultant, in a conference room in Fort Worth, Texas, in early January 2019. In this conference room, executives for one of the world's largest airlines, American Airlines (AA), are briefing you on their On-Time Performance (OTP). OTP is a widely accepted Key Performance Indicator (KPI) for flight punctuality, measured as the percentage of flights that arrive within 15 minutes of the scheduled arrival time. It turns out that AA has achieved an OTP of just over 80% for three years in a row, which is acceptable and a significant improvement, but it still ranks ninth in the world and fifth in North America. To be able to tout it in next year's advertising, AA aspires to rank, at the very least, number one in North America for 2019, besting its biggest rivals.
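To make the OTP definition concrete, here is a minimal sketch of how it could be computed from arrival-delay data; the ARR_DELAY column name and the toy values are hypothetical rather than taken from the chapter's dataset:

import pandas as pd

# Hypothetical arrival delays in minutes (negative means early)
flights_df = pd.DataFrame({'ARR_DELAY': [-5, 3, 22, 10, 47, 0, 14, 90]})

# A flight counts as on-time if it arrives within 15 minutes of schedule
on_time = flights_df['ARR_DELAY'] <= 15

# OTP is the percentage of on-time flights
otp = on_time.mean() * 100
print(f"OTP: {otp:.1f}%")  # 62.5% for this toy sample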

On the financial front, it is estimated that delays cost the airline close to $2 billion, so reducing this by 25–35% to be on par with their competitors could produce...

The approach

Upon careful consideration, you have decided to approach this as both a regression problem and a classification problem. Therefore, you will produce models that predict minutes delayed, as well as models that classify whether flights were delayed by more than 15 minutes. Employing both will enable you to use a wider variety of interpretation methods and expand your interpretation accordingly. So, we will approach this example by taking the following steps:

  1. Predicting minutes delayed with various regression methods
  2. Classifying flights as delayed or not delayed with various classification methods

These steps are carried out in the Reviewing traditional model interpretation methods section, and the conclusions they lead to are spread across the rest of this chapter's sections.
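As a sketch of what framing the same data both ways looks like, the following derives a regression target and a classification target from the same delay column; the column name and values here are hypothetical:

import pandas as pd

# Hypothetical delays in minutes
df = pd.DataFrame({'DELAY': [0, 8, 22, 41, 3]})

# Regression target: minutes delayed, used as-is
y_reg = df['DELAY']

# Classification target: delayed by more than 15 minutes or not
y_class = (df['DELAY'] > 15).astype(int)
print(y_class.tolist())  # [0, 0, 1, 1, 0]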

Loading the libraries

To run this example, you need to install the following libraries:

  • mldatasets to load the dataset
  • pandas and numpy to manipulate it
  • sklearn (scikit-learn), rulefit, statsmodels, interpret, tensorflow (tf), and gaminet to fit models and calculate performance metrics
  • matplotlib to create visualizations

Load these libraries as seen in the following snippet:

import math
import mldatasets
import pandas as pd
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler,\
                                  MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn import metrics, linear_model, tree, naive_bayes,\
                    neighbors, ensemble, neural_network, svm
from rulefit import RuleFit
import statsmodels.api as sm
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show
from interpret.perf import ROC
import...

Reviewing traditional model interpretation methods

To explore as many model classes and interpretation methods as possible, we will fit the data into regression and classification models.

Predicting minutes delayed with various regression methods

To compare and contrast regression methods, we will first create a dictionary named reg_models. Each model gets its own nested dictionary, with the class that instantiates it stored under the model key. This structure will be used later to neatly store the fitted model and its metrics. The model classes in this dictionary have been chosen to represent several model families and to illustrate important concepts that we will discuss later:

reg_models = {
    #Generalized Linear Models (GLMs)
    'linear':{'model': linear_model.LinearRegression()}, 
    'linear_poly':{
        'model':make_pipeline(
            PolynomialFeatures(degree=2),
            linear_model.LinearRegression(fit_intercept=False)
       ...
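The snippet above is truncated, but once reg_models is fully defined, a loop like the following can fit each model and record it together with its metrics. This is a minimal sketch that assumes X_train, X_test, y_train, and y_test already exist from train_test_split:

for model_name in reg_models.keys():
    # Fit the model stored under the 'model' key
    fitted_model = reg_models[model_name]['model'].fit(X_train, y_train)
    y_pred = fitted_model.predict(X_test)
    # Store the fitted model and its test metrics back in the dictionary
    reg_models[model_name]['fitted'] = fitted_model
    reg_models[model_name]['RMSE'] = math.sqrt(
        metrics.mean_squared_error(y_test, y_pred)
    )
    reg_models[model_name]['R2'] = metrics.r2_score(y_test, y_pred)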

Understanding limitations of traditional model interpretation methods

In a nutshell, traditional interpretation methods only cover high-level questions about your models such as the following:

  • In aggregate, do they perform well?
  • What changes in hyperparameters may impact predictive performance?
  • What latent patterns can you find between the features and their predictive performance?

These questions are very limiting if you are trying to understand not only whether your model works, but why and how it works.

This gap in understanding can lead to unexpected issues with your model that won’t necessarily be immediately apparent. Let’s consider that models, once deployed, are not static but dynamic. They face different challenges than they did in the “lab” when you were training them. They may face not only performance issues but issues with bias, such as imbalance with underrepresented classes, or security vulnerabilities with adversarial...

Studying intrinsically interpretable (white-box) models

So far, in this chapter, we have already fitted our training data to model classes representing each of these “white-box” model families. The purpose of this section is to show you exactly why they are intrinsically interpretable. We’ll do so by employing the models that were previously fitted.

Generalized Linear Models (GLMs)

GLMs are a large family of model classes, with a model for every statistical distribution. Just as linear regression assumes that your target feature and its residuals are normally distributed, logistic regression assumes the target follows a Bernoulli distribution. There are GLMs for every distribution, such as Poisson regression for the Poisson distribution and multinomial response models for the multinomial distribution. You choose which GLM to use based on the distribution of your target variable and whether your data meets the GLM's other assumptions (these vary). In addition to an underlying distribution...
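To illustrate matching the family to the target's distribution, here is a minimal sketch using statsmodels (imported earlier as sm); the synthetic data and coefficients are made up purely for demonstration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))  # add an intercept column

# Continuous target -> Gaussian family (ordinary linear regression)
y_cont = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)
gaussian_glm = sm.GLM(y_cont, X, family=sm.families.Gaussian()).fit()

# Binary target -> Binomial family (logistic regression)
y_bin = (y_cont > y_cont.mean()).astype(int)
logistic_glm = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()

# Count target -> Poisson family (Poisson regression)
y_count = rng.poisson(lam=np.exp(0.2 * X[:, 1] + 0.1))
poisson_glm = sm.GLM(y_count, X, family=sm.families.Poisson()).fit()
print(poisson_glm.summary())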

Recognizing the trade-off between performance and interpretability

We have briefly touched on this topic before: high performance often requires complexity, and complexity inhibits interpretability. As studied in Chapter 2, Key Concepts of Interpretability, this complexity comes primarily from three sources: non-linearity, non-monotonicity, and interactivity. Any complexity the model adds is compounded by the number and nature of the features in your dataset, which are a source of complexity by themselves.
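To see how the number of features compounds complexity, consider how quickly the degree-2 polynomial expansion used earlier in this chapter grows; this is a small illustrative example with random data:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# 10 original features
X = np.random.rand(100, 10)

# A degree-2 expansion adds squares and pairwise interaction terms:
# 1 bias + 10 linear + 10 squared + 45 interactions = 66 columns
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
print(X.shape, X_poly.shape)  # (100, 10) (100, 66)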

Special model properties

These special properties can help make a model more interpretable.

The key property: explainability

In Chapter 1, Interpretation, Interpretability, and Explainability; and Why Does It All Matter?, we discussed why being able to look under the hood of the model and intuitively understand how all its moving parts derive its predictions in a consistent manner is, mostly, what separates explainability from interpretability. This property is...

Discovering newer interpretable (glass-box) models

In the last decade, there have been significant efforts in both industry and academia to create new models that have enough complexity to find the sweet spot between underfitting and overfitting, known as the bias-variance trade-off, while retaining an adequate level of explainability.

Many models fit this description, but most of them are meant for specific use cases, haven't been properly tested yet, or haven't released a library or open-sourced code. However, two general-purpose ones are already gaining traction, and we will look at them now.

Explainable Boosting Machine (EBM)

EBM is part of Microsoft’s InterpretML framework, which includes many of the model-agnostic methods we will use later in the book.

EBM leverages the GAMs we mentioned earlier, which are like linear models but take the following form:

g(E[y]) = β0 + f1(x1) + f2(x2) + … + fp(xp)

Individual functions f1 through fp are fitted to each feature using spline functions. Then a link...
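As a minimal sketch of fitting and inspecting an EBM with the interpret classes imported earlier, the following assumes X_train, X_test, y_train_class, and y_test_class (hypothetical names) hold the classification version of the data:

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train_class)

# Global explanation: the shape function learned for each feature
show(ebm.explain_global())

# Predictive performance via interpret's ROC curve
show(ROC(ebm.predict_proba).explain_perf(X_test, y_test_class, name='EBM'))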

Mission accomplished

The mission was to train models that could predict preventable delays with enough accuracy to be useful and, then, to understand the factors that impacted these delays according to these models, in order to improve OTP. The resulting regression models all predicted delays with an RMSE, on average, well below the 15-minute threshold. And most of the classification models achieved an F1 score well above 50% – one of them reached 98.8%! We also managed to find factors that impacted delays for all white-box models, some of which performed reasonably well. So, it seems like it was a resounding success!
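For reference, the two headline metrics can be computed with the libraries loaded earlier; this sketch assumes test-set predictions for both problem framings are available under the hypothetical names shown:

# Regression: RMSE in minutes (useful if well below the 15-minute threshold)
rmse = math.sqrt(metrics.mean_squared_error(y_test, y_pred))
print(f"RMSE: {rmse:.2f} minutes")

# Classification: F1 score for the delayed/not-delayed labels
f1 = metrics.f1_score(y_test_class, y_pred_class)
print(f"F1: {f1:.1%}")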

Don’t celebrate just yet! Despite the high metrics, this mission was a failure. Through interpretation methods, we realized that the models were accurate mostly for the wrong reasons. This realization helps underpin the mission-critical lesson that a model can easily be right for the wrong reasons, so the question “why?” is not a question...

Summary

In this chapter, we covered some traditional methods for interpretability and their limitations. We learned about intrinsically interpretable models and how to use and interpret them, for both regression and classification. We also studied the performance versus interpretability trade-off, along with some models that attempt not to compromise on this trade-off. Finally, we discovered many practical interpretation challenges involving the roles of feature selection and engineering, hyperparameters, domain experts, and execution speed.

In the next chapter, we will learn more about different interpretation methods to measure the effect of a feature on a model.

Dataset sources

United States Department of Transportation Bureau of Transportation Statistics. (2018). Airline On-Time Performance Data. Originally retrieved from https://www.transtats.bts.gov.

Further reading

  • Friedman, J. and Popescu, B., 2008, Predictive Learning via Rule Ensembles. The Annals of Applied Statistics, 2(3), 916–954: http://doi.org/10.1214/07-AOAS148
  • Hastie, T., Tibshirani, R., and Wainwright, M., 2015, Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Taylor & Francis
  • Thomas, D.R., Hughes, E., and Zumbo, B.D., 1998, On Variable Importance in Linear Regression. Social Indicators Research, 45, 253–275: https://doi.org/10.1023/A:1006954016433
  • Nori, H., Jenkins, S., Koch, P., and Caruana, R., 2019, InterpretML: A Unified Framework for Machine Learning Interpretability. arXiv preprint: https://arxiv.org/pdf/1909.09223.pdf
  • Hastie, T. and Tibshirani, R., 1987, Generalized Additive Models: Some Applications. Journal of the American Statistical Association, 82(398), 371–386: http://doi.org/10.2307%2F2289439


Author

Serg Masís

Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he's a climate and agronomic data scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a start-up, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events. Whether it pertains to leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making—and machine learning interpretation helps bridge this gap robustly.