Interpretable Machine Learning with Python - Second Edition


The mission

The used car market in the United States is a thriving and substantial industry with significant economic impact. In recent years, approximately 40 million used light vehicles have been sold annually, representing over two-thirds of overall vehicle sales in the automotive sector. The market has also grown consistently, driven by the rising cost of new vehicles, longer-lasting cars, and an increasing consumer preference for pre-owned vehicles perceived as better value for money. As a result, this market segment has become increasingly important for businesses and consumers.

Given the market opportunity, a tech startup is currently working on a machine-learning-driven, two-sided marketplace for used car sales. It plans to work much like the e-commerce site eBay, except it’s focused on cars. For example, sellers can list their cars at a fixed price or auction them, and buyers can either pay the higher fixed price or participate in the auction...

The approach

You have decided to take the following steps:

  1. Train a couple of models.
  2. Evaluate them.
  3. Create feature importance values using several methods, both model-specific and model-agnostic.
  4. Plot global summaries, feature summaries, and feature interaction plots to understand how these features relate to the outcome and each other.

The plots will help you communicate findings to the tech startup executives and your data science colleagues.

The preparations

You will find the code for this example here: https://github.com/PacktPublishing/Interpretable-Machine-Learning-with-Python-2E/tree/main/04/UsedCars.ipynb

Loading the libraries

To run this example, you need to install the following libraries:

  • mldatasets to load the dataset
  • pandas and numpy to manipulate it
  • sklearn (scikit-learn) and catboost to load and configure the model
  • matplotlib, seaborn, shap, pdpbox, pyale, and lime to generate and visualize the model interpretations

You should load all of them first:

import math
import os, random
import numpy as np
import pandas as pd
import mldatasets
from sklearn import metrics, ensemble, tree, inspection,\
                    model_selection
import catboost as cb
import matplotlib.pyplot as plt
import seaborn as sns
import shap
from pdpbox import pdp, info_plots
from PyALE import ale
from lime.lime_tabular import LimeTabularExplainer

The following snippet of code will load...

Model training and evaluation

The following code snippet will train two regression models, CatBoost and Random Forest:

cb_mdl = cb.CatBoostRegressor(
    depth=7, learning_rate=0.2, random_state=rand, verbose=False
)
cb_mdl = cb_mdl.fit(X_train, y_train)
rf_mdl = ensemble.RandomForestRegressor(n_jobs=-1, random_state=rand)
rf_mdl = rf_mdl.fit(X_train.to_numpy(), y_train.to_numpy())

Next, we can evaluate the CatBoost model using a regression plot and a few metrics. Run the following code, which will output Figure 4.1:

mdl = cb_mdl
y_train_pred, y_test_pred = mldatasets.evaluate_reg_mdl(
    mdl, X_train, X_test, y_train, y_test
)

The CatBoost model produced a high R-squared of 0.94 and a test RMSE of nearly 3,100. The regression plot in Figure 4.1 tells us that although there are quite a few cases that have an extremely high error, the vast majority of the 64,000 test samples were predicted fairly well. You can confirm this by running the following code:

thresh = 4000...
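The snippet above is shown truncated. As a rough, hedged sketch of one way to perform this kind of check (assuming the y_test and y_test_pred arrays from the evaluation step; this is not necessarily the book's exact code), you could compute the share of test predictions whose absolute error falls under the threshold:

# Sketch: fraction of test predictions within $4,000 of the true price
# (assumes y_test and y_test_pred are aligned arrays or Series)
abs_err = np.abs(y_test - y_test_pred)        # absolute error per test sample
pct_within = (abs_err < thresh).mean() * 100  # share of samples under the threshold
print(f"{pct_within:.1f}% of test samples have an error below ${thresh:,}")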

What is feature importance?

Feature importance refers to the extent to which each feature contributes to the final output of a model. For linear models, it’s easier to determine the importance since coefficients clearly indicate the contributions of each feature. However, this isn’t always the case for non-linear models.
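For instance, here is a minimal, hypothetical sketch (not code from this book) of using coefficient magnitudes as importance scores for a linear model, assuming the chapter's X_train/y_train split with numeric features; standardizing the features first keeps the coefficients on a comparable scale:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Standardize features so coefficient magnitudes are comparable across features
X_train_std = StandardScaler().fit_transform(X_train)
lin_mdl = LinearRegression().fit(X_train_std, y_train)

# Rank features by the absolute value of their coefficients
lin_importance = pd.Series(
    np.abs(lin_mdl.coef_), index=X_train.columns
).sort_values(ascending=False)
print(lin_importance.head(10))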

To simplify the concept, let’s compare model classes to various team sports. In some sports, it’s easy to identify the players who have the greatest impact on the outcome, while in others, it isn’t. Let’s consider two sports as examples:

  • Relay race: In this sport, each runner covers equal distances, and the race’s outcome largely depends on the speed at which they complete their part. Thus, it’s easy to separate and quantify each racer’s contributions. A relay race is similar to a linear model since the race’s outcome is a linear combination of independent components.
  • Basketball...

Assessing feature importance with model-agnostic methods

With model-agnostic methods, we do not depend on intrinsic model parameters to compute feature importance. Instead, we treat the model as a black box, with only the inputs and output visible. So, how can we determine which inputs made a difference?

What if we altered the inputs randomly? Indeed, one of the most effective methods for evaluating feature importance is through simulations designed to measure a feature’s impact or lack thereof. In other words, let’s remove a random player from the game and observe the outcome! In this section, we will discuss two ways to achieve this: permutation feature importance and SHAP.

Permutation feature importance

Once we have a trained model, we cannot remove a feature to assess the impact of not using it. However, we can:

  • Replace the feature with a static value, such as the mean or median, rendering it devoid of useful information.
  • Shuffle (permute) the feature's values across observations, which breaks its relationship with the outcome while preserving its distribution (see the sketch after this list).
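As a hedged illustration of the shuffling approach, scikit-learn ships a helper (already imported above via sklearn.inspection) that permutes each feature in turn and measures how much the model's score degrades. The sketch below assumes the rf_mdl model, the test split, and the rand seed from earlier, and is not necessarily the book's exact code:

# Sketch: permutation feature importance on the Random Forest model
perm_result = inspection.permutation_importance(
    rf_mdl, X_test.to_numpy(), y_test.to_numpy(),
    n_repeats=10, random_state=rand, n_jobs=-1
)
perm_importance = pd.Series(
    perm_result.importances_mean, index=X_test.columns
).sort_values(ascending=False)
print(perm_importance.head(10))

Features whose shuffling hurts the score the most receive the highest importance values.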

Visualize global explanations

Previously, we covered the concept of global explanations and SHAP values. But we didn’t demonstrate the many ways we can visualize them. As you will learn, SHAP values are very versatile and can be used to examine much more than feature importance!

But first, we must initialize a SHAP explainer. In the previous chapter, we generated the SHAP values using shap.TreeExplainer and shap.KernelExplainer. This time, we will use SHAP's newer interface, which simplifies the process by saving the SHAP values and corresponding data in a single object, and much more! Instead of explicitly defining the type of explainer, you initialize it with shap.Explainer(model), which returns a callable object. Then, you pass your test dataset (X_test) to the callable Explainer, and it returns an Explanation object:

cb_explainer = shap.Explainer(cb_mdl)
cb_shap = cb_explainer(X_test)
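With the Explanation object in hand, generating global plots is a one-liner. For example, here is a minimal sketch using SHAP's standard plotting functions (not necessarily the exact plots shown in the chapter):

# Sketch: global summaries from the Explanation object
shap.plots.bar(cb_shap)       # mean absolute SHAP value per feature (feature importance)
shap.plots.beeswarm(cb_shap)  # distribution of SHAP values for every feature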

In case you are wondering, how did it know what kind of explainer to...

Feature summary explanations

This section will cover a number of methods used to visualize how an individual feature impacts the outcome.

Partial dependence plots

Partial Dependence Plots (PDPs) display a feature’s relationship with the outcome according to the model. In essence, the PDP illustrates the marginal effect of a feature on the model’s predicted output across all possible values of that feature.

The calculation involves two steps:

  1. Initially, conduct a simulation where the feature value for each observation is altered to a range of different values, and predict the model using those values. For example, if the year varies between 1984 and 2022, create copies of each observation with year values ranging between these two numbers. Then, run the model using these values. This first step can be plotted as the Individual Conditional Expectation (ICE) plot, with simulated values for year on the X-axis and the model output on the Y-axis, and...
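The procedure above is shown truncated. As an illustrative sketch of the same idea (using scikit-learn's PartialDependenceDisplay instead of the pdpbox code the chapter relies on, and assuming the rf_mdl model, the test split, the rand seed, and a year column from earlier), a PDP with its underlying ICE lines can be produced like this:

from sklearn.inspection import PartialDependenceDisplay

# Sketch: PDP plus ICE curves for the assumed "year" feature with the Random Forest model
PartialDependenceDisplay.from_estimator(
    rf_mdl,
    X_test.to_numpy(),
    features=[X_test.columns.get_loc("year")],   # column index of the assumed "year" feature
    feature_names=X_test.columns.tolist(),
    kind="both",       # overlay the average PDP line on the individual ICE curves
    subsample=200,     # plot a subsample of ICE lines to keep the figure readable
    random_state=rand,
)
plt.show()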

Feature interactions

Features may not influence predictions independently. For example, as discussed in Chapter 2, Key Concepts of Interpretability, determining obesity based solely on weight isn't possible; a person's height, or their body fat, muscle, and other composition percentages, are also needed. Models understand data through correlations, and features are often correlated because they are naturally related, even if not linearly. Interactions are what a model may learn from correlated features. For instance, a decision tree may put them in the same branch, or a neural network may arrange its parameters in such a way that it creates interaction effects. This also occurs in our case. Let's explore this through several feature interaction visualizations.
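One quick way to inspect an interaction is SHAP's dependence scatter plot. The one-liner below is a hedged sketch that assumes the cb_shap Explanation from earlier and a year column; when an Explanation is passed as color, SHAP automatically colors the points by the feature with the strongest approximate interaction:

# Sketch: SHAP values for the assumed "year" feature, colored by its strongest interacting feature
shap.plots.scatter(cb_shap[:, "year"], color=cb_shap)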

SHAP bar plot with clustering

SHAP comes with a hierarchical clustering method (shap.utils.hclust) that allows for the grouping of training features based on the “redundancy” between any given...
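The description above is cut off here. As a minimal sketch of the idea (assuming the X_train/y_train split and the cb_shap Explanation from earlier; the cutoff value is illustrative):

# Sketch: cluster features by redundancy, then show the clustering alongside the bar plot
clustering = shap.utils.hclust(X_train, y_train)
shap.plots.bar(cb_shap, clustering=clustering, clustering_cutoff=0.5)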

Summary

After reading this chapter, you should understand what model-specific methods for computing feature importance are and what their shortcomings are. You should also have learned how the model-agnostic methods, permutation feature importance and SHAP values, are calculated and interpreted, as well as the most common ways to visualize model explanations. You should know your way around global explanation methods such as global summaries, feature summaries, and feature interaction plots, and their advantages and disadvantages.

In the next chapter, we will delve into local explanations.
