
The preparations

Loading the libraries

To run this example, you need to install the following libraries:

  • mldatasets to load the dataset
  • pandas, numpy, and nltk to manipulate it
  • sklearn (scikit-learn) and lightgbm to split the data and fit the models
  • matplotlib, seaborn, shap, and lime to visualize the interpretations
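
If any of these are not already installed, you can typically install them from a terminal with pip. The package names below mirror the list above, but exact names and versions may vary with your environment:

pip install mldatasets pandas numpy nltk scikit-learn lightgbm matplotlib seaborn shap lime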

You should load all of them first, as follows:

import math
import mldatasets
import pandas as pd
import numpy as np
import re
import nltk
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn import metrics, svm
from sklearn.feature_extraction.text import TfidfVectorizer
import lightgbm as lgb
import matplotlib.pyplot as plt
import seaborn as sns
import shap
import lime
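
Note that word_tokenize relies on NLTK’s punkt tokenizer models. If they are not already present on your system, they can be downloaded with a standard one-time NLTK call (this is a general NLTK requirement, not a step specific to this chapter’s code):

nltk.download('punkt')   # downloads the tokenizer models used by word_tokenize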

Leveraging SHAP’s KernelExplainer for local interpretations with SHAP values

For this section, and for subsequent use, we will train a Support Vector Classifier (SVC) model first.

Training a C-SVC model

SVM is a family of model classes that operate in high-dimensional space to find an optimal hyperplane that separates the classes with the maximum margin between them. Support vectors are the points closest to the decision boundary (the dividing hyperplane) that would change it if they were removed. To find the best hyperplane, SVMs minimize a cost function called hinge loss and use a computationally cheap method for operating in high-dimensional space called the kernel trick. Even though a hyperplane suggests linear separability, SVMs are not limited to a linear kernel.

The scikit-learn implementation we will use is called C-SVC. SVC uses an L2 regularization parameter called C and, by default, uses a kernel called the Radial Basis Function (RBF),...
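
As a rough sketch of how the training and the subsequent KernelExplainer interpretation could look, assuming the tabular features and labels are already split into X_train, X_test, y_train, and y_test (these names, the sample sizes, and the force-plot indexing are illustrative assumptions, not the book’s exact code):

svm_mdl = svm.SVC(kernel='rbf', C=1.0, probability=True, random_state=42)
svm_mdl.fit(X_train, y_train)
print(metrics.accuracy_score(y_test, svm_mdl.predict(X_test)))

# KernelExplainer only needs a prediction function and a background sample;
# a small background set keeps the (slow) permutation sampling tractable
svm_explainer = shap.KernelExplainer(svm_mdl.predict_proba, shap.sample(X_train, 50))
svm_shap_values = svm_explainer.shap_values(X_test.iloc[:10], nsamples=200)

# Depending on the shap version, shap_values may be a list with one array per
# class (assumed here) or a single 3-D array; index 1 is the positive class
shap.force_plot(
    svm_explainer.expected_value[1],
    svm_shap_values[1][0],
    X_test.iloc[0],
    matplotlib=True
)

Setting probability=True is what makes predict_proba available: it calibrates the SVC’s decision scores into probabilities using Platt scaling (see the Platt reference under Further reading).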

Employing LIME

Until now, the model-agnostic interpretation methods we’ve covered attempt to reconcile the totality of a model’s outputs with its inputs. For these methods to get a good idea of how and why X becomes y_pred, we need some data first. Then, we perform simulations with this data, pushing variations of it into the model and evaluating what comes out. Sometimes, they even leverage a global surrogate to connect the dots. From what we learn in this process, we derive feature importance values that quantify a feature’s impact, interactions, or decisions on a global level. For many methods, such as SHAP, these can be observed locally too. However, even when they can be observed locally, what was quantified globally may not apply locally. For this reason, there should be another approach that quantifies the local effects of features solely for local interpretation, one such as LIME!

What is LIME?

LIME trains local surrogates to explain...
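
To make the idea concrete, here is a hedged sketch of requesting a local surrogate explanation for a single tabular instance with the lime library; the variable names (X_train, X_test, fitted_model) and class labels are placeholders, not the book’s exact code:

from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    X_train.values,                          # background data used to sample perturbations
    feature_names=list(X_train.columns),
    class_names=['Not Highly Recomm.', 'Highly Recomm.'],   # assumed labels
    discretize_continuous=True
)
lime_exp = lime_explainer.explain_instance(
    X_test.values[0],                        # the single observation to explain
    fitted_model.predict_proba,              # any classifier's probability function
    num_features=8
)
lime_exp.show_in_notebook()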

Using LIME for NLP

At the beginning of the chapter, we set aside training and test datasets with the cleaned-up contents of all the “tastes” columns for NLP. We can take a peek at the test dataset for NLP, as follows:

print(X_test_nlp)

This outputs the following:

1194                 roasty nutty rich
77      roasty oddly sweet marshmallow
121              balanced cherry choco
411                sweet floral yogurt
1259           creamy burnt nuts woody
                     ...              
327          sweet mild molasses bland
1832          intense fruity mild sour
464              roasty sour milk note
2013           nutty fruit sour floral
1190           rich roasty nutty smoke
Length: 734, dtype: object

No machine learning model can ingest the data as text, so we need to turn it into a numerical format—in other words, vectorize it. There are many techniques we can use to do this. In our case, we are not interested in the position of words...
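
As a rough sketch of how that could look (the pipeline, classifier choice, and variable names such as X_train_nlp and y_train are assumptions, not the book’s exact code), the taste text can be vectorized with TF-IDF inside a pipeline so that LIME’s text explainer can later perturb raw strings directly:

from lime.lime_text import LimeTextExplainer

# The vectorizer and classifier are wrapped in one pipeline so that
# predict_proba accepts raw strings
vectorizer = TfidfVectorizer(lowercase=False)
nlp_pipeline = make_pipeline(vectorizer, lgb.LGBMClassifier(random_state=42))
nlp_pipeline.fit(X_train_nlp, y_train)

text_explainer = LimeTextExplainer(class_names=['Not Highly Recomm.', 'Highly Recomm.'])
text_exp = text_explainer.explain_instance(
    X_test_nlp.iloc[0],                      # a single bar's taste words
    nlp_pipeline.predict_proba,              # the pipeline handles vectorization internally
    num_features=6
)
text_exp.show_in_notebook()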

Trying SHAP for NLP

Most of SHAP’s explainers will work with tabular data. DeepExplainer can handle text but is restricted to deep learning models, and, as we will cover in Chapter 7, Visualizing Convolutional Neural Networks, three of them handle images, including KernelExplainer. In fact, SHAP’s KernelExplainer was designed to be a general-purpose, truly model-agnostic method, but it’s not promoted as an option for NLP. It is easy to understand why: it’s slow, and NLP models tend to be very complex, with hundreds, if not thousands, of features to boot. In cases such as this one, where word order is not a factor, there are only a few hundred features, and the top 100 of them are present in most of your observations, KernelExplainer could work.

In addition to overcoming the high computation cost, there are a couple of technical hurdles you would need to overcome. One of them is that KernelExplainer is compatible with a pipeline, but it expects a single...
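
One way around these hurdles, sketched below under the same assumptions as the earlier NLP pipeline (the vectorizer, pipeline, and variable names are hypothetical), is to hand KernelExplainer the already-vectorized numeric features and only the final classifier’s probability function, rather than the raw-text pipeline:

# Vectorize once so KernelExplainer receives a plain numeric matrix
X_train_vec = vectorizer.transform(X_train_nlp).toarray()
X_test_vec = vectorizer.transform(X_test_nlp).toarray()

# A small background sample and a capped nsamples keep the run time manageable
background = shap.sample(X_train_vec, 50)
nlp_explainer = shap.KernelExplainer(
    nlp_pipeline.steps[-1][1].predict_proba,  # the classifier at the end of the pipeline
    background
)
nlp_shap_values = nlp_explainer.shap_values(X_test_vec[:20], nsamples=200)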

Comparing SHAP with LIME

As you will have noticed by now, both SHAP and LIME have limitations, but they also have strengths. SHAP is grounded in game theory and approximated Shapley values, so its SHAP values are supported by theory. These values have great properties, such as additivity, efficiency, and substitutability, that make them consistent, although they violate the dummy property. The values always add up, and no parameter tuning is needed to accomplish this. However, SHAP is more suited to global interpretations, and one of its most model-agnostic explainers, KernelExplainer, is painfully slow. KernelExplainer also deals with missing values by using random ones, which can put too much weight on unlikely observations.

LIME is speedy, truly model-agnostic, and adaptable to all kinds of data. However, it’s not grounded in strict and consistent principles; rather, it relies on the intuition that neighbors are alike. Because of this, it can require tricky parameter tuning to define the neighborhood...

Mission accomplished

The mission was to understand why one of your client’s bars is Outstanding while another one is Disappointing. Your approach employed the interpretation of machine learning models to arrive at the following conclusions:

  • According to SHAP on the tabular model, the Outstanding bar owes that rating to its berry taste and its cocoa percentage of 70%. On the other hand, the unfavorable rating for the Disappointing bar is due mostly to its earthy flavor and bean country of origin (Other). Review date plays a smaller role, but it seems that chocolate bars reviewed in that period (2013–15) were at an advantage.
  • LIME confirms that cocoa_percent<=70 is a desirable property, and that, in addition to berry, creamy, cocoa, and rich are favorable tastes, while sweet, sour, and molasses are unfavorable.
  • The commonality between both methods using the tabular model is that despite the many non-taste-related attributes, taste features are among the most salient. Therefore...

Summary

In this chapter, we learned how to use SHAP’s KernelExplainer, as well as its decision and force plots, to conduct local interpretations. We carried out a similar analysis using LIME’s instance explainer for both tabular and text data. Lastly, we looked at the strengths and weaknesses of SHAP’s KernelExplainer and LIME. In the next chapter, we will learn how to create even more human-interpretable explanations of a model’s decisions, such as “if X conditions are met, then Y is the outcome”.

Dataset sources

Further reading

  • Platt, J. C. (1999). Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. Advances in Large Margin Classifiers, MIT Press. https://www.cs.colorado.edu/~mozer/Teaching/syllabi/6622/papers/Platt1999.pdf
  • Lundberg, S. & Lee, S. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1705.07874 (documentation for SHAP: https://github.com/slundberg/shap)
  • Ribeiro, M. T., Singh, S. & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. http://arxiv.org/abs/1602.04938
  • Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. & Liu, T. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems, 30, pp. 3149-3157. https...
