Bias Mitigation and Causal Inference Methods

In Chapter 6, Anchors and Counterfactual Explanations, we examined fairness and its connection to decision-making, but we were limited to post hoc model interpretation methods. In Chapter 10, Feature Selection and Engineering for Interpretability, we broached the topic of cost-sensitivity, which often relates to balance or fairness. In this chapter, we will engage with methods that balance data and tune models for fairness.

With a credit card default dataset, we will learn how to leverage target visualizers such as class balance to detect undesired bias, then how to reduce it via preprocessing methods such as reweighing and the disparate impact remover, as well as in-processing and post-processing methods such as equalized odds. Extending the topics of Chapter 6, Anchors and Counterfactual Explanations, and Chapter 10, Feature Selection and Engineering for Interpretability, we will also study how policy decisions can have unexpected, counterintuitive, or...

Technical requirements

This chapter’s example uses the mldatasets, pandas, numpy, sklearn, lightgbm, xgboost, matplotlib, seaborn, xai, aif360, econml, and dowhy libraries. Instructions on how to install all these libraries are in the preface.
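If you prefer to install everything in one go, a single pip command along these lines should work (a sketch; follow the preface for the exact versions used by the book, and note that scikit-learn is the package name for sklearn):

pip install mldatasets pandas numpy scikit-learn lightgbm xgboost \
    matplotlib seaborn xai aif360 econml dowhy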

The code for this chapter is located here:

https://packt.link/xe6ie

The mission

Over 2.8 billion credit cards are circulating worldwide, and we collectively spend over $25 trillion (US) on them every year (https://www.ft.com/content/ad826e32-2ee8-11e9-ba00-0251022932c8). This is an astronomical amount, no doubt, but the credit card industry’s size is best measured not by what is spent, but by what is owed. Card issuers such as banks make the bulk of their money from interest. So, the over $60 trillion owed by consumers (2022), of which credit card debt is a sizable portion, provides a steady income to lenders in the form of interest. It could be argued that this is good for business, but it also poses ample risk: if a borrower defaults before the principal plus operating costs have been repaid, the lender could lose money, especially once they’ve exhausted legal avenues to collect the debt.

When there’s a credit bubble, this problem is compounded because an unhealthy level of debt can compromise lenders’ finances...

The approach

The bank has stressed to you how important it is that fairness is embedded in your methods, because regulators and the public at large want assurance that banks will not cause any more harm. Their reputation depends on it too: in recent months, the media has been relentless in blaming them for dishonest and predatory lending practices, causing distrust among consumers. For this reason, they want to use state-of-the-art robustness testing to demonstrate that the prescribed policies will alleviate the problem. Your proposed approach includes the following points:

  • Younger borrowers have been reported to be more prone to defaulting on repayments, so you expect to find age bias, but you will also look for bias against other protected groups such as gender.
  • Once you have detected bias, you can mitigate it with preprocessing, in-processing, and post-processing algorithms using the AI Fairness 360 (AIF360) library. In this process, you will train...

The preparations

You will find the code for this example here: https://github.com/PacktPublishing/Interpretable-Machine-Learning-with-Python/blob/master/Chapter11/CreditCardDefaults.ipynb.

Loading the libraries

To run this example, you need to install the following libraries:

  • mldatasets to load the dataset
  • pandas and numpy to manipulate it
  • sklearn (scikit-learn), xgboost, aif360, and lightgbm to split the data and fit the models
  • matplotlib, seaborn, and xai to visualize the interpretations
  • econml and dowhy for causal inference

You should load all of them first, as follows:

import math
import os
import mldatasets
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
from sklearn import model_selection, tree
import lightgbm as lgb
import xgboost as xgb
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric,\
                           ClassificationMetric
from aif360.algorithms...

Detecting bias

As outlined in Chapter 1, Interpretation, Interpretability, and Explainability; and Why Does It All Matter?, there are many sources of bias in machine learning. Those rooted in the truths that the data represents, such as systemic and structural ones, lead to prejudice bias in the data. There are also biases rooted in the data itself, such as sample, exclusion, association, and measurement biases. Lastly, there are biases in the insights we derive from data or models that we have to be careful with, such as conservatism bias, salience bias, and the fundamental attribution error.

For this example, to properly disentangle so many levels of bias, we ought to connect our data to census data for Taiwan in 2005 and to historical lending data split by demographics. Then, using these external datasets, we would control for credit card contract conditions, as well as gender, income, and other demographic data, to ascertain whether young people, in particular, were targeted for...
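Even before connecting to such external datasets, bias toward the protected groups can be quantified directly with AIF360’s dataset-level metrics, using the classes imported earlier. The following is a minimal sketch of what that looks like; the DataFrame, label column, and group encodings (train_df, IS_DEFAULT, AGE_GROUP) are illustrative assumptions, not the chapter’s exact code:

# wrap a fully numeric training DataFrame in an AIF360 dataset
train_ds = BinaryLabelDataset(
    df=train_df,                              # hypothetical DataFrame
    label_names=['IS_DEFAULT'],               # hypothetical label column
    protected_attribute_names=['AGE_GROUP'],  # hypothetical binary age group
    favorable_label=0,                        # not defaulting is the favorable outcome
    unfavorable_label=1
)

# group fairness metrics computed on the data itself (no model involved)
metric = BinaryLabelDatasetMetric(
    train_ds,
    privileged_groups=[{'AGE_GROUP': 1}],
    unprivileged_groups=[{'AGE_GROUP': 0}]
)
print('SPD:', metric.statistical_parity_difference())  # 0 means parity
print('DI :', metric.disparate_impact())               # 1 means parity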

Mitigating bias

We can mitigate bias at three different levels with methods that operate at these individual levels:

  • Preprocessing: These are interventions to detect and remove bias from the training data before training the model. Methods that leverage preprocessing have the advantage of tackling bias at the source (a minimal sketch of one such method follows this list). On the other hand, any undetected bias could still be amplified by the model.
  • In-processing: These methods mitigate bias during model training and are, therefore, highly dependent on the model; unlike preprocessing and post-processing methods, they tend not to be model-agnostic. They also require hyperparameter tuning to calibrate fairness metrics.
  • Post-processing: These methods mitigate bias during model inference. In Chapter 6, Anchors and Counterfactual Explanations, we touched on the subject of using the What-If Tool to choose the right thresholds (see Figure 6.13 in that chapter), and we manually adjusted them to achieve parity with...
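To make the preprocessing idea concrete, here is a minimal sketch of AIF360’s Reweighing method applied to a dataset like the one built earlier (the group definitions and variable names are assumptions, not the chapter’s exact code):

from aif360.algorithms.preprocessing import Reweighing

# Reweighing computes per-instance weights that equalize the expected
# (group, label) proportions in the training data before any model is fit
rw = Reweighing(
    unprivileged_groups=[{'AGE_GROUP': 0}],
    privileged_groups=[{'AGE_GROUP': 1}]
)
train_rw_ds = rw.fit_transform(train_ds)

# the learned weights can then be passed to most estimators, for example
# as sample_weight when fitting a scikit-learn or LightGBM classifier
sample_weights = train_rw_ds.instance_weights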

Creating a causal model

Decision-making often involves understanding cause and effect. If the effect is desirable, you can decide to replicate its cause; otherwise, you can avoid it. You can change something on purpose to observe how it changes outcomes, trace an accidental effect back to its cause, or simulate which change will produce the most beneficial impact. Causal inference can help us do all this by creating causal graphs and models. These tie all the variables together and estimate effects so that we can make more principled decisions. However, to properly assess the impact of a cause, whether by design or by accident, you’ll need to separate its effect from confounding variables.

The reason causal inference is relevant to this chapter is that the bank’s policy decisions have the power to impact cardholder livelihoods significantly and, given the rise in suicides, even life and death. Therefore, there’s a moral imperative to assess policy decisions with the utmost...

Understanding heterogeneous treatment effects

Firstly, it’s important to note how the dowhy wrapper of econml has cut down on a few steps with the dowhy.fit method. Usually, when you build a CausalModel such as this one directly with dowhy, it has a method called identify_effect that derives the probability expression for the effect to be estimated (the identified estimand). In this case, this is called the Average Treatment Effect (ATE). Then, another method called estimate_effect takes this expression and the models it’s supposed to tie together (regression and propensity). With them, it computes both the ATE, θ_it = E[Y_i(t) − Y_i(0)], and the CATE, θ_it(x) = E[Y_i(t) − Y_i(0) | X = x], for every outcome i and treatment t. However, since we used the wrapper to fit the causal model, it automatically takes care of both the identification and estimation steps.

You can access the identified ATE with the identified_estimand_ property and the estimate results with the estimate_ property for the causal model. The code can be seen...
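Since the notebook cell itself is not reproduced here, the wrapper’s usage can be sketched roughly as follows; the LinearDML estimator and the Y, T, X, and W arrays are assumptions, and the chapter’s actual estimator and arguments may differ:

from econml.dml import LinearDML

# Y: outcome, T: treatment, X: effect modifiers, W: confounders,
# all prepared earlier as numpy arrays (names are illustrative)
est = LinearDML(discrete_treatment=True)

# est.dowhy.fit builds the dowhy CausalModel, identifies the estimand,
# and estimates the effect in a single call
causal_mdl = est.dowhy.fit(Y, T, X=X, W=W)

print(causal_mdl.identified_estimand_)  # probability expression for the effect
print(causal_mdl.estimate_)             # the fitted ATE/CATE estimate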

Testing estimate robustness

The dowhy library comes with four methods to test the robustness of the estimated causal effect, outlined as follows:

  • Random common cause: Adding a randomly generated confounder. If the estimate is robust, the ATE should not change too much.
  • Placebo treatment refuter: Replacing treatments with random variables (placebos). If the estimate is robust, the new ATE should be close to zero.
  • Data subset refuter: Removing a random subset of the data. If the estimator generalizes well, the ATE should not change too much.
  • Add unobserved common cause: Adding an unobserved confounder that is associated with both the treatment and the outcome. The estimator assumes some level of unconfoundedness, so adding more confounding should bias the estimates; the impact on the ATE should be proportional to the strength of the confounder’s effect.

We will test robustness with the first two next.
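Before walking through them, the underlying dowhy calls can be sketched roughly as follows; this assumes a dowhy CausalModel named causal_model along with its identified estimand and estimate (the chapter drives these steps through the econml wrapper instead):

# add a randomly generated confounder; a robust ATE should barely move
random_cause = causal_model.refute_estimate(
    identified_estimand, estimate,
    method_name="random_common_cause"
)

# replace the treatment with a shuffled (placebo) version;
# a robust ATE should drop to roughly zero
placebo = causal_model.refute_estimate(
    identified_estimand, estimate,
    method_name="placebo_treatment_refuter",
    placebo_type="permute"
)
print(random_cause)
print(placebo)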

Adding a random common cause

This method...

Mission accomplished

The mission of this chapter was twofold, as outlined here:

  • Create a fair predictive model to predict which customers are most likely to default.
  • Create a robust causal model to estimate which policies are most beneficial to customers and the bank.

Regarding the first goal, we have produced four models with bias mitigation methods that are objectively fairer than the base model according to four fairness metrics (SPD, DI, AOD, and EOD) when comparing the privileged and underprivileged age groups. However, only two of these models are intersectionally fairer using both age group and gender, according to DFBA (see Figure 11.7). We can still improve fairness significantly by combining methods, yet any one of the four models improves on the base model.

As for the second goal, the causal inference framework determined that any of the policies tested is better than no policy for both parties. Hooray! However, it yielded estimates that didn...

Summary

After reading this chapter, you should understand how bias can be detected visually and with metrics, both in data and in models, and then mitigated through preprocessing, in-processing, and post-processing methods. We also learned about causal inference by estimating heterogeneous treatment effects, using them to make fair policy decisions, and testing their robustness. In the next chapter, we will continue to discuss bias, but we will learn how to tune models to meet several objectives, including fairness.

Dataset sources

Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480: https://dl.acm.org/doi/abs/10.1016/j.eswa.2007.12.020

Further reading

  • Chang, C., Chang, H.H., and Tien, J., 2017, A Study on the Coping Strategy of Financial Supervisory Organization under Information Asymmetry: Case Study of Taiwan’s Credit Card Market. Universal Journal of Management, 5, 429-436: http://doi.org/10.13189/ujm.2017.050903
  • Foulds, J., and Pan, S., 2020, An Intersectional Definition of Fairness. 2020 IEEE 36th International Conference on Data Engineering (ICDE), 1918-1921: https://arxiv.org/abs/1807.08362
  • Kamiran, F., and Calders, T., 2011, Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33, 1-33: https://link.springer.com/article/10.1007/s10115-011-0463-8
  • Feldman, M., Friedler, S., Moeller, J., Scheidegger, C., and Venkatasubramanian, S., 2015, Certifying and Removing Disparate Impact. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: https://arxiv.org/abs/1412.3756
  • Kamishima, T., Akaho...