Adversarial Robustness

Machine learning interpretation addresses many concerns, ranging from knowledge discovery to high-stakes issues with tangible ethical implications, such as the fairness problems examined in the last two chapters. In this chapter, we will direct our attention to concerns involving reliability, safety, and security.

As we realized using the contrastive explanation method in Chapter 7, Visualizing Convolutional Neural Networks, we can easily trick an image classifier into making embarrassingly false predictions. This ability can have serious ramifications. For instance, a perpetrator can place a black sticker on a yield sign, and while most drivers would still recognize this as a yield sign, a self-driving car may no longer recognize it and, as a result, crash. A bank robber could wear a cooling suit designed to trick the bank vault’s thermal imaging system, and while any human would notice it, the imaging system would fail to do so.

The risk is not limited to...

Technical requirements

This chapter’s example uses the mldatasets, numpy, sklearn, tensorflow, keras, adversarial-robustness-toolbox, matplotlib, and seaborn libraries. Instructions on how to install all of these libraries are in the Preface.

The code for this chapter is located here: https://packt.link/1MNrL

The mission

The global market for privately contracted security services is valued at over USD 250 billion and is growing at around 5% annually. However, the industry faces many challenges, such as shortages of adequately trained guards and specialized security experts in many jurisdictions, and a whole host of unexpected security threats. These threats include widespread coordinated cybersecurity attacks, massive riots, social upheaval, and, last but not least, health risks brought on by pandemics. Indeed, 2020 tested the industry with a wave of ransomware, misinformation attacks, protests, and COVID-19 to boot.

In the wake of this, one of the largest hospital networks in the United States asked their contracted security company to monitor the correct use of masks by both visitors and personnel throughout the hospitals. The security company has struggled with this request because it diverts security personnel from tackling other threats, such as intruders, combative patients, and...

The approach

You’ve decided to take a four-fold approach:

  • Exploring several possible evasion attacks to understand how vulnerable the model is to them and how credible they are as threats
  • Using a preprocessing method to protect a model against these attacks
  • Leveraging adversarial retraining to produce a robust classifier that is intrinsically less prone to many of these attacks
  • Evaluating robustness with state-of-the-art methods to assure hospital administrators that the model is adversarially robust

Let’s get started!

The preparations

You will find the code for this example here: https://github.com/PacktPublishing/Interpretable-Machine-Learning-with-Python-2E/tree/main/13/Masks.ipynb

Loading the libraries

To run this example, you need to install the following libraries:

  • mldatasets to load the dataset
  • numpy and sklearn (scikit-learn) to manipulate it
  • tensorflow to fit the models
  • adversarial-robustness-toolbox (art) to perform the attacks and defenses
  • matplotlib and seaborn to visualize the interpretations

You should load all of them first:

import math
import os
import warnings
warnings.filterwarnings("ignore")
import mldatasets
import numpy as np
from sklearn import preprocessing
import tensorflow as tf
from tensorflow.keras.utils import get_file
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import (FastGradientMethod,
                                 ProjectedGradientDescent,
                                 BasicIterativeMethod)
from...

Learning about evasion attacks

There are six broad categories of adversarial attacks:

  • Evasion: designing an input that can cause a model to make an incorrect prediction, especially when it wouldn’t fool a human observer. It can be either targeted or untargeted, depending on whether the attacker intends to fool the model into misclassifying a specific class (targeted) or misclassifying any class (untargeted). The attack methods can be white-box if the attacker has full access to the model and its training dataset, or black-box with only inference access. Gray-box sits in the middle. Black-box is always model-agnostic, whereas white- and gray-box methods may or may not be. A minimal sketch of an untargeted evasion attack follows this list.
  • Poisoning: injecting faulty training data or parameters into a model. This can come in many forms, depending on the attacker’s capabilities and access. For instance, for systems with user-generated data, the attacker may be capable of adding faulty data or labels. If they have more access...
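To make the evasion category concrete, here is a minimal sketch of an untargeted FGSM evasion attack using the adversarial-robustness-toolbox (art) imports shown earlier. It assumes a trained tf.keras model (model) built with eager execution disabled, plus test arrays (X_test, and y_test with integer labels) scaled to the [0, 1] range; these names are illustrative placeholders rather than the chapter’s exact code:

import numpy as np
import tensorflow as tf
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import FastGradientMethod

# ART's KerasClassifier requires TensorFlow's graph mode (not eager execution)
tf.compat.v1.disable_eager_execution()

# Wrap the trained Keras model (model, X_test, y_test are assumed to exist)
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

# Untargeted FGSM: perturb each image by at most eps to flip the prediction
attack = FastGradientMethod(estimator=classifier, eps=0.05)
X_test_adv = attack.generate(x=X_test)

# Compare accuracy on clean versus adversarial examples
y_pred = np.argmax(classifier.predict(X_test), axis=1)
y_pred_adv = np.argmax(classifier.predict(X_test_adv), axis=1)
print("Clean accuracy:      ", np.mean(y_pred == y_test))
print("Adversarial accuracy:", np.mean(y_pred_adv == y_test))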

Defending against targeted attacks with preprocessing

There are five broad categories of adversarial defenses:

  • Preprocessing: changing the model’s inputs so that they are harder to attack.
  • Training: training a new robust model that is designed to overcome attacks.
  • Detection: detecting attacks. For instance, you can train a model to detect adversarial examples.
  • Transformer: modifying model architecture and training so that it’s more robust – this may include techniques such as distillation, input filters, neuron pruning, and unlearning.
  • Postprocessing: changing model outputs to overcome production inference or model extraction attacks.

Only the first four defenses work against evasion attacks, and in this chapter, we will only cover the first two: preprocessing and adversarial training. FGSM and C&W attacks can be defended against easily with either of these, but an AP is tougher to defend against, so it might require a stronger detection...
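As a rough illustration of the preprocessing defense, the sketch below applies ART’s SpatialSmoothing to adversarial images before re-running inference. It assumes the classifier, X_test_adv, and y_test variables from the earlier attack sketch; the window size is illustrative:

import numpy as np
from art.defences.preprocessor import SpatialSmoothing

# Median-filter each image with a small sliding window to wash out
# high-frequency adversarial noise before the model sees it
smoother = SpatialSmoothing(window_size=3)
X_test_adv_smooth, _ = smoother(X_test_adv)

# Re-evaluate on the smoothed inputs
y_pred_smooth = np.argmax(classifier.predict(X_test_adv_smooth), axis=1)
print("Accuracy after smoothing:", np.mean(y_pred_smooth == y_test))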

Shielding against any evasion attack by adversarial training of a robust classifier

In Chapter 7, Visualizing Convolutional Neural Networks, we identified a garbage image classifier that would likely perform poorly in the intended environment of a municipal recycling plant. The abysmal performance on out-of-sample data was due to the classifier being trained on a large variety of publicly available images that don’t match the expected conditions or the characteristics of materials processed by a recycling plant. The chapter’s conclusion called for training a network with images that represent its intended environment to make for a more robust model.

For model robustness, training data variety is critical, but only if it represents the intended environment. In statistical terms, it’s a question of using samples for training that accurately depict the population so that a model learns to classify them correctly. For adversarial robustness, the same...
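A minimal sketch of adversarial training with ART’s AdversarialTrainer is shown below. It assumes a compiled, ART-wrapped classifier (robust_classifier) and training arrays (X_train, y_train); the attack and hyperparameters are illustrative rather than the chapter’s exact configuration:

from art.attacks.evasion import FastGradientMethod
from art.defences.trainer import AdversarialTrainer

# Attack used to craft adversarial examples on the fly during training
fgsm = FastGradientMethod(estimator=robust_classifier, eps=0.05)

# ratio=0.5 means roughly half of each training batch is replaced
# with adversarial versions of the original images
trainer = AdversarialTrainer(robust_classifier, attacks=fgsm, ratio=0.5)
trainer.fit(X_train, y_train, batch_size=128, nb_epochs=10)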

Evaluating adversarial robustness

In any engineering endeavor, it’s necessary to test your systems to see how vulnerable they are to attacks or accidental failures. However, security is a domain where you must stress-test your solutions to ascertain what level of attack is needed to make your system break down beyond an acceptable threshold. Furthermore, figuring out what level of defense is needed to curtail an attack is useful information too.

Comparing model robustness with attack strength

We now have two classifiers that we can compare against equally strong attacks, trying different attack strengths to see how each model fares across all of them. We will use FGSM because it’s fast, but you could use any method!

The first attack strength we can assess is no attack at all. In other words, what is the classification accuracy on the test dataset with no attack? We have already stored the predicted labels for both the base (y_test_pred) and robust...
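One way to run this comparison, sketched under the assumption that ART-wrapped base_classifier and robust_classifier objects plus X_test and y_test (integer labels) already exist, is to sweep the FGSM perturbation budget eps and record each model’s accuracy:

import numpy as np
from art.attacks.evasion import FastGradientMethod

def accuracy_at_eps(classifier, eps, X, y):
    # Accuracy of the classifier on FGSM adversarial examples with budget eps
    if eps == 0:
        X_eval = X  # no attack: plain test-set accuracy
    else:
        X_eval = FastGradientMethod(estimator=classifier, eps=eps).generate(x=X)
    preds = np.argmax(classifier.predict(X_eval), axis=1)
    return np.mean(preds == y)

eps_range = [0.0, 0.01, 0.05, 0.1, 0.2]
for eps in eps_range:
    base_acc = accuracy_at_eps(base_classifier, eps, X_test, y_test)
    robust_acc = accuracy_at_eps(robust_classifier, eps, X_test, y_test)
    print(f"eps={eps:<5} base={base_acc:.3f} robust={robust_acc:.3f}")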

Mission accomplished

The mission was to perform adversarial robustness tests on the security company’s face mask model to determine whether hospital visitors and staff could evade mandatory mask compliance. The base model performed very poorly against many evasion attacks, from the most aggressive to the most subtle.

You also looked at possible defenses against these attacks, such as spatial smoothing and adversarial retraining. Then, you explored ways to evaluate the robustness of your proposed defenses. You can now provide an end-to-end framework to defend against this kind of attack. That being said, what you did was only a proof of concept.

Now, you can propose training a certifiably robust model against the attacks the hospitals expect to encounter the most. But first, you need the ingredients for a generally robust model. To this end, you will need to take all 210,000 images in the original dataset, make many variations on mask colors and types with them, and augment them even further with reasonable...

Summary

After reading this chapter, you should understand how attacks can be perpetrated on machine learning models, and evasion attacks in particular. You should know how to perform FGSM, BIM, PGD, C&W, and AP attacks, as well as how to defend against them with spatial smoothing and adversarial training. Last but not least, you know how to evaluate adversarial robustness.

The next chapter is the last one, and it outlines some ideas on what’s next for machine learning interpretation.

Dataset sources

  • Adnane Cabani, Karim Hammoudi, Halim Benhabiles, and Mahmoud Melkemi, 2020, MaskedFace-Net - A dataset of correctly/incorrectly masked face images in the context of COVID-19, Smart Health, ISSN 2352–6483, Elsevier: https://doi.org/10.1016/j.smhl.2020.100144 (Creative Commons BY-NC-SA 4.0 license by NVIDIA Corporation)
  • Karras, T., Laine, S., and Aila, T., 2019, A Style-Based Generator Architecture for Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4396–4405: https://arxiv.org/abs/1812.04948 (Creative Commons BY-NC-SA 4.0 license by NVIDIA Corporation)

Further reading

Learn more on Discord

To join the Discord community for this book, where you can share feedback, ask the author questions, and learn about new releases, visit the link below:

https://packt.link/inml
