Interpretable Machine Learning with Python - Second Edition

The preparations

You will find the code for this example here: https://github.com/PacktPublishing/Interpretable-Machine-Learning-with-Python/tree/master/Chapter10/Mailer.ipynb.

Loading the libraries

To run this example, you need to install the following libraries:

  • mldatasets to load the dataset
  • pandas, numpy, and scipy to manipulate it
  • mlxtend, sklearn_genetic, xgboost, and sklearn (scikit-learn) to fit the models
  • matplotlib and seaborn to create and visualize the interpretations
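
If any of these are missing from your environment, a pip command along the following lines should install them. The PyPI names are assumptions on my part; in particular, the sklearn_genetic import is typically provided by the sklearn-genetic-opt distribution:

pip install mldatasets pandas numpy scipy mlxtend sklearn-genetic-opt \
            xgboost scikit-learn matplotlib seaborn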

To load the libraries, use the following code block:

import math
import os
import mldatasets
import pandas as pd
import numpy as np
import timeit
from tqdm.notebook import tqdm
from sklearn.feature_selection import VarianceThreshold,\
                                    mutual_info_classif, SelectKBest
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression,\
                                     LassoCV, LassoLarsCV, LassoLarsIC
from mlxtend.feature_selection...

Understanding the effect of irrelevant features

Feature selection is also known as variable or attribute selection. It is the method by which you automatically or manually select the subset of features that is most useful for constructing ML models.

It's not necessarily true that more features lead to better models. Irrelevant features can hinder the learning process, leading to overfitting. Therefore, we need strategies to remove any features that might adversely affect learning. Some of the advantages of selecting a smaller subset of features include the following:

  • It's easier to understand simpler models: For instance, feature importance for a model that uses 15 variables is much easier to grasp than one that uses 150 variables.
  • Shorter training time: Reducing the number of variables decreases the cost of computing and speeds up model training; perhaps most notably, simpler models also have quicker inference times.
  • Improved generalization by reducing overfitting: Sometimes...
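
To see the effect of irrelevant features concretely, here is a minimal, self-contained sketch (a toy synthetic dataset, not this chapter's mailer data) that appends pure-noise columns and compares train versus test accuracy; with the noise columns, the gap between the two typically widens:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset with only 5 informative features out of 10
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=42)
# Append 200 pure-noise columns that carry no signal at all
rng = np.random.default_rng(42)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 200))])

for name, data in [('10 features', X), ('10 + 200 noise', X_noisy)]:
    X_tr, X_te, y_tr, y_te = train_test_split(data, y, random_state=42)
    mdl = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
    print(name, '-> train:', round(mdl.score(X_tr, y_tr), 3),
          'test:', round(mdl.score(X_te, y_te), 3))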

Reviewing filter-based feature selection methods

Filter-based methods independently pick out features from a dataset without employing any ML. These methods depend only on the variables' characteristics and are relatively effective, computationally inexpensive, and quick to perform. Therefore, being the low-hanging fruit of feature selection methods, they are usually the first step in any feature selection pipeline.

Two kinds of filter-based methods exist:

  • Univariate: These evaluate and rate a single feature at a time, individually and independently of the rest of the feature space. One problem that can occur with univariate methods is that they may filter out too much since they don't take the relationships between features into consideration.
  • Multivariate: These take the entire feature space into account, including how the features within it interact with each other.

Overall, for the removal of obsolete, redundant, constant, duplicated, and uncorrelated features, filter methods are very strong. However...
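
As a minimal sketch of the two kinds described above, reusing classes imported in the earlier code block: X_train and y_train are placeholders for training data (with y_train assumed to be a classification target here, hence mutual_info_classif), and k=20 and the 0.9 correlation cutoff are arbitrary choices:

import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold,\
                                      SelectKBest, mutual_info_classif

# Univariate filters: drop constant features, then keep the 20 features
# with the highest mutual information with the target
X_train_vt = VarianceThreshold(threshold=0.0).fit_transform(X_train)
kbest = SelectKBest(mutual_info_classif, k=20)
X_train_mi = kbest.fit_transform(X_train_vt, y_train)

# Multivariate filter: flag one of each pair of features whose absolute
# pairwise correlation exceeds an (arbitrary) 0.9 cutoff
corr = pd.DataFrame(X_train_mi).corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]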

Exploring embedded feature selection methods

Embedded methods live within the models themselves, as these models naturally select features during training. You can leverage the intrinsic properties of any model that has them to capture the features it selected:

  • Tree-based models: For instance, we have used the following code many times to count the number of features used by the RF models, which is evidence of feature selection naturally occurring in the learning process:
              sum(reg_mdls[mdlname]['fitted'].feature_importances_ > 0)

XGBoost's RF uses gain by default to compute feature importance, which is the average decrease in error across all the splits where the feature was used. We can raise the threshold above 0 to select even fewer features according to this relative contribution. However, by constraining the trees' depth, we had already forced the model to choose fewer features.

  • Regularized models with coefficients: We will study this further in Chapter 12, Monotonic...
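
Returning to the tree-based bullet above, here is a hedged sketch of capturing such an embedded selection with SelectFromModel (imported in the earlier code block). The XGBRFRegressor hyperparameters and the 0.001 importance threshold are illustrative rather than this chapter's exact settings, and X_train/y_train are placeholders:

from xgboost import XGBRFRegressor
from sklearn.feature_selection import SelectFromModel

# Fit XGBoost's random forest with constrained depth, then keep only
# the features whose (gain-based) importance exceeds the threshold
rf_mdl = XGBRFRegressor(max_depth=4, n_estimators=200, random_state=42)
embedded = SelectFromModel(rf_mdl, threshold=0.001)
embedded.fit(X_train, y_train)
mask = embedded.get_support()            # boolean mask of retained features
X_train_emb = embedded.transform(X_train)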

Discovering wrapper, hybrid, and advanced feature selection methods

The feature selection methods studied so far are computationally inexpensive because they either require no model fitting at all or fit only simpler white-box models. In this section, we will learn about other, more exhaustive methods with many possible tuning options. The categories of methods included here are as follows:

  • Wrapper: Exhaustively look for the best subset of features by fitting an ML model using a search strategy that measures improvement on a metric.
  • Hybrid: A method that combines embedded and filter methods with wrapper methods.
  • Advanced: A method that doesn't fall into any of the previously discussed categories. Examples include dimensionality reduction, model-agnostic feature importance, and genetic algorithms (GAs).

And now, let's get started with wrapper methods!

Wrapper methods

The concept behind wrapper methods is reasonably simple: evaluate different subsets of features on the ML model and choose the one that achieves...
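
As a hedged sketch of one such wrapper, mlxtend (which this chapter imports from) provides SequentialFeatureSelector; the LinearRegression estimator, k_features=10, and the scoring choice below are placeholders rather than the chapter's configuration, and X_train/y_train stand in for the training data:

from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.linear_model import LinearRegression

# Sequential Forward Selection: greedily add the feature that improves
# cross-validated performance the most until k_features are selected;
# forward=False would turn this into Sequential Backward Selection (SBS)
sfs = SFS(LinearRegression(), k_features=10, forward=True, floating=False,
          scoring='neg_root_mean_squared_error', cv=3)
sfs = sfs.fit(X_train, y_train)
print(sfs.k_feature_idx_)   # indices of the selected feature subset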

Hybrid methods

Starting with 435 features, there are over 10^42 possible combinations of 27-feature subsets alone! So, you can see how EFS would be impractical on such a large feature space. Therefore, except for EFS on the entire dataset, wrapper methods will invariably take some shortcuts to select the features. Whether you are going forward, backward, or both, as long as you are not assessing every single combination of features, you could easily miss out on the best one.

However, we can combine the more rigorous, exhaustive search approach of wrapper methods with the efficiency of filter and embedded methods. The result is hybrid methods. For instance, you could employ filter or embedded methods to derive only the top 10 features and perform EFS or SBS on just those, as sketched below.
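
A hedged sketch of that hybrid idea, assuming X_train is a NumPy array, top10_cols is a placeholder list of the column indices chosen by a cheaper filter or embedded step, and the estimator and scoring are illustrative:

from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
from sklearn.linear_model import LinearRegression

# EFS is only tractable on the small candidate pool that a cheaper
# filter or embedded step has already produced (top10_cols below)
efs = EFS(LinearRegression(), min_features=3, max_features=10,
          scoring='neg_root_mean_squared_error', cv=3)
efs = efs.fit(X_train[:, top10_cols], y_train)
print(efs.best_idx_)   # best subset, as indices within the reduced pool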

Recursive feature elimination

Another, more common approach is similar to SBS, but instead of removing features based solely on the improvement of a metric, it uses the model's intrinsic parameters to rank the features...
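
scikit-learn's RFE class implements this recursive feature elimination idea; the sketch below is illustrative (the estimator, n_features_to_select=20, and step=5 are placeholders, not this chapter's exact configuration):

from sklearn.feature_selection import RFE
from xgboost import XGBRFRegressor

# Recursively fit the estimator, rank features by importance, and prune
# the weakest `step` features per round until 20 remain
rfe = RFE(XGBRFRegressor(max_depth=4), n_features_to_select=20, step=5)
rfe.fit(X_train, y_train)
print(rfe.support_)    # boolean mask of the retained features
print(rfe.ranking_)    # 1 = selected; larger values were dropped earlier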

Considering feature engineering

Let's assume that the non-profit has chosen to use the model whose features were selected with Lasso LARS with AIC (e-llarsic) but would like to evaluate whether you can improve it further. Now that you have removed over 300 features that might have only marginally improved predictive performance but mostly added noise, you are left with more relevant features. However, you also know that the 8 features selected by e-llars produced roughly the same RMSE as the 111 features. This means that while there's something in those extra features that improves profitability, it does not improve the RMSE.

From a feature selection standpoint, many things can be done to approach this problem. For instance, you could examine the overlap and differences between the features selected by e-llarsic and e-llars, and run feature selection variations strictly on those features to see whether the RMSE dips for any combination while keeping or improving on current profitability. However, there...
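
For the first of those ideas, the overlap and differences between the two selections reduce to simple set operations; the column names below are hypothetical stand-ins for the lists each method actually selected:

# Hypothetical selections - replace with the actual lists of column
# names chosen by e-llarsic and e-llars, respectively
e_llarsic_cols = ['income', 'avg_gift', 'last_gift', 'months_since_gift']
e_llars_cols = ['income', 'avg_gift']

shared = set(e_llarsic_cols) & set(e_llars_cols)   # in both selections
extra = set(e_llarsic_cols) - set(e_llars_cols)    # only in e-llarsic
print(f'{len(shared)} shared, {len(extra)} unique to e-llarsic')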

Mission accomplished

To approach this mission, you have reduced overfitting primarily using the feature selection toolset. The non-profit is pleased with a profit lift of roughly 30%, costing a total of $35,601, which is $30,000 less than it would cost to send the mailer to everyone in the test dataset. However, they still want assurance that they can safely employ this model without worrying that it will produce losses.

In this chapter, we've examined how overfitting can cause the profitability curves not to align. Misalignment is critical because it could mean that a threshold chosen based on training data would not be reliable on out-of-sample data. So, you use compare_df_plots to compare profitability between the test and train sets as you've done before, but this time for the chosen model (rf_5_e-llarsic):

profits_test = reg_mdls['rf_5_e-llarsic']['profits_test']
profits_train = reg_mdls['rf_5_e-llarsic']['profits_train']

Summary

In this chapter, we have learned about how irrelevant features impact model outcomes and how feature selection provides a toolset to solve this problem. We then explored many different methods in this toolset, from the most basic filter methods to the most advanced ones. Lastly, we broached the subject of feature engineering for interpretability. Feature engineering can make for a more interpretable model that will perform better. We will cover this topic in more detail in Chapter 12, Monotonic Constraints and Model Tuning for Interpretability. In the next chapter, we will discuss methods for bias mitigation and causal inference.
