Interpretation Methods for Multivariate Forecasting and Sensitivity Analysis

Throughout this book, we have learned about various methods we can use to interpret supervised learning models. They can be quite effective at assessing models while also uncovering their most influential predictors and their hidden interactions. But as the term supervised learning suggests, these methods can only leverage known samples, along with permutations based on those samples' distributions. However, when these samples represent the past, things can get tricky! As the Nobel laureate in physics Niels Bohr famously quipped, "Prediction is very difficult, especially if it's about the future."

Indeed, when you see data points fluctuating in a time series, they may appear to be rhythmically dancing in a predictable pattern – at least in the best-case scenarios. Like a dancer moving to a beat, every repetitive movement (or frequency) can be attributed to seasonal patterns...

Over the last thirteen chapters, we have explored the field of Machine Learning (ML) interpretability. As stated in the preface, it's a broad area of research, much of which hasn't yet left the lab to see widespread use, and this book makes no attempt to cover all of it. Instead, the objective is to present various interpretability tools in sufficient depth to serve as a starting point for beginners and to complement the knowledge of more advanced readers. This chapter will summarize what we've learned in the context of the ecosystem of ML interpretability methods, and then speculate on what's to come next!

These are the main topics we are going to cover in this chapter:

  • Understanding the current landscape of ML interpretability
  • Speculating on the future of ML interpretability

Understanding the current landscape of ML interpretability

First, we will provide some context on how this book relates to the main goals of ML interpretability and how practitioners can start applying its methods to achieve those broad goals. Then, we'll discuss the current growth areas in interpretability research.

Tying everything together!

As discussed in Chapter 1, Interpretation, Interpretability, and Explainability; and Why Does It All Matter?, there are three main themes when talking about ML interpretability: Fairness, Accountability, and Transparency (FAT), and each of these presents a series of concerns (see Figure 14.1). I think we can agree these are all desirable properties for a model! Indeed, these concerns all present opportunities to improve Artificial Intelligence (AI) systems. These improvements start with leveraging model interpretation methods to evaluate models, confirm or dispute assumptions, and find problems.

What your aim is will depend on what...

Speculating on the future of ML interpretability

I'm used to hearing the metaphor of this period being the "Wild West of AI", or worse, an "AI Gold Rush"! It conjures images of unexplored and untamed territory being eagerly conquered, or worse, civilized. Yet, in the 19th century, the western United States was not too different from other regions on the planet, and it had already been inhabited by Native Americans for millennia, so the metaphor doesn't quite work. Predicting with the accuracy and confidence that we can achieve with ML would have spooked our ancestors, and it's not a "natural" position for us humans. It's more akin to flying than to exploring unknown land.

The article Toward the Jet Age of machine learning (linked in the Further reading section at the end of this chapter) presents a much more fitting metaphor of AI being like the dawn of aviation. It's new and exciting, and people still marvel at what we can do from down below...

Assessing time series models with traditional interpretation methods

A time series regression model can be evaluated as you would evaluate any regression model; that is, using metrics derived from the mean squared error or the R-squared score. There are, of course, cases in which you will need a metric based on medians, logs, deviances, or absolute values, but the models in this chapter don't require any of those.
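For instance, here is a minimal sketch of computing such metrics with scikit-learn, where y_test and y_pred are placeholders for the observed and predicted traffic volumes:

    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score

    # RMSE is derived from the mean squared error and is expressed
    # in the target's own units, which makes it easy to interpret
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    # R-squared: the proportion of the target's variance explained
    r2 = r2_score(y_test, y_pred)
    print(f'RMSE: {rmse:,.0f}   R-squared: {r2:.3f}')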

Using standard regression metrics

The evaluate_reg_mdl function can evaluate the model, output some standard regression metrics, and plot them. The parameters for this function are the fitted model (lstm_traffic_mdl), X_train (gen_train), X_test (gen_test), y_train, and y_test.

Optionally, we can specify a y_scaler so that the model is evaluated with the labels inverse-transformed, which makes the plot and root mean square error (RMSE) much easier to interpret. Another optional parameter that is nevertheless necessary in this case is y_truncate=True because our y_train...
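Putting those parameters together, a call might look like the following sketch. The mldatasets import path refers to the book's companion utility package, and the exact keyword names are assumptions based on the parameters described above:

    import mldatasets  # the book's companion utility package

    # Evaluates the fitted LSTM: prints standard regression metrics and
    # plots predicted vs. observed values. y_scaler inverse-transforms the
    # labels so the plot and RMSE are in original units; y_truncate=True
    # trims y_train/y_test to match the generators' shorter output.
    mldatasets.evaluate_reg_mdl(
        lstm_traffic_mdl,   # fitted model
        gen_train,          # X_train (training-sequence generator)
        gen_test,           # X_test (test-sequence generator)
        y_train,
        y_test,
        y_scaler=y_scaler,  # optional, but easier to interpret
        y_truncate=True     # optional, but necessary here
    )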

Generating LSTM attributions with integrated gradients

We first learned about integrated gradients (IG) in Chapter 7, Visualizing Convolutional Neural Networks. Unlike the other gradient-based attribution methods studied in that chapter, path-integrated gradients is neither contingent on convolutional layers nor limited to classification problems.

In fact, since it computes the gradients of the output with respect to the inputs, averaged along a path, the input and output could be anything! It is common to use integrated gradients with Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), like the one we are interpreting in this chapter. Admittedly, most of the IG LSTM examples you'll find online involve an embedding layer and an NLP classifier, but IG can be used just as effectively with LSTMs that process sounds or even genetic data!
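To make the path-integral idea concrete, here is a minimal from-scratch sketch of IG for a Keras regression model such as our LSTM. This is not the book's exact code: the all-zeros baseline, the 50-step path, and the single-window input shape are illustrative assumptions:

    import numpy as np
    import tensorflow as tf

    def integrated_gradients(model, x, baseline=None, steps=50):
        # x: one input window of shape (timesteps, features)
        x = tf.cast(x, tf.float32)
        if baseline is None:
            baseline = tf.zeros_like(x)  # a common default baseline
        baseline = tf.cast(baseline, tf.float32)
        # Straight-line path from the baseline to the input
        alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), (-1, 1, 1))
        path = baseline[None, ...] + alphas * (x - baseline)[None, ...]
        with tf.GradientTape() as tape:
            tape.watch(path)
            preds = model(path)  # one prediction per interpolated input
        # Gradients of the output with respect to the inputs, along the path
        grads = tape.gradient(preds, path)
        # Trapezoidal approximation of the average gradient
        avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
        # Scale by the difference between the input and the baseline
        return ((x - baseline) * avg_grads).numpy()

For a single forecast window, integrated_gradients(lstm_traffic_mdl, X_test[0]) would then yield a (timesteps, features) matrix of attributions.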

The integrated gradient explainer and the explainers that we will use moving forward can access any part of the traffic dataset....

Computing global and local attributions with SHAP’s KernelExplainer

Permutation methods make changes to the input to assess how much difference they make to a model's output. We first discussed this in Chapter 4, Global Model-Agnostic Interpretation Methods, but if you recall, there's a coalitional framework for performing these permutations that produces the average marginal contribution of each feature across different coalitions of features. The outcome of this process is Shapley values, which have essential mathematical properties such as additivity and symmetry. Unfortunately, Shapley values are costly to compute for all but the smallest datasets, so the SHAP library offers approximation methods. One of these is KernelExplainer, which we also explained in Chapter 4 and used in Chapter 5, Local Model-Agnostic Interpretation Methods. It approximates the Shapley values with a weighted local linear regression, just like LIME does.
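As a hedged sketch of what this looks like in practice (the model and data names below are placeholders, not the book's exact code): KernelExplainer only needs a prediction function, but since LSTMs expect 3D input of shape (samples, timesteps, features), we wrap the model with a function that restores flattened 2D rows to 3D first:

    import numpy as np
    import shap

    def predict_fn(X_flat):
        # Reshape flattened rows back into sequence windows for the LSTM
        return model.predict(
            X_flat.reshape((-1, timesteps, n_features))
        ).flatten()

    # A small background sample keeps the permutations tractable
    background = X_flat_train[
        np.random.choice(len(X_flat_train), 50, replace=False)
    ]
    explainer = shap.KernelExplainer(predict_fn, background)
    # Approximate Shapley values for a handful of test windows
    shap_values = explainer.shap_values(X_flat_test[:20], nsamples=100)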

Why use...

Identifying influential features with factor prioritization

The Morris method is one of several global sensitivity analysis methods that range from simple fractional factorial designs to complicated Monte Carlo filtering. Morris sits somewhere in the middle of this spectrum and falls into two categories. It uses one-at-a-time sampling, which means that only one value changes between consecutive simulations. It's also an Elementary Effects (EE) method, which means that it doesn't quantify the exact effect of a factor on the model's output but rather gauges each factor's importance and its relationship with other factors. By the way, factor is just another word for a feature or variable that's commonly used in applied statistics. To be consistent with the related theory, we will use this word in this section and the next.
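As an illustration (not the book's exact code), here's how Morris factor prioritization might look with the SALib library; the problem definition and the predict_traffic wrapper are hypothetical stand-ins for the chapter's actual setup:

    import numpy as np
    from SALib.sample.morris import sample as morris_sample
    from SALib.analyze import morris

    # Hypothetical factor definition; bounds must match the data's ranges,
    # and binary/ordinal factors are treated as continuous for simplicity
    problem = {
        'num_vars': 4,
        'names': ['hr', 'temp', 'is_holiday', 'dow'],
        'bounds': [[0, 23], [-30, 40], [0, 1], [0, 6]]
    }

    # One-at-a-time trajectories: only one factor changes between
    # consecutive simulations
    X = morris_sample(problem, N=100, num_levels=4)
    Y = np.array([predict_traffic(x) for x in X])  # placeholder wrapper

    # mu_star gauges a factor's importance; sigma gauges its
    # interactions and/or non-linearity
    Si = morris.analyze(problem, X, Y, num_levels=4)
    print(Si['mu_star'], Si['sigma'])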

Another property of Morris is that it’s less computationally expensive than the variance-based methods we will study next. It can provide more insights than simpler and less costly...

Quantifying uncertainty and cost sensitivity with factor fixing

With the Morris indices, it became evident that all the factors are non-linear or non-monotonic, and that there's a high degree of interactivity between them, as expected! It should be no surprise that the climate factors (temp, rain_1h, snow_1h, and cloud_coverage) are likely multicollinear with hr. There are also patterns to be found between hr, is_holiday, dow, and the target. Many of these factors most definitely don't have a monotonic relationship with the target; we know this already. For instance, traffic doesn't increase consistently as the hour of the day advances, and the same goes for the days of the week!
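Factor fixing quantifies this with variance-based (Sobol) indices: factors whose total-order index (ST) is negligible can be fixed without meaningfully changing the output's variance. As a hedged sketch with SALib, reusing the hypothetical problem definition and predict_traffic wrapper from the Morris sketch above:

    import numpy as np
    from SALib.sample import saltelli
    from SALib.analyze import sobol

    # Saltelli sampling requires N * (2D + 2) model evaluations
    X = saltelli.sample(problem, 1024)
    Y = np.array([predict_traffic(x) for x in X])

    Si = sobol.analyze(problem, Y)
    print(Si['S1'])  # first-order (main) effects
    print(Si['ST'])  # total effects, including interactions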

However, we didn’t know to what degree is_holiday and temp impacted the model, particularly during the crew’s working hours, which was an important insight. That being said, factor prioritization with Morris indices is usually to be taken as a starting...

Mission accomplished

The mission was to train a traffic prediction model and understand what factors create uncertainty and possibly increase costs for the construction company. We can conclude that a significant portion of the potential $35,000/year in fines can be attributed to the is_holiday factor. Therefore, the construction company should rethink working on holidays. There are only seven or eight holidays between March and November, and working them could cost more in fines than shifting that work to a few Sundays instead. With this caveat, the mission was successful, but there's still a lot of room for improvement.

Of course, these conclusions are for the LSTM_traffic_168_compact1 model, which we can compare with other models. Try replacing model_name at the beginning of the notebook with LSTM_traffic_168_compact2, an equally small but significantly more robust model, or LSTM_traffic_168_optimal, a larger and slightly better-performing model, and re-running the notebook...

Summary

After reading this chapter, you should understand how to assess a time series model's predictive performance, how to perform local interpretations for such models with integrated gradients, and how to produce both local and global attributions with SHAP. You should also know how to leverage the sensitivity analysis methods of factor prioritization and factor fixing with any model.

In the next chapter, we will learn how to reduce the complexity of a model and make it more interpretable with feature selection and engineering.

Further reading

  • Wilson, D.R., and Martinez, T., 1997, Improved Heterogeneous Distance Functions. Journal of Artificial Intelligence Research, 6, pp. 1-34: https://arxiv.org/abs/cs/9701101
  • Morris, M.D., 1991, Factorial sampling plans for preliminary computational experiments. Technometrics, 33(2), pp. 161-174: https://doi.org/10.2307%2F1269043
  • Saltelli, A., Tarantola, S., Campolongo, F., and Ratto, M., 2007, Sensitivity analysis in practice: A guide to assessing scientific models. Chichester: John Wiley & Sons.
  • Sobol, I.M., 2001, Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3), pp. 271-280: https://doi.org/10.1016/S0378-4754(00)00270-6
  • Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., and Tarantola, S., 2010, Variance based sensitivity analysis of model output: Design and estimator for the total sensitivity index. Computer Physics Communications, 181(2), pp. 259-270: https://doi.org/10.1016/j.cpc...

Author

Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he's a climate and agronomic data scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a start-up, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events. Whether it pertains to leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making—and machine learning interpretation helps bridge this gap robustly.