Chapter 10: Explainable AI - Using LIME and SHAP

Let's play out a quick scenario. You are sitting in the doctor's office after having just received your annual physical. The doctor looks over your results and then casually mentions that you have a 95% chance of having a heart attack in the next month. What is your next question? "Why?" you ask. "I'm not sure," replies the doctor.

What would be your reaction to the doctor's answer in this situation? Is that good enough, or would you like a bit more information? If you are like most people, you might want to know if there was anything you could do to prevent it, or maybe there was a mix-up with your blood work and you have nothing to fear.

The use of AI models is only going to increase in the years to come, with AutoML and other low- and no-code options allowing those with less technical expertise to create models. You see a lot of models, but you need to be able to explain how things were...

Technical requirements

There are just a few things that we'll need in order to go through this chapter.

The first is that the Anaconda distribution is installed. As we know, this includes Python, conda, and Navigator, as well as many other packages used in data science.

We will use the following packages:

  • SHAP
  • LIME
  • pandas
  • NumPy
  • scikit-learn

You should create a new conda environment to install all of these packages. It's recommended to install them all at the beginning, but you can also do so as you get to the relevant part of the chapter.
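
As a sketch of what that setup could look like (the environment name xai-env and the Python version are just placeholders), you could run something like the following in a terminal:

    # Create a fresh environment with the core packages
    conda create -n xai-env python=3.9 pandas numpy scikit-learn

    # Activate it and add SHAP and LIME from the conda-forge channel
    conda activate xai-env
    conda install -c conda-forge shap lime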

After you have these things in place, we can look at why we should care about the interpretation of models in the first place.

Understanding the value of interpretation

Explainable AI will be essential if users are to understand, appropriately trust, and effectively manage this incoming generation of artificially intelligent partners.

This was the outlook of DARPA in their Explainable Artificial Intelligence (XAI) report in 2016 (https://bit.ly/3JU9yql).

Whether or not you agree with AI being used in military endeavors, it is being used in this field. Being able to know why certain outcomes are achieved is critical in this space and many others. AI isn't used just in the more traditional roles the military plays, but also when trying to discern the third- and fourth-order impacts on the military supply chain when resources such as planes are moved from one base to another.

Knowing the difference between interpreting and explaining

There is a lot of value in knowing what features are giving you the results you get, and there is even more in being able to explain why it matters. One...

Understanding models that are interpretable by design

In Chapter 7, Choosing the Best AI Algorithm, we mentioned that more complex algorithm types, such as neural networks, are often used even when they provide very little additional benefit. You should favor keeping it simple as much as possible, following the KISS principle (keep it simple, stupid). Not only are simpler models easier to interpret, but they can provide some fantastic results as well. Simple doesn't mean inferior.

We have looked at many models in this book that come with the ability to understand how the results were achieved without any special techniques. The algorithms we will cover now are as follows:

  • Decision trees
  • Linear/logistic regression
  • KNN

We'll use a medical example, as mentioned earlier in the chapter. The dataset poses a binary classification task: whether or not someone is at risk of heart disease. For this chapter, we'll keep the data preparation and other steps out...
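
To make this concrete, here is a minimal sketch of an interpretable-by-design model. The file name heart_disease.csv and the column names are hypothetical placeholders rather than the chapter's actual dataset:

    # Train a shallow decision tree on a heart disease style dataset
    # (file and column names are hypothetical placeholders)
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    df = pd.read_csv("heart_disease.csv")   # hypothetical file
    X = df.drop(columns=["target"])         # feature columns
    y = df["target"]                        # 1 = at risk, 0 = not at risk

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # A shallow tree stays readable while still performing well on tabular data
    tree = DecisionTreeClassifier(max_depth=3, random_state=42)
    tree.fit(X_train, y_train)

    # The fitted tree can be printed as plain if/else rules -- no extra tooling needed
    print(export_text(tree, feature_names=list(X.columns)))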

Explaining a model's outcome with LIME

Now we are moving on to black box models. They are becoming much more common due to the efficacy they have shown in popular domains such as NLP, vision problems, and various other areas where feeding in vast amounts of data produces amazing results. These domains aren't going anywhere, so we need to find a way to interpret these models after the fact using post-hoc interpretability.

The first approach that we'll look at is Local Interpretable Model-Agnostic Explanations (LIME), which assumes that if you zoom in on even a complex nonlinear relationship, you will find a linear one at the local level. It will then try to learn this local linear relationship by creating synthetic records that are similar to the record we care about. By creating these points/records with slightly altered inputs, it can figure out the impact that each feature has based on the model's output. As the name suggests, it's model-agnostic...
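
As a rough sketch of how this could look in code, continuing from the hypothetical heart disease data above and using a random forest to stand in for a black box model, the lime package's tabular explainer works roughly like this:

    # Explain a single prediction of a black box model with LIME
    # (reuses the hypothetical X, X_train, X_test, y_train from the earlier sketch)
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    explainer = LimeTabularExplainer(
        training_data=np.array(X_train),
        feature_names=list(X.columns),
        class_names=["no risk", "at risk"],
        mode="classification",
    )

    # Explain one test record: LIME perturbs it and fits a local linear model
    exp = explainer.explain_instance(
        data_row=np.array(X_test.iloc[0]),
        predict_fn=model.predict_proba,
        num_features=5,
    )
    print(exp.as_list())  # (feature, weight) pairs for this one prediction

The weights returned by as_list() describe only this record's neighborhood, which is exactly the "local" part of LIME's name.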

Explaining a model's outcome with SHAP

Not long after LIME came out, another tool was introduced to help with interpreting AI models: SHapley Additive exPlanations (SHAP). The main idea of SHAP is that by taking permutations of the different input features, you can determine how important each feature is to the outcome. This might sound like LIME, and that's because it took inspiration from it while also introducing some new concepts. There are, however, key differences, which we'll explain.
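
As a rough sketch, and again assuming the hypothetical random forest from the LIME example, SHAP's tree explainer can produce both per-row contributions and a global view:

    # Compute SHAP values for the same hypothetical tree-based model
    import shap

    explainer = shap.TreeExplainer(model)          # TreeSHAP for tree ensembles
    shap_values = explainer.shap_values(X_test)    # one contribution per feature per row

    # Global view: average magnitude of each feature's contribution across the test set
    shap.summary_plot(shap_values, X_test, plot_type="bar")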

Avoid confusion with Shapley values

This approach is based on Shapley values, but it is not the same thing. SHAP is an extension with Shapley game theory at its base. There are various forms of SHAP, such as KernelSHAP and TreeSHAP. SHAP also allows for global interpretations, which again build on what Shapley values alone allow.

Let's look at an example to make it a bit clearer how SHAP is used.

Let's say you are a player in a two-on-two basketball...

Summary

Throughout this chapter, you gained insights into how interpretability and explainability fit into the picture of a healthy model and a robust data science workflow. We saw how they are important not just for creating a great model, but also for business, moral, and legal reasons.

We checked back into the algorithms from earlier chapters, such as decision trees, and saw that they have a great advantage not only in accuracy but also in their ability to be interpreted by the data scientists creating them.

Later, we saw that, despite the suggestion that simpler models should be considered first, black box models are quite common, so we should still be able to interpret models such as random forests. With that in mind, you saw how LIME can be a great tool for turning that black box into a more transparent version of itself by assuming that linear relationships can be found when you zoom in on a small region of the global space.

Finally, we checked out SHAP, which builds on Shapley values...
