Chapter 6: Overcoming Bias in AI/ML

Bias in Artificial Intelligence (AI) is all around us. It can result in something as seemingly innocent as image search results for developers that show mostly men, or as serious as a system suggesting to a judge that a man of a certain race is at much greater risk of being a repeat offender than others. You might think that you won't have this problem, but bias can take many shapes that you may not already be equipped to handle.

The truth is that completely removing all bias from datasets is impossible. Much of it is entirely unintentional, often stemming simply from a lack of available data, but that doesn't matter: the damage can still be done. You'll see examples of bias in credit ratings, face detection, and other areas.

As AI becomes increasingly intertwined with the normal operations of society, its impact will continue to grow, with very real consequences felt by people. We can't claim ignorance...

Technical requirements

There are a few prerequisites to follow along with this chapter. They are as follows:

Defining bias versus discrimination

Let's start by making sure we have a clear understanding of two concepts in the context of AI: bias and discrimination. Each has different aspects, and it's important to understand the difference between them.

Bias in AI/ML

AI/ML bias is when a model shows favor toward certain groups or categories in a way that doesn't reflect the actual state of the world.

Bias is inevitable in any model and in itself can be harmless. Let's say you are going to author a paper about the most popular foods and do some analysis on them. To do so, you collect data from your friends and family about their preferences. Now think about the three foods you yourself would reply with. Are there any vegetables in there? Any Ethiopian dishes? Anything from Turkey? Perhaps not.

This is a form of bias; unless you take a perfectly even sample of people from across the world, you are...

Overcoming proxy bias

There are times when you can introduce bias even without any features or data points that directly link to a protected class (remember that a protected class is something such as age, sex, or religion). This bias is introduced by proxy: data is present that strongly correlates with membership in that group, because information about the protected class has, in some way, bled into the proxy data.

In the next diagram, you can see a representation of how proxy bias can leak into data. On the left, you have perfectly valid X and Y data, but there is also data B, which is in the form of protected class data. Even though the data from B isn't directly used in the training dataset, it is brought in via proxy through the X dataset:

Figure 6.1 – Proxy bias
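
One way to make this concrete before training is to measure how strongly each candidate feature correlates with the protected class. The following is a minimal sketch, assuming pandas is available; the column names (zip_code_income, commute_minutes, protected_group) and their values are hypothetical, not taken from a real dataset:

```python
# A minimal proxy-bias check, assuming pandas is installed. The column
# names and values below are hypothetical stand-ins, not a real dataset.
import pandas as pd

df = pd.DataFrame({
    "zip_code_income": [32000, 41000, 38500, 90000, 87000, 29000],
    "commute_minutes": [55, 48, 52, 15, 18, 60],
    "protected_group": [1, 1, 1, 0, 0, 1],  # 1 = member of the protected class
})

# Correlate each candidate feature with protected-class membership.
# A value near +1 or -1 flags the feature as a likely proxy, even though
# the protected class itself never enters the training data.
for feature in ["zip_code_income", "commute_minutes"]:
    r = df[feature].corr(df["protected_group"])
    print(f"{feature}: correlation with protected group = {r:.2f}")
```

A strongly correlated feature doesn't have to be dropped automatically, but it deserves scrutiny before it goes into a model.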

Let's look at some examples of what proxy bias could look like to make this a bit more concrete.

Examples of proxy bias

The following list contains some examples...

Overcoming sample bias

Sample bias is when the choice of data doesn't reflect what is present in the real world. This is also referred to as selection bias. As with many types of bias, this can be completely harmless or very impactful, depending on the application.

In the following diagram, you can see a visual representation of what this looks like. There is hypothetical real-world data on the left that would be helpful (represented as Input z), but for one reason or another, it did not make it into the data that is included in the training dataset:

Figure 6.2 – Sample bias

When we leave this valuable data out, it is detrimental to everyone involved. The previous diagram is more abstract, so let's look at some more concrete examples of what sample bias could look like.
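
A simple way to catch sample bias early is to compare the group make-up of your training data against a known reference distribution, such as census figures. The following is a minimal sketch, assuming pandas is available; the groups and percentages are invented for illustration:

```python
# A minimal sample-bias check, assuming pandas is installed. The groups
# and reference percentages are invented for illustration.
import pandas as pd

# Group labels of the records that made it into the training set.
train = pd.Series(["A"] * 70 + ["B"] * 25 + ["C"] * 5, name="group")

# Assumed real-world shares (for example, from census figures).
reference = {"A": 0.50, "B": 0.30, "C": 0.20}

observed = train.value_counts(normalize=True)
for group, expected in reference.items():
    got = observed.get(group, 0.0)
    flag = "  <-- under-represented" if got < 0.8 * expected else ""
    print(f"group {group}: sample {got:.0%} vs real world {expected:.0%}{flag}")
```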

Examples of sample bias

The following items are examples of where sample bias could exist. Of course, this isn't close to an exhaustive list but helps to give...

Overcoming exclusion bias

Exclusion bias is when you choose to delete information that isn't considered useful. One of the strengths of AI is that it can find patterns or relationships you didn't realize existed. Exclusion bias happens more often when an individual or team lacks solid domain knowledge around a subject and is therefore dismissive of items they don't realize would be valuable; the sketch below shows one way to guard against this.
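
Before deleting a column as useless, it's worth checking whether it carries any signal about the target. The following is a minimal sketch, assuming scikit-learn is available; the toy data is invented, with the target secretly depending on the column we are tempted to drop:

```python
# A minimal sketch, assuming scikit-learn is installed, of checking a
# column for signal before deleting it. The toy data is invented: the
# target secretly depends on the column we are tempted to drop.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(42)
n = 500
candidate = rng.integers(0, 2, size=n)   # the column that "looks useless"
noise = rng.integers(0, 2, size=n)
target = np.where(rng.random(n) < 0.8, candidate, noise)  # hidden dependence

# Mutual information near zero suggests the column is safe to drop;
# a score clearly above zero means it encodes a pattern we would lose.
mi = mutual_info_classif(candidate.reshape(-1, 1), target, discrete_features=True)
print(f"mutual information with target: {mi[0]:.3f}")
```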

An added danger arises when data scientists believe that they know an area well enough to create models around it. This can go hand in hand with the Dunning–Kruger effect, a cognitive bias in which people with low skill in a particular area overestimate their ability. You don't know what you don't know: when you are new to an area, there are many aspects of it you can't even recognize as gaps in your knowledge. Conversely, you can have people with high knowledge in an area perceiving their...

Overcoming measurement bias

Measurement bias is when the data collected for training differs from the way data is collected (or occurs) in the real world. This is an issue because the model can't understand nuance in the real world that it has never seen. And how could it? All it knows is what you tell it.

The following diagram shows what this might look like. At the top, you can see that the X, Y, and Z training data is used. Below that, the real-world data (A, B, and C) is fed into the model created from the training dataset. It is similar to the training data, but not quite the same as what was expected:

Figure 6.5 – Measurement bias
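
One way to detect this mismatch is to statistically compare a feature's distribution in the training data against what the deployed model actually receives. The following is a minimal sketch, assuming SciPy is available, using a two-sample Kolmogorov-Smirnov test on simulated sensor readings; the "field" sensor is assumed to read slightly higher than the one used to collect the training data:

```python
# A minimal sketch, assuming NumPy and SciPy are installed, of comparing
# a feature's training distribution against production readings. The
# values are simulated: the "field" sensor is assumed to read slightly
# higher than the one used to collect the training data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
training_temps = rng.normal(loc=20.0, scale=2.0, size=1000)    # lab sensor
production_temps = rng.normal(loc=21.5, scale=2.0, size=1000)  # field sensor

stat, p_value = ks_2samp(training_temps, production_temps)
if p_value < 0.01:
    print(f"distributions differ (KS={stat:.3f}, p={p_value:.1e}): possible measurement bias")
else:
    print("no significant difference detected")
```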

Having data that is different in training versus the real world can be a big issue. It's something that you might never even consider is an issue until much later, after your model has been in production for a long time, and then the damage of inaccurate predictions might...

Overcoming societal AI bias

According to an article from Lexalytics (https://www.lexalytics.com/lexablog/bias-in-ai-machine-learning), societal AI bias is when an AI behaves in ways that reflect social intolerance or institutional discrimination. At first glance, algorithms and data themselves may appear unbiased, but their output reinforces societal biases.
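
Because this bias shows up in the output rather than in any single input, one practical check is to audit the model's decisions across groups. The following is a minimal sketch, assuming pandas is available; the decisions and group labels are invented for illustration:

```python
# A minimal output audit, assuming pandas is installed. The decisions
# and group labels are invented for illustration.
import pandas as pd

results = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   0,   1,   0,   0,   1,   0],  # model decisions
})

# Positive-outcome rate per group; a large gap is a red flag that the
# model is reproducing a societal pattern rather than anything the
# applicants themselves did differently.
rates = results.groupby("group")["approved"].mean()
print(rates)
print(f"demographic parity gap: {rates.max() - rates.min():.2f}")
```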

The following figure is a glimpse of what societal bias might look like in an abstract sense. You can see that there is some good data being brought in, but along with that, there is data that is in fragments. This misshapen data represents some flaw that doesn't allow us to get a correct sense of the state of the world we are trying to model:

Figure 6.6 – Societal bias

The fragmented and flawed data bits will be baked into any model trained on this data unless we do something about it. One unique thing about this bias is that it can be invisible once the data has already been gathered and...

Finding bias in an example

The following example shows the significant business impact that bias in data can have.

The housing data company Zillow recently backed out of the iBuying business. Zillow is a US-based company that lists housing information for everyday consumers. iBuying is the term for instant buying, in which Zillow bought properties directly and then sold them for a profit (in theory). Zillow found that its estimations (or Zestimates) were off by a wide margin, which led to the company pulling out of that area. Maybe we can find out why.
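
Before walking through the bias types, here is a glimpse of what such a hunt can look like in code: group the estimation error by a segment such as region and look for segments where the model is systematically off. This is only a minimal sketch with invented numbers, not Zillow's actual data or method:

```python
# A minimal sketch of hunting for systematic error, assuming pandas is
# installed. The numbers and the region column are invented; this is not
# Zillow's data or method.
import pandas as pd

df = pd.DataFrame({
    "region":     ["north", "north", "south", "south", "west", "west"],
    "estimate":   [310_000, 295_000, 180_000, 175_000, 520_000, 505_000],
    "sale_price": [300_000, 290_000, 205_000, 198_000, 515_000, 500_000],
})

df["pct_error"] = (df["estimate"] - df["sale_price"]) / df["sale_price"]

# A mean error far from zero in one region points to systematic bias in
# that segment rather than random noise.
print(df.groupby("region")["pct_error"].mean().round(3))
```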

In this scenario, we will try to find where bias could have entered a system such as the Zestimate. To give you a framework, we'll walk through each type of bias discussed earlier and think it through step by step. This is important, as you might not instantly jump to a certain type of bias unless you see it. This is an issue, as everyone has a bias toward looking for something...

Summary

You might have heard the saying garbage in, garbage out when it comes to data and AI. In this chapter, you saw that we should also take just as seriously the phrase bias in, bias out.

We looked at some of the primary areas where bias can creep into our data and saw that we must keep an eye out for it sooner rather than later; at certain points in the process, it's too late. Bias and discrimination can have real-world impacts, from hiring and vehicle safety to the perpetuation of unjust social practices.

You have a few options to make sure that you are doing all you can to avoid this bias, such as having the relevant domain knowledge (or consulting those who do) and getting people from different backgrounds to look at the data (or, better yet, having them on your team).

There are also many other types of bias out there and, admittedly, things that we don't even realize are areas of concern. It's also important to be aware that the drift talked about...
