Ethics in Machine Learning Systems

In data acquisition and management, ethics focuses on how data is collected, with particular attention to protecting individuals and organizations from any harm that could be inflicted upon them. However, data is not the only source of bias in machine learning (ML) systems.

Algorithms and data-processing steps are also prone to introducing bias. Despite our best efforts, some of these steps may even amplify the bias and let it spread beyond the algorithms into other parts of ML-based systems, such as user interfaces or decision-making components.

Therefore, in this chapter, we’ll focus on bias in ML systems. We’ll start by briefly exploring the sources of bias. Then, we’ll look at ways to spot biases, how to minimize them, and, finally, how to communicate potential bias to the users of our system.

In this chapter, we’re going to cover the following...

Bias and ML – is it possible to have an objective AI?

In the intertwined domains of ML and software engineering, the allure of data-driven decision-making and predictive modeling is undeniable. These fields, which once operated largely in silos, now converge in numerous applications, from software development tools to automated testing frameworks. However, as we increasingly rely on data and algorithms, a pressing concern emerges: the issue of bias. Bias, in this context, refers to systematic and unfair discrepancies that can manifest in the decisions and predictions of ML models, often stemming from the very data used in software engineering processes.

The sources of bias in software engineering data are multifaceted. They can arise from historical project data, user feedback loops, or even the design and objectives of the software itself. For instance, if a software tool is predominantly tested and refined using feedback from a specific demographic, it might inadvertently...

Measuring and monitoring for bias

Let’s look at one of these frameworks – IBM AI Fairness 360 (https://github.com/Trusted-AI/AIF360). The basis for this framework is the ability to designate variables that can be linked to bias (so-called protected attributes) and then quantify how differently the remaining variables and outcomes are distributed across the groups those attributes define. So, let’s dive into an example of how to calculate bias for a dataset. Since bias is often associated with gender or similar attributes, we need to use a dataset that contains such an attribute. So far in this book, we have not used any dataset with this kind of attribute, so we need to find another one.
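
Concretely, the framework expresses bias through metrics computed over groups defined by those attributes. Two of the simplest dataset-level metrics are the statistical parity difference, the probability of a favorable outcome in the unprivileged group minus the probability in the privileged group, and the disparate impact, the ratio of those two probabilities. A difference of 0 (or a ratio of 1) indicates that the dataset treats both groups equally.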

Let’s take the Titanic survival dataset and check whether there was any bias in survivability between male and female passengers. First, we need to install the IBM AI Fairness 360 framework:

pip install aif360

Then, we can start creating a program that will check for bias. We need to import the appropriate libraries and create the data. In this example, we’ll create...
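
As a rough illustration of how such a check could look, here is a minimal, self-contained sketch. The hand-crafted toy rows, the numeric encoding of the sex column (1 = male, 0 = female), and the choice of metrics are assumptions made for this illustration rather than the chapter’s exact code:

import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy stand-in for the Titanic data: sex is 1 = male, 0 = female;
# survived is 1 = yes, 0 = no
df = pd.DataFrame({
    'sex':      [1, 1, 1, 1, 0, 0, 0, 0],
    'survived': [0, 0, 0, 1, 1, 1, 1, 0],
})

# Wrap the DataFrame so that AIF360 knows which column is the label
# and which one is the protected attribute
dataset = BinaryLabelDataset(
    df=df,
    label_names=['survived'],
    protected_attribute_names=['sex'],
    favorable_label=1,
    unfavorable_label=0,
)

# Treat female passengers as the privileged group with respect to
# survival ("women and children first") and male passengers as the
# unprivileged group
metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{'sex': 0}],
    unprivileged_groups=[{'sex': 1}],
)

# Statistical parity difference: P(survived | male) - P(survived | female);
# 0 means parity, and negative values indicate bias against the
# unprivileged group
print('Statistical parity difference:', metric.statistical_parity_difference())

# Disparate impact: the ratio of the same two probabilities; 1.0 means parity
print('Disparate impact:', metric.disparate_impact())

On the real Titanic data, the statistical parity difference comes out clearly negative, reflecting the well-known fact that female passengers survived at a much higher rate than male passengers.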

Developing mechanisms to prevent ML bias from spreading throughout the system

Unfortunately, it is generally not possible to completely remove bias from ML systems, as we often do not have access to the attributes needed to reduce it. However, we can reduce the bias and lower the risk of it spreading to the entire system.

Awareness and education are among the most important measures that we can use to manage bias in software systems. We need to understand the potential sources of bias and their implications. We also need to identify biases related to protected attributes (for example, gender) and check whether other attributes are correlated with them (for example, occupation and address). Then, we need to educate our team about the ethical implications of biased models.
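
As a small sketch of that second point, a simple correlation check can reveal whether seemingly neutral attributes act as proxies for a protected one. The column names and values here are invented for illustration:

import pandas as pd

# Invented example data: gender is the protected attribute; occupation
# and district are label-encoded candidate proxies
df = pd.DataFrame({
    'gender':     [1, 1, 1, 1, 0, 0, 0, 0],
    'occupation': [2, 2, 1, 2, 0, 0, 1, 0],
    'district':   [5, 3, 5, 4, 2, 1, 2, 1],
})

# Correlation of every attribute with the protected one; values close to
# +1 or -1 suggest the attribute can leak protected information even if
# the 'gender' column itself is removed before training
proxy_strength = df.corr()['gender'].drop('gender').abs()
print(proxy_strength.sort_values(ascending=False))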

Then, we need to diversify our data collection. We must ensure that the data we collect is representative of the population we aim to model. To avoid over-representing or under-representing...
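
One minimal sketch of such a measure, assuming a pandas DataFrame with a gender column, is to stratify the data split on the protected attribute so that no split over-represents or under-represents a group:

import pandas as pd
from sklearn.model_selection import train_test_split

# Invented example data with a deliberately skewed 75/25 gender ratio
df = pd.DataFrame({
    'gender':   [1, 1, 1, 1, 1, 1, 0, 0],
    'feature':  [3, 1, 4, 1, 5, 9, 2, 6],
    'survived': [0, 1, 0, 1, 1, 0, 1, 1],
})

# stratify=df['gender'] keeps the gender ratio identical in both splits,
# so neither split over- or under-represents a group
train, test = train_test_split(df, test_size=0.5,
                               stratify=df['gender'], random_state=42)
print(train['gender'].value_counts(normalize=True))
print(test['gender'].value_counts(normalize=True))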

Summary

One of our responsibilities as software engineers is to ensure that we develop software systems that contribute to the greater good of society. We love developing technology, but it needs to be developed responsibly. In this chapter, we looked at the concept of bias in ML and how to work with it. We looked at the IBM AI Fairness 360 framework, which can assist us in identifying bias. We also learned that automated bias detection is too limited to remove bias from the data completely.

There are more frameworks to explore, and new studies and tools become available every day. These frameworks are often more specialized and provide a means to capture domain-specific bias, for example, in medicine or advertising. Therefore, my final recommendation in this chapter is to explore the bias frameworks that are specific to the task and the domain at hand.

References

  • Donald, A., et al., Bias Detection for Customer Interaction Data: A Survey on Datasets, Methods, and Tools. IEEE Access, 2023.
  • Bellamy, R. K., et al., AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias. arXiv preprint arXiv:1810.01943, 2018.
  • Zhang, Y., et al., Introduction to AI Fairness. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 2020.
  • Alves, G., et al., Reducing Unintended Bias of ML Models on Tabular and Textual Data. In 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2021.
  • Raza, S., Reji, D. J., and Ding, C., Dbias: Detecting Biases and Ensuring Fairness in News Articles. International Journal of Data Science and Analytics, 2022, pp. 1-21.