Ethics in Machine Learning Systems

In data acquisition and management, ethics focuses on how data is collected, with particular attention to protecting individuals and organizations from any harm that could be inflicted upon them. However, data is not the only source of bias in machine learning (ML) systems.

Algorithms and data-processing steps are also prone to introducing bias. Despite our best efforts, some of these steps may even amplify the bias and let it spread beyond the algorithms into other parts of ML-based systems, such as user interfaces or decision-making components.

Therefore, in this chapter, we’ll focus on bias in ML systems. We’ll start by briefly exploring the sources of bias. Then, we’ll look at ways to spot biases, how to minimize them, and, finally, how to communicate potential bias to the users of our system.

In this chapter, we’re going to cover the following...

Bias and ML – is it possible to have an objective AI?

In the intertwined domains of ML and software engineering, the allure of data-driven decision-making and predictive modeling is undeniable. These fields, which once operated largely in silos, now converge in numerous applications, from software development tools to automated testing frameworks. However, as we increasingly rely on data and algorithms, a pressing concern emerges: the issue of bias. Bias, in this context, refers to systematic and unfair discrepancies that can manifest in the decisions and predictions of ML models, often stemming from the very data used in software engineering processes.

The sources of bias in software engineering data are multifaceted. They can arise from historical project data, user feedback loops, or even the design and objectives of the software itself. For instance, if a software tool is predominantly tested and refined using feedback from a specific demographic, it might inadvertently...

Measuring and monitoring for bias

Let’s look at one of these frameworks – IBM AI Fairness 360 (https://github.com/Trusted-AI/AIF360). The basis for this framework is the ability to designate variables that can be linked to bias (so-called protected attributes) and then quantify how differently the remaining variables and outcomes are distributed across the groups those attributes define. So, let’s dive into an example of how to calculate bias for a dataset. Since bias is often associated with gender or similar attributes, we need to use a dataset that contains such an attribute. So far in this book, we have not used any dataset with this kind of attribute, so we need to find another one.
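
Concretely, the framework expresses bias through metrics computed over groups defined by those attributes. Two of the simplest dataset-level metrics are the statistical parity difference, the probability of a favorable outcome in the unprivileged group minus the probability in the privileged group, and the disparate impact, the ratio of those two probabilities. A difference of 0 (or a ratio of 1) indicates that the dataset treats both groups equally.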

Let’s take the Titanic survival dataset and check whether there was any bias in survivability between male and female passengers. First, we need to install the IBM AI Fairness 360 framework:

pip install aif360

Then, we can start creating a program that will check for bias. We need to import the appropriate libraries and create the data. In this example, we’ll create...
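
As a rough illustration of how such a check could look, here is a minimal, self-contained sketch. The hand-crafted toy rows, the numeric encoding of the sex column (1 = male, 0 = female), and the choice of metrics are assumptions made for this illustration rather than the chapter’s exact code:

import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy stand-in for the Titanic data: sex is 1 = male, 0 = female;
# survived is 1 = yes, 0 = no
df = pd.DataFrame({
    'sex':      [1, 1, 1, 1, 0, 0, 0, 0],
    'survived': [0, 0, 0, 1, 1, 1, 1, 0],
})

# Wrap the DataFrame so that AIF360 knows which column is the label
# and which one is the protected attribute
dataset = BinaryLabelDataset(
    df=df,
    label_names=['survived'],
    protected_attribute_names=['sex'],
    favorable_label=1,
    unfavorable_label=0,
)

# Treat female passengers as the privileged group with respect to
# survival ("women and children first") and male passengers as the
# unprivileged group
metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{'sex': 0}],
    unprivileged_groups=[{'sex': 1}],
)

# Statistical parity difference: P(survived | male) - P(survived | female);
# 0 means parity, and negative values indicate bias against the
# unprivileged group
print('Statistical parity difference:', metric.statistical_parity_difference())

# Disparate impact: the ratio of the same two probabilities; 1.0 means parity
print('Disparate impact:', metric.disparate_impact())

On the real Titanic data, the statistical parity difference comes out clearly negative, reflecting the well-known fact that female passengers survived at a much higher rate than male passengers.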

Developing mechanisms to prevent ML bias from spreading throughout the system

Unfortunately, it is generally not possible to completely remove bias from ML systems, as we often do not have access to the attributes needed to reduce it. However, we can reduce the bias and lower the risk of it spreading to the entire system.

Awareness and education are among the most important measures that we can use to manage bias in software systems. We need to understand the potential sources of bias and their implications. We also need to identify biases related to protected attributes (for example, gender) and check whether other attributes are correlated with them (for example, occupation and address). Then, we need to educate our team about the ethical implications of biased models.
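
As a small sketch of that second point, a simple correlation check can reveal whether seemingly neutral attributes act as proxies for a protected one. The column names and values here are invented for illustration:

import pandas as pd

# Invented example data: gender is the protected attribute; occupation
# and district are label-encoded candidate proxies
df = pd.DataFrame({
    'gender':     [1, 1, 1, 1, 0, 0, 0, 0],
    'occupation': [2, 2, 1, 2, 0, 0, 1, 0],
    'district':   [5, 3, 5, 4, 2, 1, 2, 1],
})

# Correlation of every attribute with the protected one; values close to
# +1 or -1 suggest the attribute can leak protected information even if
# the 'gender' column itself is removed before training
proxy_strength = df.corr()['gender'].drop('gender').abs()
print(proxy_strength.sort_values(ascending=False))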

Then, we need to diversify our data collection. We must ensure that the data we collect is representative of the population we aim to model. To avoid over-representing or under-representing...
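
One minimal sketch of such a measure, assuming a pandas DataFrame with a gender column, is to stratify the data split on the protected attribute so that no split over-represents or under-represents a group:

import pandas as pd
from sklearn.model_selection import train_test_split

# Invented example data with a deliberately skewed 75/25 gender ratio
df = pd.DataFrame({
    'gender':   [1, 1, 1, 1, 1, 1, 0, 0],
    'feature':  [3, 1, 4, 1, 5, 9, 2, 6],
    'survived': [0, 1, 0, 1, 1, 0, 1, 1],
})

# stratify=df['gender'] keeps the gender ratio identical in both splits,
# so neither split over- or under-represents a group
train, test = train_test_split(df, test_size=0.5,
                               stratify=df['gender'], random_state=42)
print(train['gender'].value_counts(normalize=True))
print(test['gender'].value_counts(normalize=True))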

Summary

One of our responsibilities as software engineers is to ensure that we develop software systems that contribute to the greater good of society. We love developing technology, but it needs to be developed responsibly. In this chapter, we looked at the concept of bias in ML and how to work with it. We looked at the IBM AI Fairness 360 framework, which can assist us in identifying bias. We also learned that automated bias detection is too limited to remove bias from the data completely.

There are more frameworks to explore, and new studies and tools become available every day. These frameworks are often more specialized and provide a means to capture domain-specific bias, for example, in medicine or advertising. Therefore, my final recommendation in this chapter is to explore the bias frameworks that are specific to the task and the domain at hand.

References

  • Donald, A., et al., Bias Detection for Customer Interaction Data: A Survey on Datasets, Methods, and Tools. IEEE Access, 2023.
  • Bellamy, R. K., et al., AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias. arXiv preprint arXiv:1810.01943, 2018.
  • Zhang, Y., et al., Introduction to AI Fairness. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 2020.
  • Alves, G., et al., Reducing Unintended Bias of ML Models on Tabular and Textual Data. In 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2021.
  • Raza, S., Reji, D. J., and Ding, C., Dbias: Detecting Biases and Ensuring Fairness in News Articles. International Journal of Data Science and Analytics, 2022, pp. 1-21.