Data Privacy and Responsible AI Best Practices

In the previous chapter, we talked about how to build a data governance program for our organization and how to identify types of sensitive data. Our work does not stop there. Although in some cases we can safely exclude sensitive information, in others we cannot, so the machine learning (ML) models we build to solve problems may need to be trained on personal data. Sometimes that data is relevant and useful; other times it creates unintended correlations that make the model biased. This is the issue we will tackle in this chapter.

We will talk about how to recognize sensitive information and how to mitigate its impact when it is not relevant to the model training process, using techniques such as differential privacy. We will explore how to protect individual information even from aggregated data or model results. To help us with that, we will see how to use the SmartNoise software development kit (SDK).

We will also discuss fairness...

Technical requirements

The code for this chapter is available in this repository under the ch5 folder:

https://github.com/PacktPublishing/Machine-Learning-Model-Security-in-Azure/

Working with Python

To use the libraries, you need to be familiar with Python. In this book, we will use notebooks from the Azure Machine Learning environment to run the examples, but if you prefer to use your own development environment and tools, that is fine.

Getting started with Python

New to Python and ML? Take a look at this learning path to learn the basics of Python: https://learn.microsoft.com/en-us/training/paths/beginner-python/.

Running a notebook in Azure Machine Learning

The process of running a notebook in Azure Machine Learning is very straightforward. All you need to do is import or create a notebook in the interface, attach a compute target, and then run the cells. Let us see the steps together:

  1. Go to the Notebooks section and upload or create your file:
...

Discovering and protecting sensitive data

Although good governance and the many tools that support sensitive data discovery, classification, and profiling can help us, more often than not the data used in our ML experiments comes from outside sources, or we are simply not developing for our own organization. In that case, we need to train ourselves on what sensitive data is and how to do a quick cleanup before using it in Azure Machine Learning.
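
As a quick, illustrative sketch of what such a cleanup can look like, the following snippet drops and pseudonymizes columns with pandas before a dataset is used in Azure Machine Learning. The column names and the salt value are hypothetical, and a salted hash is just one possible pseudonymization choice:

```python
# A minimal, hypothetical PII cleanup with pandas; column names are made up.
import hashlib

import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102],
    "email": ["alice@example.com", "bob@example.com"],
    "balance": [1200.0, 540.0],
})

# Drop direct identifiers that the model does not need...
df = df.drop(columns=["customer_id"])

# ...and pseudonymize identifiers we must keep, here with a salted hash
# so that the same email always maps to the same opaque token.
SALT = "replace-with-a-secret-salt"  # assumption: store this securely
df["email"] = df["email"].map(
    lambda value: hashlib.sha256((SALT + value).encode()).hexdigest()
)
print(df.head())
```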

Identifying sensitive data

Sensitive data refers to any information that, if exposed, could cause harm or privacy breaches, or lead to identity theft, monetary loss, or other adverse consequences for individuals or organizations. This data requires special protection due to its nature and the potential risks associated with its disclosure.

There are many categories of sensitive data; the main ones are outlined below, together with examples we need to be aware of:

  • Personally identifiable...

Introducing differential privacy

Differential privacy is a concept whose purpose is to protect the privacy of individual data contributors while still allowing useful statistical analysis. The basic idea is to add noise, or random perturbations, to the data in such a way that the statistical properties of the dataset remain approximately the same, but it becomes much more difficult to identify any individual's information within the dataset.

The level of privacy protection in differential privacy is controlled by a parameter called epsilon (ε). A smaller value of epsilon indicates a higher level of privacy, but it might also lead to a decrease in data utility (usefulness of the data for analysis). Striking a balance between privacy and utility is a key challenge in implementing differential privacy:

Figure 5.3 – Epsilon (ε) value relationship with privacy and accuracy
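
To make the role of epsilon concrete, here is a minimal, illustrative sketch of the Laplace mechanism, the classic way of adding noise calibrated to a privacy budget; the function and the example values are our own, not the book's:

```python
# A minimal sketch of the Laplace mechanism (illustrative, not from the book).
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    # sensitivity: how much one individual's record can change the result
    #              (1 for a counting query).
    # epsilon:     the privacy budget; a smaller epsilon means more noise,
    #              hence more privacy but less accuracy.
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

true_count = 1000  # the exact answer to a hypothetical counting query
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps)
    print(f"epsilon={eps}: noisy count = {noisy:.1f}")
```

Running this a few times shows the trade-off from Figure 5.3: at ε = 0.1 the answers scatter widely around the true count, while at ε = 10 they stay close to it.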

A library that we can use to add noise to the data is the...
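
Since the chapter's noise-adding examples build on the SmartNoise SDK introduced earlier, here is a rough, hedged sketch of what querying a DataFrame through its smartnoise-sql package (snsql) can look like; the dataset, metadata file, and column names are hypothetical placeholders:

```python
# A hedged sketch of smartnoise-sql (snsql) from the OpenDP SmartNoise SDK.
# "PUMS.csv" and "PUMS.yaml" are placeholders for your own data and the
# metadata file that describes table and column bounds.
import pandas as pd
import snsql
from snsql import Privacy

df = pd.read_csv("PUMS.csv")
privacy = Privacy(epsilon=1.0, delta=0.01)  # privacy budget per query

reader = snsql.from_df(df, privacy=privacy, metadata="PUMS.yaml")

# Each query spends privacy budget and returns differentially private results.
result = reader.execute("SELECT COUNT(*) AS n, AVG(age) AS avg_age FROM PUMS.PUMS")
print(result)
```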

Mitigating fairness

Mitigating fairness issues in ML models is an essential step to ensure that a model does not exhibit bias or discrimination against certain groups of individuals. Even if we remove PII from our datasets, predictions might still favor some groups over others based on characteristics such as race, gender, age, or religion. If the training data is not diverse and representative of the population you aim to serve, bias can creep into the model.

Firstly, we need to learn to identify bias in our models, which we can do by analyzing the model's metrics. Suppose you suspect that your loan approval model favors applicants above a certain age when approving loan applications. You can start by looking at the metrics for the complete dataset as follows:

...
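
As a hedged sketch of this kind of analysis, one way to compute the same metrics overall and broken down by group is Fairlearn's MetricFrame; the synthetic arrays and the age_group feature below are hypothetical stand-ins for real loan data:

```python
# A minimal Fairlearn sketch with synthetic data; real loan data would
# replace the randomly generated arrays below.
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                     # actual outcomes
y_pred = rng.integers(0, 2, size=200)                     # model decisions
age_group = rng.choice(["under 40", "40 and over"], 200)  # sensitive feature

mf = MetricFrame(
    metrics={
        "selection rate": selection_rate,  # share of approved applications
        "accuracy": accuracy_score,
        "recall": recall_score,
    },
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=age_group,
)
print(mf.overall)    # metrics for the complete dataset
print(mf.by_group)   # the same metrics broken down by age group
```

A large gap in selection rate or recall between the two age groups would support the suspicion that the model favors one of them.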

Working with model interpretability

Model interpretability in ML refers to the ability to understand and explain how a particular model makes predictions or decisions. Interpretable models provide clear insights into the features or variables that are most influential in the model’s decision-making process. This is particularly important in domains where the decision-making process needs to be transparent and understandable, such as healthcare, finance, and legal systems.

Although you can never explain 100% why a model makes a prediction, you can use explainers to understand which features affect the results. Explainers can provide global explanations (for example, which features affect the overall behavior of the model) or local explanations, which tell us what influenced an individual prediction.
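
As a quick illustration of a global explanation, the sketch below uses scikit-learn's permutation importance on a synthetic dataset; this is a stand-in technique of our own, not necessarily the explainer the book's examples use:

```python
# A small sketch of global feature importance via permutation importance.
# scikit-learn is used as a stand-in; Azure ML explainers and the
# Responsible AI dashboard surface similar information.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in score: a large drop
# means the model relies heavily on that feature (a global explanation).
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature {i}: {importance:.3f}")
```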

Let us explore some methods we can use to achieve model interpretability:

  • Feature importance (FI) determines the influence of each feature...

Exploring FL and secure multi-party computation

FL is an ML approach that enables the training of models across multiple devices or servers without centrally aggregating the raw data. In traditional ML, data is usually collected and sent to a central compute server for training, which raises privacy and security concerns, especially when dealing with sensitive or personal information.

In FL, the training process happens locally on the devices or nodes (for example, smartphones, edge devices, or compute instances) that generate or store the data. These nodes collaborate by sharing only model updates (gradients) rather than the raw data itself. The central compute server aggregates these updates to create an improved global model. This process is repeated iteratively, with each node contributing to the model’s improvement while keeping its data private.
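
The following toy sketch simulates that loop as federated averaging (FedAvg) in plain NumPy; everything here (the node data, the stand-in "local training" step) is simplified and hypothetical, and real FL frameworks additionally handle communication, security, and scheduling:

```python
# An illustrative federated averaging (FedAvg) round in plain NumPy.
import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    # Simulate one node: compute an update on local data only.
    # The "gradient" is a stand-in; only the update leaves the node.
    gradient = local_data.mean(axis=0) - global_weights
    return global_weights + lr * gradient

rng = np.random.default_rng(0)
global_weights = np.zeros(3)
nodes = [rng.normal(loc=i, size=(100, 3)) for i in range(3)]  # private datasets

for round_number in range(5):
    # Each node trains locally and shares only its updated weights.
    local_models = [local_update(global_weights, data) for data in nodes]
    # The central server aggregates updates (a simple average = FedAvg).
    global_weights = np.mean(local_models, axis=0)

print(global_weights)  # the raw data never left the nodes
```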

The main advantages of FL are as follows:

  • Privacy: As the raw data remains on the local nodes, there is no need...

Summary

Protecting sensitive data is a multi-faceted problem. There are techniques to mitigate fairness issues, protect privacy, and work ethically and responsibly with AI, but the balance between prediction accuracy and data protection is delicate. Add the complexity of choosing the right combination of techniques for your data and algorithms, and the task can seem daunting.

In this chapter, we learned to identify different types of sensitive data and common techniques to remove or mask it. However, it is not always possible to eliminate such data completely, as it can be useful for the model training process. In this case, there are several libraries available to help. We can use the SmartNoise SDK to introduce noise to our data and protect privacy, work with the Fairlearn SDK to mitigate fairness issues, and use the Responsible AI dashboard together with explainers to interpret our models. We ended this chapter by introducing the concept of FL and how to apply it using Azure Machine...

Further reading
