Machine Learning Security with Azure

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781805120483
Edition: 1st Edition
Author (1)
Georgia Kalyva

Georgia Kalyva is a technical trainer at Microsoft. She was recognized as a Microsoft AI MVP, is a Microsoft Certified Trainer, and is an international speaker with more than 10 years of experience in Microsoft Cloud, AI, and developer technologies. Her career covers several areas, ranging from designing and implementing solutions to business and digital transformation. She holds a bachelor's degree in informatics from the University of Piraeus, a master's degree in business administration from the University of Derby, and multiple Microsoft certifications. Georgia's honors include several awards from international technology and business competitions, and her journey to excellence stems from a growth mindset and a passion for technology.

Data Protection and Governance

Data is a fundamental part of machine learning, so in this chapter, we will focus on all aspects of governing, storing, and securing data. We will start by explaining what data governance is and why it is vital to know what data we have. We need to know how to develop a management framework so that we can improve everything, from organizational workflows to business decisions. Data governance will also help us identify sensitive data and decide how to secure it better, whether as part of our data governance program or our machine learning solutions.

We will then learn the best practices for storing and retrieving data in Azure Machine Learning. We will see many data services that we can use to save our datasets, such as Azure Blob Storage and Azure SQL Database, along with their basic encryption and security features. Azure Machine Learning already provides us with a lot of features, such as versioning and logging, but since data is usually connected...

Working with data governance in Azure

Data governance refers to the overall management and control of an organization’s data assets. It involves establishing processes, policies, and guidelines to ensure availability, integrity, security, privacy, and the effective and efficient use of information. This is always important, but it is especially crucial when we’re talking about ML, as ML models are based on data. Whether we’re talking about data used to train our models or data generated by our models, it does not change the fact that we need to be aware of every piece of information we process and what its life cycle is.

To implement data governance effectively, organizations typically need to establish a data governance framework or strategy, which outlines the structure, processes, and responsibilities for data management. This framework should include the formation of a data governance committee or council, data governance policies and procedures, data stewardship...

Storing and retrieving data in Azure Machine Learning

The first task is storing and retrieving data in Azure Machine Learning. You can bring data into Azure Machine Learning in many ways: from your local machine, from a source on the internet, or from cloud-based storage. In this section, we will explore each of these options.

Let us see how to work with datastores.

Connecting datastores

As we mentioned in the Azure Machine Learning introduction in Chapter 1, datastores serve as a reference to an existing storage service, whether that is a storage account or a database. Creating a datastore is not mandatory if you already have a reference or a connection to your data, because you can connect external sources as well, but connecting datastores has many benefits. Firstly, you have a common way to connect different data sources to your workspace without adding credential information anywhere in your scripts or code, which is a security best practice....
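The idea of keeping credentials out of scripts can be sketched with a toy in-memory model. This is not the Azure Machine Learning SDK: the `Workspace` and `Datastore` classes, the `resolve` method, the datastore name, and the placeholder URL are all hypothetical and only illustrate how a script can refer to data by datastore name while the secret stays with the workspace.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Hypothetical stand-in for a registered datastore (not an SDK class)."""
    name: str
    account_url: str
    secret: str = field(repr=False)  # excluded from repr so it does not leak into logs


class Workspace:
    """Hypothetical workspace that owns datastore registrations and secrets."""

    def __init__(self):
        self._datastores = {}

    def register_datastore(self, name, account_url, secret):
        # Done once by an administrator; the secret is stored with the workspace.
        self._datastores[name] = Datastore(name, account_url, secret)

    def resolve(self, name, path):
        # Scripts pass only a datastore name and a relative path.
        datastore = self._datastores[name]
        return f"{datastore.account_url}/{path}"


def training_script(workspace):
    # The script references the datastore by name and never handles the secret.
    return workspace.resolve("training_data", "datasets/train.csv")


workspace = Workspace()
workspace.register_datastore(
    "training_data",
    "https://example.invalid/container",  # placeholder URL, an assumption
    "s3cr3t",                             # placeholder secret, an assumption
)
data_location = training_script(workspace)
```

With this arrangement, rotating the secret means updating one registration rather than editing every script that reads the data.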

Encrypting and securing data

As we saw in the previous section, Azure Machine Learning relies on external services to pull in data as data assets. Depending on the service that hosts the data, there are different security and data protection features we can use, such as encryption, data classification, and data masking.

In this section, we will explore encryption and classification features that relate to our data.

Encryption at rest

Encryption at rest refers to the practice of encrypting data while it is stored in a storage medium, such as cloud storage. Its purpose is to protect data from unauthorized access if the storage medium is compromised, lost, or stolen.

When data is encrypted at rest, it is transformed into an unreadable form using an encryption algorithm and a cryptographic key. Only authorized users or processes with the proper decryption key can access and decrypt the data to its original readable form. Without the decryption...
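The transformation described above can be illustrated with a minimal sketch that uses only the Python standard library. It is a toy stream cipher that derives a keystream from HMAC-SHA256 in counter mode. It is an assumption made for illustration, not the algorithm any Azure service uses, and it provides no integrity protection, so it should not be used to protect real data. The function names `encrypt_at_rest` and `decrypt_at_rest` are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_SIZE = 16  # bytes of random nonce stored in front of the ciphertext


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and nonce."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def encrypt_at_rest(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce + ciphertext, the unreadable form that would be stored."""
    nonce = secrets.token_bytes(NONCE_SIZE)
    stream = _keystream(key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + ciphertext


def decrypt_at_rest(key: bytes, blob: bytes) -> bytes:
    """Recover the original bytes; only the correct key yields the plaintext."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


key = secrets.token_bytes(32)
record = b"customer_id=42;email=someone@example.com"
stored = encrypt_at_rest(key, record)       # what sits on the storage medium
restored = decrypt_at_rest(key, stored)     # what an authorized process reads
```

Anyone who obtains `stored` without `key` sees only the nonce and scrambled bytes, which is the protection that encryption at rest is meant to provide if the medium is lost or stolen.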

Exploring backup and recovery

Backup and recovery are closely related concepts, and we need both to safeguard our data. In this section, we will explain the backup and recovery options for our workspace and the data connected to it. We will also talk about how to approach situations where backup options are not available. However, before we get started, let us remember what backup and recovery are.

Backup refers to the process of creating copies or replicas of data and storing them in a separate location or medium. The purpose of backups is to provide a means of recovering data if there is data loss, accidental deletion, system failures, disasters, or other unforeseen events. Backups serve as a safety net, allowing you to restore data to its previous state or a specific point in time. Backups can be performed at different levels, such as full backups (copying all data), incremental backups (copying only the changes since the last backup), or differential backups (copying...
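The difference between these backup levels can be sketched as a selection rule over file modification times. The sketch below assumes the common definition of a differential backup as everything changed since the last full backup, which the excerpt above does not spell out. The `FileEntry` type, the function names, and the day numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    modified_on: int  # day number on which the file last changed


def full_backup(files):
    # A full backup copies every file, regardless of when it changed.
    return sorted(f.name for f in files)


def _changed_since(files, reference_day):
    return sorted(f.name for f in files if f.modified_on > reference_day)


def incremental_backup(files, last_backup_day):
    # Copies only what changed since the most recent backup of any kind.
    return _changed_since(files, last_backup_day)


def differential_backup(files, last_full_backup_day):
    # Assumed definition: copies what changed since the last full backup.
    return _changed_since(files, last_full_backup_day)


files = [
    FileEntry("train.csv", modified_on=0),
    FileEntry("labels.csv", modified_on=2),
    FileEntry("model.pkl", modified_on=4),
]

# Scenario: full backup on day 1, an incremental backup on day 3, today is day 5.
full_set = full_backup(files)
incremental_set = incremental_backup(files, last_backup_day=3)
differential_set = differential_backup(files, last_full_backup_day=1)
```

In this scenario the incremental set is the smallest, but restoring from it also requires the full backup and every earlier incremental, whereas the differential set needs only the last full backup.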

Summary

Everything is based on data, so having a clear map of what data you have, where it is stored, how sensitive it is, and how to protect it should be your number one priority if you work with ML. So, while working with models and algorithms might be the most exciting part of ML, having a data governance and protection plan will save you from data-related issues. The Cloud Data Management Capabilities (CDMC) framework is a very comprehensive strategy that you can use, especially with cloud data, but as always, it is not the only option. Which data strategy you build is ultimately your decision and will depend on your industry and location, but having one will always be beneficial.

As soon as you decide on a strategy, there are a lot of tools in Azure available for governance, such as Azure Policy, Azure Blueprints, Cost Management, and Microsoft Purview, each with its own benefits and limitations. As tools can come and go and data governance is not a one-off process, do not be afraid to start small...
