Azure Data Scientist Associate Certification Guide

Product type: Book
Published: Dec 2021
Publisher: Packt
ISBN-13: 9781800565005
Pages: 448
Edition: 1st
Authors (2): Andreas Botsikas, Michael Hlobil

Table of Contents (17 chapters)

Preface
Section 1: Starting your cloud-based data science journey
Chapter 1: An Overview of Modern Data Science
Chapter 2: Deploying Azure Machine Learning Workspace Resources
Chapter 3: Azure Machine Learning Studio Components
Chapter 4: Configuring the Workspace
Section 2: No code data science experimentation
Chapter 5: Letting the Machines Do the Model Training
Chapter 6: Visual Model Training and Publishing
Section 3: Advanced data science tooling and capabilities
Chapter 7: The AzureML Python SDK
Chapter 8: Experimenting with Python Code
Chapter 9: Optimizing the ML Model
Chapter 10: Understanding Model Results
Chapter 11: Working with Pipelines
Chapter 12: Operationalizing Models with Code
Other Books You May Enjoy

Chapter 12: Operationalizing Models with Code

In this chapter, you will learn how to operationalize the machine learning models that you have trained so far in this book. You will explore two approaches: exposing a real-time endpoint by hosting a REST API that you can use to make inferences, and expanding your pipeline authoring knowledge to make inferences on top of big data, in parallel and efficiently. You will begin by registering a model in the workspace to keep track of the artifact. Then, you will publish a REST API; this will allow your model to integrate with third-party applications such as Power BI. Following this, you will author a pipeline to process half a million records within a couple of minutes in a very cost-effective manner.

In this chapter, we are going to cover the following topics:

  • Understanding the various deployment options
  • Registering models in the workspace
  • Deploying real-time endpoints
  • Creating a batch inference pipeline

Technical requirements

You will require access to an Azure subscription. Within that subscription, you will need a resource group named packt-azureml-rg. You will need to have either the Contributor or the Owner Access control (IAM) role at the resource group level. Within that resource group, you should have already deployed a machine learning resource named packt-learning-mlw. These resources should already be available to you if you followed the instructions in Chapter 2, Deploying Azure Machine Learning Workspace Resources.

Additionally, you will require a basic understanding of the Python language. The code snippets in this chapter target Python version 3.6 or later. You should also be familiar with working with notebooks within AzureML studio; this is something that was covered in Chapter 7, The AzureML Python SDK.

This chapter assumes you have registered the loans dataset that you generated in Chapter 10, Understanding Model Results. It also assumes that you have created a...

Understanding the various deployment options

We have been working with Python code since Chapter 8, Experimenting with Python Code. So far, you have trained various models, evaluated them based on metrics, and saved the trained model using the dump method of the joblib library. The AzureML workspace allows you to store and version those artifacts by registering them in the model registry that we discussed in Chapter 5, Letting the Machines Do the Model Training. Registering the model allows you to version both the saved model and the metadata regarding the specific model, such as its performance according to various metrics. You will learn how to register models from the SDK in the Registering models in the workspace section.
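
For reference, persisting a trained model with joblib can be as simple as the following minimal sketch; the LogisticRegression estimator and the model.joblib file name are illustrative assumptions rather than the chapter's exact training code:

    # A minimal sketch of persisting a trained scikit-learn model with joblib.
    # The estimator and the model.joblib file name are illustrative assumptions.
    import joblib
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=100, random_state=42)
    model = LogisticRegression().fit(X, y)

    joblib.dump(model, "model.joblib")      # Serialize the fitted estimator
    restored = joblib.load("model.joblib")  # Round-trip check
    print(restored.predict(X[:5]))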

Once the model has been registered, you have to decide how you want to operationalize the model, either by deploying a real-time endpoint or by creating a batch process, as displayed in Figure 12.1:

Figure 12.1 – A path from training...

Registering models in the workspace

Registering a model allows you to keep different versions of the trained models. Each model version has artifacts and metadata. Among the metadata, you can keep references to experiment runs and datasets. This allows you to track the lineage between the data used to train a model, the run ID that trained the model, and the actual model artifacts themselves, as displayed in Figure 12.2:

Figure 12.2 – Building the lineage from the training dataset all the way to the registered model

In this section, you will train a model and register it in your AzureML workspace. Perform the following steps:

  1. Navigate to the Notebooks section of your AzureML studio web interface.
  2. Create a folder, named chapter12, and then create a notebook named chapter12.ipynb, as shown in Figure 12.3:

    Figure 12.3 – Adding the chapter12 notebook to your working files

  3. Add and execute the following code snippets in separate cells of the notebook; the sketch after this list illustrates the key registration call.
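
The following is a minimal sketch of that registration call, using the AzureML SDK v1 Model.register method. The model.joblib file name is an assumption for illustration, while the chapter12-loans model name and the loans dataset come from this chapter's narrative:

    # A minimal sketch of registering a trained model with the AzureML SDK v1.
    # The model.joblib file name is an assumption for illustration.
    from azureml.core import Dataset, Model, Workspace

    ws = Workspace.from_config()  # Reads the config.json available in AzureML notebooks
    loans_ds = Dataset.get_by_name(ws, "loans")

    model = Model.register(
        workspace=ws,
        model_path="model.joblib",     # Local artifact to upload
        model_name="chapter12-loans",  # Registry name; versions auto-increment
        datasets=[("training data", loans_ds)],  # Records dataset lineage
        tags={"framework": "sklearn"},
        description="Loan approval classifier trained in chapter12.ipynb",
    )
    print(model.name, model.version)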

Deploying real-time endpoints

Let's imagine that you have an e-banking solution that has a process for customers to request loans. You want to properly set the expectations of the customer and prepare them for potential rejection. When the customer submits their loan application form, you want to invoke the model you registered in the Registering models in the workspace section, that is, the model named chapter12-loans, and pass in the information that the customer filled out on the application form. If the model predicts that the loan will not be approved, a message will appear on the confirmation page of the loan request, preparing the customer for the potential rejection of the loan request.
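
To serve such requests, the real-time endpoint wraps the model in an entry script that AzureML invokes on every call. The following is a minimal sketch of one; the model.joblib file name and the JSON payload layout are assumptions for illustration:

    # score.py - a minimal entry-script sketch for the chapter12-loans model.
    # AzureML calls init() once when the container starts and run() per request.
    import json
    import os

    import joblib

    def init():
        global model
        # AZUREML_MODEL_DIR points to the folder holding the registered model files
        model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.joblib")
        model = joblib.load(model_path)

    def run(raw_data):
        # Expects a payload such as {"data": [[<loan application features>]]}
        data = json.loads(raw_data)["data"]
        predictions = model.predict(data).tolist()
        return {"approved": predictions}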

Figure 12.5 shows an oversimplified architecture to depict the flow of requests that start from the customer to the real-time endpoint of the model:

Figure 12.5 – An oversimplified e-banking architecture showing the flow of requests from the customer...
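
Although the excerpt cuts off here, the deployment this section builds toward can be sketched with SDK v1 calls such as the following; the service name, environment name, and conda specification file are assumptions, and score.py is the entry script sketched above:

    # A hedged sketch of deploying the registered model as an ACI real-time
    # endpoint. The chapter12-loans-aci service name and the loans-env.yml
    # conda specification file are assumptions for illustration.
    from azureml.core import Environment, Model, Workspace
    from azureml.core.model import InferenceConfig
    from azureml.core.webservice import AciWebservice

    ws = Workspace.from_config()
    model = Model(ws, name="chapter12-loans")  # Latest registered version

    env = Environment.from_conda_specification(
        name="loans-env", file_path="loans-env.yml"  # Assumed spec with sklearn
    )
    inference_config = InferenceConfig(entry_script="score.py", environment=env)
    aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

    service = Model.deploy(ws, "chapter12-loans-aci", [model],
                           inference_config, aci_config)
    service.wait_for_deployment(show_output=True)
    print(service.scoring_uri)  # POST JSON payloads here for predictions

Once the deployment succeeds, you could smoke-test it with service.run(json.dumps({"data": [[...]]})) before wiring it into the e-banking frontend; for production workloads, the same InferenceConfig can be deployed to AKS instead.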

Creating a batch inference pipeline

In Chapter 11, Working with Pipelines, you learned how to create pipelines that orchestrate multiple steps. These pipelines can be invoked using a REST API, similar to the real-time endpoint that you created in the previous section. One key difference is that for a real-time endpoint, the infrastructure is constantly on, waiting for a request to arrive, whereas for published pipelines, the cluster spins up only after the pipeline has been triggered.

You could use these pipelines to orchestrate batch inference on top of data residing in a dataset. For example, let's imagine that you just trained the loans model you have been using in this chapter. You want to run the model against all of the pending loan requests and store the results; this is so that you can implement an email campaign targeting the customers that might get their loan rejected. The easiest approach is to create a single PythonScriptStep that will process each record...
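
The excerpt cuts off here, but question 3 in the Questions section refers to ParallelRunConfig, so a parallel batch-scoring pipeline built with ParallelRunStep can be sketched as follows; the dataset, cluster, environment, and script names are all assumptions:

    # A hedged sketch of a parallel batch-scoring pipeline, using the
    # ParallelRunConfig/ParallelRunStep classes that question 3 refers to.
    # The dataset, cluster, environment, and script names are assumptions.
    from azureml.core import Dataset, Environment, Workspace
    from azureml.data import OutputFileDatasetConfig
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

    ws = Workspace.from_config()
    pending_loans = Dataset.get_by_name(ws, "pending-loans")  # Assumed dataset
    scores = OutputFileDatasetConfig(name="scores")

    parallel_config = ParallelRunConfig(
        source_directory=".",
        entry_script="batch_score.py",  # Assumed script with init()/run()
        mini_batch_size="1MB",          # Data handed to each run() call (tabular)
        error_threshold=-1,             # -1: do not fail on record-level errors
        output_action="append_row",     # Concatenate the run() return values
        environment=Environment.get(ws, "loans-env"),  # Assumed environment
        compute_target="cpu-sm-cluster",  # Assumed compute cluster name
        node_count=2,                     # Nodes to spin up for the step
        process_count_per_node=2,         # Worker processes per node
    )

    step = ParallelRunStep(
        name="score-pending-loans",
        parallel_run_config=parallel_config,
        inputs=[pending_loans.as_named_input("loans")],
        output=scores,
    )

    pipeline = Pipeline(ws, steps=[step])
    published = pipeline.publish(name="loans-batch-scoring",
                                 description="Scores pending loan requests")
    print(published.endpoint)  # REST endpoint that triggers the pipeline

Submitting a POST request to that endpoint with an ExperimentName in the JSON body triggers a new run, which is what lets the cluster stay deallocated until it is actually needed.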

Summary

In this chapter, you explored various ways in which to use the machine learning models that you have been training throughout this book. You can either make real-time inferences or batch process a large number of records in a cost-effective manner. You started by registering the model you would use for inferences. From there, you can deploy a real-time endpoint either in ACI for testing or in AKS for production workloads that require high availability and automatic scaling. You explored how to profile your model to determine the recommended container size for hosting the real-time endpoint. Following this, you discovered Application Insights, which allows you to monitor production endpoints and identify potential production issues. Through Application Insights, you noticed that the real-time endpoint you produced wasn't exposing the swagger.json file needed by third-party applications, such as Power BI, to automatically consume your endpoint. You modified the scoring function...

Questions

In each chapter, you will find a number of questions to validate your understanding of the topics that have been discussed:

  1. You want to deploy a real-time endpoint that will handle transactions from a live betting website. The traffic from this website will have spikes during games and will be very low during the night. Which of the following compute targets should you use?

    a. ACI

    b. A compute instance

    c. A compute cluster

    d. AKS

  2. You want to monitor a real-time endpoint deployed in AKS and determine the average response time of the service. Which monitoring solution should you use?

    a. ACI

    b. Azure Container Registry

    c. Application Insights

  3. You have a computer vision model, and you want to process 100 images in parallel. You author a pipeline with a parallel step. You want to process 10 images at a time. Which of the following ParallelRunConfig parameters should you set?

    a. mini_batch_size=10

    b. error_threshold=10

    c. node_count=10

    d. process_count_per_node=10

...

Further reading

This section offers a list of web resources to help you augment your knowledge of the AzureML SDK and the various code snippets used in this chapter:
