Azure Data Scientist Associate Certification Guide

Product type: Book
Published: Dec 2021
Publisher: Packt
ISBN-13: 9781800565005
Pages: 448
Edition: 1st
Authors (2): Andreas Botsikas, Michael Hlobil

Table of Contents (17 chapters)

Preface
Section 1: Starting your cloud-based data science journey
Chapter 1: An Overview of Modern Data Science
Chapter 2: Deploying Azure Machine Learning Workspace Resources
Chapter 3: Azure Machine Learning Studio Components
Chapter 4: Configuring the Workspace
Section 2: No code data science experimentation
Chapter 5: Letting the Machines Do the Model Training
Chapter 6: Visual Model Training and Publishing
Section 3: Advanced data science tooling and capabilities
Chapter 7: The AzureML Python SDK
Chapter 8: Experimenting with Python Code
Chapter 9: Optimizing the ML Model
Chapter 10: Understanding Model Results
Chapter 11: Working with Pipelines
Chapter 12: Operationalizing Models with Code
Other Books You May Enjoy

Chapter 12: Operationalizing Models with Code

In this chapter, you will learn how to operationalize the machine learning models that you have trained so far in this book. You will explore two approaches: exposing a real-time endpoint by hosting a REST API that you can use to make inferences, and expanding your pipeline authoring knowledge to make inferences on top of big data, in parallel and efficiently. You will begin by registering a model in the workspace to keep track of the artifact. Then, you will publish a REST API; this will allow your model to integrate with third-party applications such as Power BI. Following this, you will author a pipeline to process half a million records within a couple of minutes in a very cost-effective manner.

In this chapter, we are going to cover the following topics:

  • Understanding the various deployment options
  • Registering models in the workspace
  • Deploying real-time endpoints
  • Creating a batch inference pipeline

Technical requirements

You will require access to an Azure subscription. Within that subscription, you will need a resource group named packt-azureml-rg. You will need to have either the Contributor or the Owner Access control (IAM) role at the resource group level. Within that resource group, you should have already deployed a machine learning resource named packt-learning-mlw. These resources should already be available to you if you followed the instructions in Chapter 2, Deploying Azure Machine Learning Workspace Resources.

Additionally, you will require a basic understanding of the Python language. The code snippets in this chapter target Python version 3.6 or later. You should also be familiar with working with notebooks within AzureML studio; this is something that was covered in Chapter 7, The AzureML Python SDK.

This chapter assumes you have registered the loans dataset that you generated in Chapter 10, Understanding Model Results. It also assumes that you have created a...

Understanding the various deployment options

We have been working with Python code since Chapter 8, Experimenting with Python Code. So far, you have trained various models, evaluated them based on metrics, and saved the trained model using the dump method of the joblib library. The AzureML workspace allows you to store and version those artifacts by registering them in the model registry that we discussed in Chapter 5, Letting the Machines Do the Model Training. Registering the model allows you to version both the saved model and the metadata regarding the specific model, such as its performance according to various metrics. You will learn how to register models from the SDK in the Registering models in the workspace section.
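
For reference, persisting a trained model with joblib can be as simple as the following minimal sketch; the LogisticRegression estimator and the model.joblib file name are illustrative assumptions rather than the chapter's exact training code:

    # A minimal sketch of persisting a trained scikit-learn model with joblib.
    # The estimator and the model.joblib file name are illustrative assumptions.
    import joblib
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=100, random_state=42)
    model = LogisticRegression().fit(X, y)

    joblib.dump(model, "model.joblib")      # Serialize the fitted estimator
    restored = joblib.load("model.joblib")  # Round-trip check
    print(restored.predict(X[:5]))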

Once the model has been registered, you have to decide how you want to operationalize the model, either by deploying a real-time endpoint or by creating a batch process, as displayed in Figure 12.1:

Figure 12.1 – A path from training...

Registering models in the workspace

Registering a model allows you to keep different versions of the trained models. Each model version has artifacts and metadata. Among the metadata, you can keep references to experiment runs and datasets. This allows you to track the lineage between the data used to train a model, the run ID that trained the model, and the actual model artifacts themselves, as displayed in Figure 12.2:

Figure 12.2 – Building the lineage from the training dataset all the way to the registered model

In this section, you will train a model and register it in your AzureML workspace. Perform the following steps:

  1. Navigate to the Notebooks section of your AzureML studio web interface.
  2. Create a folder, named chapter12, and then create a notebook named chapter12.ipynb, as shown in Figure 12.3:

    Figure 12.3 – Adding the chapter12 notebook to your working files

  3. Add and execute the following code snippets in separate cells of the notebook; the sketch after this list illustrates the key registration call.
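
The following is a minimal sketch of that registration call, using the AzureML SDK v1 Model.register method. The model.joblib file name is an assumption for illustration, while the chapter12-loans model name and the loans dataset come from this chapter's narrative:

    # A minimal sketch of registering a trained model with the AzureML SDK v1.
    # The model.joblib file name is an assumption for illustration.
    from azureml.core import Dataset, Model, Workspace

    ws = Workspace.from_config()  # Reads the config.json available in AzureML notebooks
    loans_ds = Dataset.get_by_name(ws, "loans")

    model = Model.register(
        workspace=ws,
        model_path="model.joblib",     # Local artifact to upload
        model_name="chapter12-loans",  # Registry name; versions auto-increment
        datasets=[("training data", loans_ds)],  # Records dataset lineage
        tags={"framework": "sklearn"},
        description="Loan approval classifier trained in chapter12.ipynb",
    )
    print(model.name, model.version)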

Deploying real-time endpoints

Let's imagine that you have an e-banking solution that has a process for customers to request loans. You want to properly set the expectations of the customer and prepare them for potential rejection. When the customer submits their loan application form, you want to invoke the model you registered in the Registering models in the workspace section, that is, the model named chapter12-loans, and pass in the information that the customer filled out on the application form. If the model predicts that the loan will not be approved, a message will appear on the confirmation page of the loan request, preparing the customer for the potential rejection of the loan request.
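
To serve such requests, the real-time endpoint wraps the model in an entry script that AzureML invokes on every call. The following is a minimal sketch of one; the model.joblib file name and the JSON payload layout are assumptions for illustration:

    # score.py - a minimal entry-script sketch for the chapter12-loans model.
    # AzureML calls init() once when the container starts and run() per request.
    import json
    import os

    import joblib

    def init():
        global model
        # AZUREML_MODEL_DIR points to the folder holding the registered model files
        model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.joblib")
        model = joblib.load(model_path)

    def run(raw_data):
        # Expects a payload such as {"data": [[<loan application features>]]}
        data = json.loads(raw_data)["data"]
        predictions = model.predict(data).tolist()
        return {"approved": predictions}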

Figure 12.5 shows an oversimplified architecture to depict the flow of requests that start from the customer to the real-time endpoint of the model:

Figure 12.5 – An oversimplified e-banking architecture showing the flow of requests from the customer...
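
Although the excerpt cuts off here, the deployment this section builds toward can be sketched with SDK v1 calls such as the following; the service name, environment name, and conda specification file are assumptions, and score.py is the entry script sketched above:

    # A hedged sketch of deploying the registered model as an ACI real-time
    # endpoint. The chapter12-loans-aci service name and the loans-env.yml
    # conda specification file are assumptions for illustration.
    from azureml.core import Environment, Model, Workspace
    from azureml.core.model import InferenceConfig
    from azureml.core.webservice import AciWebservice

    ws = Workspace.from_config()
    model = Model(ws, name="chapter12-loans")  # Latest registered version

    env = Environment.from_conda_specification(
        name="loans-env", file_path="loans-env.yml"  # Assumed spec with sklearn
    )
    inference_config = InferenceConfig(entry_script="score.py", environment=env)
    aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

    service = Model.deploy(ws, "chapter12-loans-aci", [model],
                           inference_config, aci_config)
    service.wait_for_deployment(show_output=True)
    print(service.scoring_uri)  # POST JSON payloads here for predictions

Once the deployment succeeds, you could smoke-test it with service.run(json.dumps({"data": [[...]]})) before wiring it into the e-banking frontend; for production workloads, the same InferenceConfig can be deployed to AKS instead.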

Creating a batch inference pipeline

In Chapter 11, Working with Pipelines, you learned how to create pipelines that orchestrate multiple steps. These pipelines can be invoked using a REST API, similar to the real-time endpoint that you created in the previous section. One key difference is that for a real-time endpoint, the infrastructure is constantly on, waiting for a request to arrive, whereas for published pipelines, the cluster spins up only after the pipeline has been triggered.

You could use these pipelines to orchestrate batch inference on top of data residing in a dataset. For example, let's imagine that you just trained the loans model you have been using in this chapter. You want to run the model against all of the pending loan requests and store the results; this is so that you can implement an email campaign targeting the customers that might get their loan rejected. The easiest approach is to create a single PythonScriptStep that will process each record...
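
The excerpt cuts off here, but question 3 in the Questions section refers to ParallelRunConfig, so a parallel batch-scoring pipeline built with ParallelRunStep can be sketched as follows; the dataset, cluster, environment, and script names are all assumptions:

    # A hedged sketch of a parallel batch-scoring pipeline, using the
    # ParallelRunConfig/ParallelRunStep classes that question 3 refers to.
    # The dataset, cluster, environment, and script names are assumptions.
    from azureml.core import Dataset, Environment, Workspace
    from azureml.data import OutputFileDatasetConfig
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

    ws = Workspace.from_config()
    pending_loans = Dataset.get_by_name(ws, "pending-loans")  # Assumed dataset
    scores = OutputFileDatasetConfig(name="scores")

    parallel_config = ParallelRunConfig(
        source_directory=".",
        entry_script="batch_score.py",  # Assumed script with init()/run()
        mini_batch_size="1MB",          # Data handed to each run() call (tabular)
        error_threshold=-1,             # -1: do not fail on record-level errors
        output_action="append_row",     # Concatenate the run() return values
        environment=Environment.get(ws, "loans-env"),  # Assumed environment
        compute_target="cpu-sm-cluster",  # Assumed compute cluster name
        node_count=2,                     # Nodes to spin up for the step
        process_count_per_node=2,         # Worker processes per node
    )

    step = ParallelRunStep(
        name="score-pending-loans",
        parallel_run_config=parallel_config,
        inputs=[pending_loans.as_named_input("loans")],
        output=scores,
    )

    pipeline = Pipeline(ws, steps=[step])
    published = pipeline.publish(name="loans-batch-scoring",
                                 description="Scores pending loan requests")
    print(published.endpoint)  # REST endpoint that triggers the pipeline

Submitting a POST request to that endpoint with an ExperimentName in the JSON body triggers a new run, which is what lets the cluster stay deallocated until it is actually needed.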

Summary

In this chapter, you explored various ways in which to use the machine learning models that you have been training throughout this book. You can either make real-time inferences or batch process a large number of records in a cost-effective manner. You started by registering the model you would use for inferences. From there, you can deploy a real-time endpoint either in ACI for testing or in AKS for production workloads that require high availability and automatic scaling. You explored how to profile your model to determine the recommended container size for hosting the real-time endpoint. Following this, you discovered Application Insights, which allows you to monitor production endpoints and identify potential production issues. Through Application Insights, you noticed that the real-time endpoint you produced wasn't exposing the swagger.json file needed by third-party applications, such as Power BI, to automatically consume your endpoint. You modified the scoring function...

Questions

In each chapter, you will find a number of questions to validate your understanding of the topics that have been discussed:

  1. You want to deploy a real-time endpoint that will handle transactions from a live betting website. The traffic from this website will have spikes during games and will be very low during the night. Which of the following compute targets should you use?

    a. ACI

    b. A compute instance

    c. A compute cluster

    d. AKS

  2. You want to monitor a real-time endpoint deployed in AKS and determine the average response time of the service. Which monitoring solution should you use?

    a. ACI

    b. Azure Container Registry

    c. Application Insights

  3. You have a computer vision model, and you want to process 100 images in parallel. You author a pipeline with a parallel step. You want to process 10 images at a time. Which of the following ParallelRunConfig parameters should you set?

    a. mini_batch_size=10

    b. error_threshold=10

    c. node_count=10

    d. process_count_per_node=10

...

Further reading

This section offers a list of web resources to help you augment your knowledge of the AzureML SDK and the various code snippets used in this chapter:
