You're reading from Automated Machine Learning with Microsoft Azure

Product type: Book
Published in: Apr 2021
Publisher: Packt
ISBN-13: 9781800565319
Edition: 1st Edition
Author (1)
Dennis Michael Sawyers

Dennis Michael Sawyers is a senior cloud solutions architect (CSA) at Microsoft, specializing in data and AI. In his role as a CSA, he helps Fortune 500 companies leverage Microsoft Azure cloud technology to build top-class machine learning and AI solutions. Prior to his role at Microsoft, he was a data scientist at Ford Motor Company in Global Data Insight and Analytics (GDIA) and a researcher in anomaly detection at the highly regarded Carnegie Mellon Auton Lab. He received a master's degree in data analytics from Carnegie Mellon's Heinz College and a bachelor's degree from the University of Michigan. More than anything, Dennis is passionate about democratizing AI solutions through automated machine learning technology.

Chapter 9: Implementing a Batch Scoring Solution

You have trained regression, classification, and forecasting models with AutoML in Azure, and now it's time you learn how to put them in production and use them. Machine learning (ML) models, after all, are ultimately used to make predictions on new data, either in real time or in batches. In order to score new data points in batches in Azure, you must first create an ML pipeline.

An ML pipeline lets you run repeatable Python code in the Azure Machine Learning Service (AMLS), either on demand or on a schedule. While you can run any Python code using an ML pipeline, here you will learn how to build pipelines for scoring new data.

You will begin this chapter by writing a simple ML pipeline to score data using the multiclass classification model you trained on the Iris dataset in Chapter 5, Building an AutoML Classification Solution. Using the same data, you will then learn how to score new data points in parallel, enabling you...

Technical requirements

This chapter will feature a lot of coding using Jupyter notebooks within AMLS. Thus, you will need a working internet connection, an AMLS workspace, and a compute instance. ML pipelines also require a compute cluster. You will also need to have trained and registered the Iris multiclass classification model in Chapter 5, Building an AutoML Classification Solution.

The following are the prerequisites for the chapter:

  • Access to the internet.
  • A web browser, preferably Google Chrome or Microsoft Edge Chromium.
  • A Microsoft Azure account.
  • Have created an AMLS workspace.
  • Have created the compute cluster named compute-cluster in Chapter 2, Getting Started with Azure Machine Learning Service.
  • Understand how to navigate to the Jupyter environment from an Azure compute instance as demonstrated in Chapter 4, Building an AutoML Regression Solution.
  • Have trained and registered the Iris-Multi-Classification-AutoML ML model in Chapter 5, Building...

Creating an ML pipeline

ML pipelines are Azure's solution for batch scoring with ML models. You can use ML pipelines to score data with any model you train, including your own custom models as well as AutoML-generated models. They can only be created via code using the Azure ML Python SDK. In this section, you will code a simple pipeline to score Iris data using the Iris-Multi-Classification-AutoML model you built in Chapter 5, Building an AutoML Classification Solution.

As in other chapters, you will begin by opening your compute instance and navigating to your Jupyter notebook environment. You will then create and name a new notebook. Once your notebook is created, you will build, configure, and run an ML pipeline step by step. After confirming that your pipeline has run successfully, you will publish it to a pipeline endpoint. Pipeline endpoints are simply URLs: web addresses that trigger ML pipeline runs when called.
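Because a pipeline endpoint is just a URL, triggering a run amounts to sending an authenticated HTTP POST request to it. The following is a minimal sketch of how such a request could be assembled with Python's standard library. The endpoint URL, token, and experiment name are placeholders invented for illustration, and the request is only constructed here, never sent.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the REST endpoint of your published
# pipeline and a valid Azure Active Directory token.
ENDPOINT_URL = "https://example.invalid/pipelines/placeholder-id"
AAD_TOKEN = "placeholder-token"


def build_trigger_request(endpoint_url, token, experiment_name):
    """Assemble (but do not send) a POST request that would start a pipeline run."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "Batch-Scoring-Experiment" is an illustrative name, not one from the chapter.
request = build_trigger_request(ENDPOINT_URL, AAD_TOKEN, "Batch-Scoring-Experiment")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually submit it, which only makes sense once the placeholders are replaced with a real endpoint and token.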

The following steps deviate greatly from previous chapters. You...

Creating a parallel scoring pipeline

Standard ML pipelines work just fine for the majority of ML use cases, but when you need to score a large amount of data at once, you need a more powerful solution. That's where ParallelRunStep comes in. ParallelRunStep is Azure's answer to scoring big data in batch. When you use ParallelRunStep, you leverage all of the cores on your compute cluster simultaneously.

Say you have a compute cluster consisting of eight Standard_DS3_v2 virtual machines. Each Standard_DS3_v2 node has four cores, so you can perform 32 parallel scoring processes at once. This parallelization essentially lets you score data many times faster than if you used a single machine. Furthermore, it can easily scale vertically (increasing the size of each virtual machine in the cluster) and horizontally (increasing the node count).
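The arithmetic and the idea behind it can be sketched in plain Python. This is only a local illustration under stated assumptions: threads on one machine stand in for worker processes on cluster nodes, a made-up threshold rule stands in for a trained model, and the petal lengths are invented. The function name `run(mini_batch)` loosely mirrors the convention used by ParallelRunStep entry scripts, but nothing here calls the Azure ML SDK.

```python
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 8        # virtual machines in the cluster
CORES_PER_NODE = 4    # cores on each Standard_DS3_v2 node
MAX_WORKERS = NODE_COUNT * CORES_PER_NODE  # 32 parallel scoring processes


def make_mini_batches(rows, batch_size):
    """Split the rows into consecutive mini-batches of at most batch_size items."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def run(mini_batch):
    """Score one mini-batch; a stand-in rule replaces a real model here."""
    return ["large" if petal_length > 4.0 else "small" for petal_length in mini_batch]


# Made-up petal lengths standing in for new data to score.
rows = [1.4, 4.7, 5.1, 1.3, 6.0, 4.5, 1.5, 5.9]
mini_batches = make_mini_batches(rows, batch_size=3)

# Each mini-batch is scored independently, so batches can run side by side.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    scored_batches = list(pool.map(run, mini_batches))

# Flatten the per-batch results back into one list, in the original row order.
predictions = [label for batch in scored_batches for label in batch]
print(MAX_WORKERS, predictions)
```

Because each mini-batch is scored without reference to any other, adding nodes or cores raises the number of batches processed at once without changing the scoring logic.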

By the end of this section, you will be able to score large batches of data in parallel. Here, you will again be using simulated...

Creating an AutoML training pipeline

Sometimes, it's necessary to retrain a model that you trained in AutoML. ML models can degrade over time if the relationship between your data and your target variable changes. This is true for all ML models, not just ones generated by AutoML.

Imagine, for example, that you build an ML model to predict demand for frozen pizza at a supermarket, and then one day, a famous pizza chain sets up shop next door. It's very likely that consumer buying behavior will change, and you will need to retrain the model.
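One simple way to notice this kind of degradation is to compare the model's recent error against the error it had when it was first deployed. The sketch below assumes a hypothetical rule (retrain when the recent mean absolute error exceeds 1.5 times the baseline) and uses invented weekly demand figures; neither the threshold nor the numbers come from the book.

```python
from statistics import mean


def mean_absolute_error(actuals, predictions):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actuals, predictions))


def needs_retraining(baseline_error, recent_error, tolerance=1.5):
    """Flag the model when recent error exceeds the baseline by the tolerance factor."""
    return recent_error > baseline_error * tolerance


# Invented weekly frozen pizza demand before the pizza chain opened next door.
before_actual = [100, 110, 95, 105]
before_predicted = [98, 112, 97, 103]

# Invented demand afterwards: actual sales drop, but the model predicts as before.
after_actual = [60, 55, 70, 65]
after_predicted = [99, 108, 96, 104]

baseline_error = mean_absolute_error(before_actual, before_predicted)
recent_error = mean_absolute_error(after_actual, after_predicted)

print(baseline_error, recent_error, needs_retraining(baseline_error, recent_error))
```

A check like this could serve as the signal for triggering a retraining run, although the right metric and tolerance depend on the problem at hand.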

Luckily, AMLS has specialized ML pipeline steps built specifically for retraining models. In this section, you will use one of those steps, the AutoML step. The AutoML step lets you retrain models easily whenever you want, either with the push of a button or on a schedule.

Here, you will build a two-step ML pipeline where you will first train a model with an AutoML step and register it with...

Triggering and scheduling your ML pipelines

One of the biggest problems data scientists face is creating easy, rerunnable, production-ready code and scheduling it in an automatic, reliable manner. You've already accomplished the first part by creating your three ML pipelines. Now, it's time to learn how to do the second part.

In this section, you will first learn how to manually trigger the pipelines you've created through the GUI. Then, you will learn how to trigger the pipelines via code, both manually and on an automated schedule. This will enable you to put your ML pipelines into production, generating results on an hourly, daily, weekly, or monthly basis.
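A recurring schedule is commonly described by a frequency and an interval, such as "every 1 day" or "every 6 hours". The sketch below uses only Python's standard library to show how such a pair translates into concrete run times. It is not the Azure ML SDK's scheduling API; the function and dictionary names are hypothetical, and monthly schedules are left out because months vary in length and need calendar arithmetic.

```python
from datetime import datetime, timedelta

# Step sizes for fixed-length frequencies; monthly is intentionally omitted.
FREQUENCY_STEPS = {
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
    "Week": timedelta(weeks=1),
}


def next_run_times(start, frequency, interval, count):
    """Return the first `count` run times that follow `start` for a recurrence."""
    step = FREQUENCY_STEPS[frequency] * interval
    return [start + step * n for n in range(1, count + 1)]


# An arbitrary starting point chosen for illustration.
start = datetime(2021, 4, 1, 6, 0)

daily_runs = next_run_times(start, "Day", interval=1, count=3)
six_hourly_runs = next_run_times(start, "Hour", interval=6, count=2)

for run_time in daily_runs:
    print(run_time.isoformat())
```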

Triggering your published pipeline from the GUI

Triggering your published pipeline from the AML studio GUI is easy. However, you cannot set up an automated schedule for your ML pipelines from the GUI at this time. As such, it is most useful for triggering training pipelines when you notice that your results seem off...

Summary

You have now implemented a fully automated ML batch scoring solution using an AutoML-trained model. You've created pipelines that can score new data, pipelines that can process big data in parallel, and pipelines that can retrain AutoML models. You can trigger them whenever you want, and you can even set up an automated scoring schedule. This is no small feat, as many organizations have spent years trying to learn best practices for these tasks.

In Chapter 10, Creating End-to-End AutoML Solutions, you will cement your knowledge as you learn how to ingest data into Azure, score it with ML pipelines, and write your results to whatever location you want.
