
You're reading from Automated Machine Learning with Microsoft Azure

Product type: Book
Published in: Apr 2021
Publisher: Packt
ISBN-13: 9781800565319
Edition: 1st Edition
Author: Dennis Michael Sawyers

Dennis Michael Sawyers is a senior cloud solutions architect (CSA) at Microsoft, specializing in data and AI. In his role as a CSA, he helps Fortune 500 companies leverage Microsoft Azure cloud technology to build top-class machine learning and AI solutions. Prior to his role at Microsoft, he was a data scientist at Ford Motor Company in Global Data Insight and Analytics (GDIA) and a researcher in anomaly detection at the highly regarded Carnegie Mellon Auton Lab. He received a master's degree in data analytics from Carnegie Mellon's Heinz College and a bachelor's degree from the University of Michigan. More than anything, Dennis is passionate about democratizing AI solutions through automated machine learning technology.

Chapter 10: Creating End-to-End AutoML Solutions

Now that you have created machine learning (ML) pipelines, you can learn how to use them in Azure products outside the Azure Machine Learning Service (AMLS). Perhaps the most useful of these is Azure Data Factory.

Azure Data Factory (ADF) is Azure's premier code-free data orchestration tool. By creating an Azure Data Factory pipeline (ADF pipeline), you can use ADF to pull data from on-premises sources into the Azure cloud, run ML pipelines, and push data out of Azure. ADF pipelines are an integral part of creating end-to-end ML solutions and are the end goal of any non-real-time AutoML project.

You will begin this chapter by learning how to connect AMLS to ADF. Once you have accomplished this task, you will learn how to schedule an ML pipeline using the parallel pipeline you created in Chapter 9, Implementing a Batch Scoring Solution.

Next, you will learn how to pull data from your local machine and load it into the...

Technical requirements

In this chapter, you will create an ADF resource and use the ML pipeline objects you created in Chapter 9, Implementing a Batch Scoring Solution. As such, you will need a working internet connection, an Azure account, and access to your AMLS workspace.

With your Azure account, you will also need permissions to create a service principal in Azure Active Directory. If you're using a personal Azure account, you should have this access. If you're using a work account, ask your Azure administrator to grant you this level of permission.

The following are the prerequisites for the chapter:

  • Have access to the internet
  • Have a web browser, preferably Google Chrome or Microsoft Edge Chromium
  • Have a Microsoft Azure account
  • Have created an AMLS workspace
  • Have created the compute-cluster compute cluster in Chapter 2, Getting Started with Azure Machine Learning Service
  • Understand how to navigate to the Jupyter environment from an...

Connecting AMLS to ADF

ADF is a code-free data orchestration and transformation tool. With it, you can create ADF pipelines that copy data into Azure, transform data, run ML pipelines, and push data back to certain on-premises databases and file shares. ADF's code-free pipeline editor makes it easy to build and schedule ADF pipelines. As you build an ADF pipeline with the drag-and-drop interface, ADF writes a JSON definition behind the scenes, which it uses to execute jobs.
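To make the point about JSON concrete, here is a minimal Python sketch that builds a dictionary shaped like an ADF pipeline definition and serializes it to JSON. The pipeline name `Demo-Pipeline` and the helper `make_pipeline` are hypothetical, and the layout (`name`, `properties`, `activities`) reflects the general structure of ADF pipeline JSON as we understand it. Compare it with the JSON view ADF shows for your own pipeline rather than treating it as authoritative.

```python
import json


def make_pipeline(name, activities, description=""):
    """Build a dict shaped like an ADF pipeline definition (hypothetical helper)."""
    return {
        "name": name,
        "properties": {
            "description": description,
            "activities": activities,
        },
    }


# A built-in Wait activity, used only to show what one activity entry looks like.
wait_activity = {
    "name": "WaitOneMinute",
    "type": "Wait",
    "typeProperties": {"waitTimeInSeconds": 60},
}

pipeline = make_pipeline("Demo-Pipeline", [wait_activity], "Hypothetical example")
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

Every box you drag onto the canvas becomes one entry in the `activities` list.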

Tip

Azure Synapse Analytics, Microsoft Azure's premier data warehousing and integrated analytics service, also has a feature nearly identical to ADF pipelines: Azure Synapse pipelines. Anything that you do in this chapter with ADF pipelines you can also achieve with Azure Synapse pipelines using a very similar interface.

In this section, you will create an ADF resource and connect it to AMLS. You will do this using a linked service, an object similar to a connection string...
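The following is a minimal Python sketch of what a linked service to an AMLS workspace can look like in JSON form. It assumes the ADF linked service type `AzureMLService` with service principal authentication, which is why the technical requirements ask for permission to create a service principal. The helper name and every value passed to it are hypothetical placeholders.

```python
import json


def make_aml_linked_service(name, subscription_id, resource_group, workspace_name,
                            sp_id, sp_key, tenant_id):
    """Build a dict shaped like an ADF linked service for an AMLS workspace."""
    return {
        "name": name,
        "properties": {
            "type": "AzureMLService",
            "typeProperties": {
                "subscriptionId": subscription_id,
                "resourceGroupName": resource_group,
                "mlWorkspaceName": workspace_name,
                # Service principal credentials used by ADF to call AMLS.
                "servicePrincipalId": sp_id,
                "servicePrincipalKey": {"type": "SecureString", "value": sp_key},
                "tenant": tenant_id,
            },
        },
    }


# All values below are hypothetical placeholders; never commit a real key.
linked_service = make_aml_linked_service(
    name="AMLS-Linked-Service",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    workspace_name="<amls-workspace>",
    sp_id="<service-principal-app-id>",
    sp_key="<service-principal-secret>",
    tenant_id="<tenant-id>",
)
print(json.dumps(linked_service, indent=2))
```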

Scheduling a machine learning pipeline in ADF

Perhaps ADF's best feature is its ease of use. By clicking and dragging objects across the screen, you can orchestrate data ingestion, transformation, and ML in a single ADF pipeline. With a few more clicks, you can schedule that ADF pipeline to run whenever you want. Gaining this skill will enable you to create code-free data orchestration runs quickly.

First, you will schedule and run the simplest ML pipeline you created in Chapter 9, Implementing a Batch Scoring Solution, the Iris-Scoring-Pipeline. To do so, follow these steps:

  1. Navigate to your ADF resource and click Author & Monitor.
  2. Click the pen icon on the left-hand side. When you hover over this icon, the word Author will appear to indicate which section you're navigating to.
  3. Click the blue cross icon next to the search box under Factory Resources in the top-left corner. When you hover over this icon,...
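Behind these visual steps, the activity that runs a published ML pipeline is also stored as JSON. The following is a minimal Python sketch, assuming ADF represents this activity with the type `AzureMLExecutePipeline` and the `mlPipelineId`, `experimentName`, and optional `mlPipelineParameters` properties. The activity name, linked service name, and experiment name are hypothetical, and the pipeline ID placeholder stands in for the ID of your published Iris-Scoring-Pipeline.

```python
import json


def make_execute_aml_activity(name, linked_service_name, published_pipeline_id,
                              experiment_name, parameters=None):
    """Build a dict shaped like an ADF activity that runs a published AMLS pipeline."""
    activity = {
        "name": name,
        "type": "AzureMLExecutePipeline",
        # Points at the AMLS linked service that connects ADF to your workspace.
        "linkedServiceName": {
            "referenceName": linked_service_name,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "mlPipelineId": published_pipeline_id,
            "experimentName": experiment_name,
        },
    }
    if parameters:
        # Only included when the published ML pipeline exposes parameters.
        activity["typeProperties"]["mlPipelineParameters"] = parameters
    return activity


score_iris = make_execute_aml_activity(
    name="Score-Iris",                                # hypothetical
    linked_service_name="AMLS-Linked-Service",        # hypothetical
    published_pipeline_id="<published-pipeline-id>",  # copy this from AMLS
    experiment_name="Iris-Scoring-Pipeline-Run",      # hypothetical
)
print(json.dumps(score_iris, indent=2))
```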

Transferring data using ADF

Moving data from on-premises systems to the cloud, and from the cloud back to on-premises systems, is a key skill for any data engineer or data scientist. ADF accomplishes this task with the Copy data activity, which is both ADF's most basic and its most powerful function.

In this section, you will first download a self-hosted integration runtime (SHIR) to your local machine, allowing your computer to serve as a compute resource for loading data into Azure. Then, you will create linked services for your Azure storage account and your local PC.

Next, you will download a file from the GitHub repository and save it to your PC. Finally, you will create a Copy data activity in ADF that will take data from your PC and put it into the same Azure blob container that's connected to your AML datastore.
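As a rough illustration of how these pieces fit together, the following Python sketch builds two JSON-shaped dictionaries: a linked service for a local folder that routes through a SHIR via `connectVia`, and a Copy activity that reads delimited text from that folder and writes it to blob storage. It assumes the ADF types `FileServer`, `DelimitedTextSource`, `FileServerReadSettings`, `DelimitedTextSink`, and `AzureBlobStorageWriteSettings`, and it assumes that an input dataset and an output dataset already exist on the two linked services. All names, paths, and credentials are hypothetical placeholders.

```python
import json


def make_file_server_linked_service(name, host, user_id, password, shir_name):
    """Linked service for a local folder, reached through a self-hosted integration runtime."""
    return {
        "name": name,
        "properties": {
            "type": "FileServer",
            "typeProperties": {
                "host": host,
                "userId": user_id,
                "password": {"type": "SecureString", "value": password},
            },
            # Route the connection through the SHIR installed on your PC.
            "connectVia": {"referenceName": shir_name, "type": "IntegrationRuntimeReference"},
        },
    }


def make_copy_activity(name, input_dataset, output_dataset):
    """Copy activity that reads delimited text from a file share and writes it to blob storage."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": input_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {
                "type": "DelimitedTextSource",
                "storeSettings": {"type": "FileServerReadSettings"},
            },
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
                "formatSettings": {"type": "DelimitedTextWriteSettings", "fileExtension": ".csv"},
            },
        },
    }


# Every name, path, and credential here is a hypothetical placeholder.
local_ls = make_file_server_linked_service(
    "Local-PC-Linked-Service", "C:\\AzureData", "<domain>\\<user>", "<password>", "Local-SHIR"
)
copy_to_blob = make_copy_activity("Copy-Iris-To-Blob", "Local_Iris_CSV", "Blob_Iris_CSV")
print(json.dumps([local_ls, copy_to_blob], indent=2))
```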

Going through these exercises will give you the data engineering skills that will allow you to create an end-to-end solution in the next section.

Installing a self-hosted integration...

Automating an end-to-end scoring solution

The end goal of any AutoML project is an automated scoring solution: data is pulled in from a source, scored automatically using the model you trained, and the results are stored in a location of your choice. By combining everything you've learned in the previous three sections, you can accomplish this task easily.

You will begin this section by opening AMLS, creating a new dataset, and slightly altering your existing Iris-Scoring-Pipeline. Then, after republishing your pipeline under a new name, you will combine it with the Copy data activity you created to load data into Azure.
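When you connect two activities on the ADF canvas with a success arrow, the downstream activity records that dependency in its JSON. The following is a minimal Python sketch of that idea, assuming ADF's `dependsOn` field with a `Succeeded` dependency condition. The helper `chain_on_success`, the pipeline name, and the two stripped-down stand-in activities are hypothetical; real activities carry their full `typeProperties`.

```python
import copy
import json


def chain_on_success(activities):
    """Return copies of the activities, each depending on the previous one succeeding."""
    chained = []
    for i, activity in enumerate(activities):
        item = copy.deepcopy(activity)  # leave the caller's dicts untouched
        if i > 0:
            item["dependsOn"] = [
                {"activity": activities[i - 1]["name"], "dependencyConditions": ["Succeeded"]}
            ]
        chained.append(item)
    return chained


# Minimal, hypothetical stand-ins for the load step and the scoring step.
load_data = {"name": "Copy-Iris-To-Blob", "type": "Copy"}
score_data = {"name": "Score-Iris", "type": "AzureMLExecutePipeline"}

pipeline = {
    "name": "Iris-End-to-End-Scoring",
    "properties": {"activities": chain_on_success([load_data, score_data])},
}
print(json.dumps(pipeline, indent=2))
```

Appending a third activity to the list, such as a Copy data activity that moves results back to your PC, chains it the same way, so it runs only after scoring succeeds.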

Next, you will create another Copy data activity to transfer your results from Azure to your PC and schedule the job to run once a week on Mondays. This is a very common pattern in ML, and you can implement it with ADF without writing any code.
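A weekly schedule in ADF is expressed as a trigger attached to the pipeline. The following is a minimal Python sketch of a trigger definition that fires every Monday, assuming ADF's `ScheduleTrigger` type with a `recurrence` block using `frequency`, `interval`, `startTime`, `timeZone`, and a `schedule` of `weekDays`, `hours`, and `minutes`. The trigger name, pipeline name, start date, and 06:00 UTC run time are hypothetical choices.

```python
import json


def make_weekly_trigger(name, pipeline_name, start_time, week_days, hour, minute,
                        time_zone="UTC"):
    """Build a dict shaped like an ADF schedule trigger that fires weekly on the given days."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Week",
                    "interval": 1,  # every week
                    "startTime": start_time,
                    "timeZone": time_zone,
                    "schedule": {
                        "weekDays": week_days,
                        "hours": [hour],
                        "minutes": [minute],
                    },
                }
            },
            # The pipeline(s) this trigger starts.
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


# Hypothetical: run the end-to-end pipeline every Monday at 06:00 UTC.
trigger = make_weekly_trigger(
    "Weekly-Monday-Trigger", "Iris-End-to-End-Scoring", "2021-04-05T00:00:00Z",
    ["Monday"], hour=6, minute=0,
)
print(json.dumps(trigger, indent=2))
```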

Editing an ML pipeline to score new data

First, you need to create...

Automating an end-to-end training solution

Like any other ML model, an AutoML model that has been deployed and running for a few months can benefit from being retrained. There are several reasons for this, listed in order of importance:

  • ML models break if the pattern between your input data and target column changes. This often happens due to external factors, such as changes in consumer behavior. When the pattern breaks, you need to retrain your model to restore performance.
  • ML models perform better the more relevant data you feed them. Therefore, as your data grows, you should periodically retrain models.
  • Retraining models on a consistent basis means that they're less likely to break if patterns change slowly over time. Consequently, it's best practice to retrain as data is acquired.

In this section, you are going to put your skills to the test. You will be given a set of instructions similar to when you created an end-to-end scoring solution. However, this time...

Summary

Automating ML solutions in an end-to-end fashion is no easy task, and if you've made it this far, you should feel proud. Most modern data science organizations can easily train models, but very few can implement reliable, automated, end-to-end solutions as you have done in this chapter.

You should now feel confident in your ability to design end-to-end AutoML solutions. You can train models with AutoML and create ML pipelines to score data and retrain models. You can easily ingest data into Azure and transfer it out of Azure with ADF. Furthermore, you can tie everything together and create ADF pipelines that seamlessly ingest data, score data, train models, and push results wherever you'd like. You can now create end-to-end ML solutions.

Chapter 11, Implementing a Real-Time Scoring Solution, will cement your ML knowledge by teaching you how to score data in real time using Azure Kubernetes Service within AMLS. Adding real-time scoring to your batch-scoring skillset will make...
