You're reading from Automated Machine Learning with Microsoft Azure

Product type: Book
Published in: Apr 2021
Publisher: Packt
ISBN-13: 9781800565319
Edition: 1st Edition
Author (1)
Dennis Michael Sawyers

Dennis Michael Sawyers is a senior cloud solutions architect (CSA) at Microsoft, specializing in data and AI. In his role as a CSA, he helps Fortune 500 companies leverage Microsoft Azure cloud technology to build top-class machine learning and AI solutions. Prior to his role at Microsoft, he was a data scientist at Ford Motor Company in Global Data Insight and Analytics (GDIA) and a researcher in anomaly detection at the highly regarded Carnegie Mellon Auton Lab. He received a master's degree in data analytics from Carnegie Mellon's Heinz College and a bachelor's degree from the University of Michigan. More than anything, Dennis is passionate about democratizing AI solutions through automated machine learning technology.

Chapter 4: Building an AutoML Regression Solution

You've taken the first step to becoming an Azure AutoML expert by building a solution with the AutoML guided user interface. Now, it's time to level up your skills by creating a solution with the Azure Machine Learning Python Software Development Kit (AzureML Python SDK). Using the Diabetes dataset that we built in Chapter 2, Getting Started with Azure Machine Learning Service, you will build a regression solution to predict how much a person's diabetes has progressed over the past year.

You will begin this chapter by opening up a Jupyter notebook from your compute instance, which will let you write Python code. First, you will load in the Diabetes data. Then, you will train an AutoML model and register your trained model to your Azure Machine Learning Service (AMLS) workspace. You will accomplish this by using easily reusable Python scripts. After examining your model's results, you will learn how to register...

Technical requirements

The following are the prerequisites for this chapter:

  • Access to the internet
  • A web browser, preferably Google Chrome or Microsoft Edge Chromium
  • A Microsoft Azure account
  • An Azure Machine Learning service workspace
  • The titanic-compute-instance compute instance from Chapter 2, Getting Started with Azure Machine Learning Service
  • The compute-cluster compute cluster from Chapter 2, Getting Started with Azure Machine Learning Service
  • The Diabetes Sample dataset from Chapter 2, Getting Started with Azure Machine Learning Service

The code for this chapter is available here: https://github.com/PacktPublishing/Automated-Machine-Learning-with-Microsoft-Azure/blob/master/Chapter04/Chapter-4-AutoML-on-Azure.ipynb.

Preparing data for AutoML regression

Before you can train any model with AutoML, you must have a properly cleansed dataset. This section will walk you through how to prepare data for any AutoML regression solution. You will begin by using your compute instance to open a Jupyter notebook, an environment that lets you write and run Python code. Following that, you will cleanse, transform, and register your data as an Azure dataset. This will give you a dataset that's ready for training in the next section.
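
As a rough sketch of the first step, the snippet below shows one way a registered dataset might be pulled into a pandas DataFrame from a notebook on the compute instance. It assumes the AzureML Python SDK (`azureml-core`) is installed and that a workspace `config.json` is available, as it normally is on a compute instance. The function name `load_registered_dataframe` is hypothetical, and the dataset name is taken from the prerequisites list; the notebook in the book's repository may do this differently.

```python
def load_registered_dataframe(dataset_name="Diabetes Sample"):
    """Fetch a registered tabular dataset and return it as a pandas DataFrame.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the function can be defined without the SDK installed.
    from azureml.core import Dataset, Workspace

    # On a compute instance, from_config() reads the workspace's config.json.
    ws = Workspace.from_config()

    # Look the dataset up by the name it was registered under.
    dataset = Dataset.get_by_name(ws, name=dataset_name)

    # Materialize the tabular dataset as a DataFrame for cleansing with pandas.
    return dataset.to_pandas_dataframe()


# Example call inside a notebook cell (requires a workspace connection):
# df = load_registered_dataframe()
# df.head()
```

Keeping the lookup in a small function makes it easy to reuse the same cell with a different dataset name.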

Some of you may be new to Python or even to coding in general, but don't worry. While scripting an AutoML solution may seem much more difficult than using the GUI, in reality, it's a matter of making slight changes to boilerplate code.

Using the code found in this book's GitHub repository, you only have to alter it slightly to adapt it to your own custom solution using your own custom data. Furthermore, for this exercise, you've already completed most of the...

Training an AutoML regression model

Compared to setting up your Jupyter environment and preparing your data, training an AutoML model involves fewer steps. First, you will need to set a name for your experiment. Remember that experiments automatically log information about your AutoML runs. Next, you will need to set your Target column, which is the column you wish to predict, and a few other settings. Finally, you will use AutoML to train a model and watch the results in real time.

In this section, you will create an experiment, configure the various parameters and settings specific to AutoML regression tasks, and train three AutoML regression models using the datasets you created in the previous section. Let's get started:

  1. Create an Experiment object and give it a name by using the following code. All of the logs and metrics from your run will be stored under this experiment in AML studio:
    experiment_name = 'Diabetes-Sample-Regression'
    exp = Experiment(workspace=ws, name...
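
The overall flow described in this section (name an experiment, choose a target column, configure AutoML, and submit the run) could be sketched roughly as follows. This is not the book's code. The helper names `regression_settings` and `train_regression`, the target column `'Y'`, and the values for cross-validation folds and timeout are all assumptions chosen for illustration. The `AutoMLConfig` parameters shown are ones documented for the AzureML Python SDK v1.

```python
def regression_settings(target_column):
    """Collect AutoML regression settings in one plain dictionary."""
    return {
        "task": "regression",
        "primary_metric": "normalized_root_mean_squared_error",
        "label_column_name": target_column,  # the column you wish to predict
        "featurization": "auto",
        "n_cross_validations": 5,            # assumed value
        "experiment_timeout_hours": 0.25,    # assumed value: 15 minutes
        "enable_early_stopping": True,
    }


def train_regression(ws, training_data, compute_target, experiment_name, target_column):
    """Create an experiment, configure AutoML, and submit the run (hypothetical helper)."""
    # Imported lazily so the settings helper works without the SDK installed.
    from azureml.core import Experiment
    from azureml.train.automl import AutoMLConfig

    exp = Experiment(workspace=ws, name=experiment_name)
    config = AutoMLConfig(
        training_data=training_data,
        compute_target=compute_target,
        **regression_settings(target_column),
    )
    # show_output=True streams progress to the notebook while models train.
    return exp.submit(config, show_output=True)


# Example call (requires a workspace, a registered dataset, and a compute cluster):
# run = train_regression(ws, dataset, compute_target,
#                        'Diabetes-Sample-Regression', target_column='Y')
```

Separating the settings from the submission keeps the values you are most likely to change in one place.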

Registering your trained regression model

AutoML lets you easily register your trained models for future use. In Chapter 9, Implementing a Batch Scoring Solution, and Chapter 11, Implementing a Real-Time Scoring Solution, you will create batch execution inference pipelines and real-time scoring endpoints that will use your models. When registering your model, you can add tags and descriptions for easier tracking.

One especially useful feature is the ability to register models based on metrics other than the one you used to score your model. Thus, even though you trained a model using normalized RMSE, you can also register the model that had the best R² score, even if that model is different.
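
To see why the best model can change with the metric, consider the small, self-contained sketch below. The run names and scores are invented purely for illustration, and `best_by_metric` is a hypothetical helper, not an SDK function. In the SDK itself, `AutoMLRun.get_output` and `AutoMLRun.register_model` each accept a `metric` argument for this purpose, to the best of my knowledge.

```python
# Metrics where a smaller value means a better model; all others are maximized.
LOWER_IS_BETTER = {"normalized_root_mean_squared_error"}

# Hypothetical scores for three child runs (numbers invented for illustration).
child_runs = [
    {"run": "run_1", "normalized_root_mean_squared_error": 0.171, "r2_score": 0.46},
    {"run": "run_2", "normalized_root_mean_squared_error": 0.168, "r2_score": 0.44},
    {"run": "run_3", "normalized_root_mean_squared_error": 0.175, "r2_score": 0.41},
]


def best_by_metric(runs, metric):
    """Return the run with the best score for the given metric."""
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(runs, key=lambda r: r[metric])


best_rmse = best_by_metric(child_runs, "normalized_root_mean_squared_error")
best_r2 = best_by_metric(child_runs, "r2_score")
print(best_rmse["run"], best_r2["run"])  # prints: run_2 run_1
```

Here the run with the lowest normalized RMSE is not the run with the highest R², so registering by a different metric would register a different model.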

In this section, you will write a simple description of your model, tag it, and give it a name. After that, you will register the model to your AMLS workspace. This section also includes code that will let you register different models based on other metrics. Let's get started:

  1. First...

Fine-tuning your AutoML regression model

In this section, you will first review tips and tricks for improving your AutoML regression models and then review the algorithms used by AutoML for regression.

Improving AutoML regression models

While AutoML will handle most of the complicated data transformations and feature engineering for you, there are a few tips you can follow to increase the accuracy of your model. Some of these tips are true across all three AutoML tasks – regression, classification, and forecasting – while others are regression-specific. Following them will yield higher-performing models and, more importantly, hone your understanding of machine learning techniques. I have listed a few tips and tricks here for quick reference:

  • Fill in null values before passing your data to AutoML. Alternatively, drop any rows that contain a null value. Just because AutoML will automatically fill your null values does not mean that it will do a great job.

    In...
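
The two options in the first tip, filling nulls yourself or dropping incomplete rows, can be illustrated with a minimal standard-library sketch. The column names and values below are made up, the helper names are hypothetical, and median imputation is only one reasonable choice among several.

```python
from statistics import median

# Tiny illustrative table; None marks a missing value.
rows = [
    {"AGE": 59, "BMI": 32.1},
    {"AGE": 48, "BMI": None},
    {"AGE": None, "BMI": 30.5},
    {"AGE": 24, "BMI": 25.3},
]


def fill_with_median(rows, column):
    """Replace missing values in one column with that column's median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def drop_incomplete(rows):
    """Keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Option 1: impute each column, leaving the row count unchanged.
filled = fill_with_median(fill_with_median(rows, "AGE"), "BMI")

# Option 2: discard any row with a missing value.
complete = drop_incomplete(rows)
```

With pandas, the same ideas are usually expressed with `DataFrame.fillna` and `DataFrame.dropna`. Imputing keeps every row at the cost of introducing estimated values, while dropping keeps only observed values at the cost of a smaller training set.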

Summary

With this chapter, you have successfully constructed a regression model using the AzureML Python SDK. Regardless of whether you're a Python novice or expert, you have loaded data, transformed it extensively using pandas, and built a useful machine learning model with AutoML. You then registered your model to an AMLS workspace. You will use that same model in future chapters to create inference pipelines and real-time scoring endpoints using REST APIs.

By working through all the exercises in this chapter, you have obtained a level of mastery over Azure AutoML regression solutions. You can now take any set of data that's useful in predicting a number and use it to create a high-performing machine learning model. Furthermore, you can code all of this in Python and, if the model fails to perform, you know several small ways to improve its performance or, if worst comes to worst, how to change your regression problem to a classification problem.

In Chapter 5, Building an...
