You're reading from Automated Machine Learning with Microsoft Azure

Product type: Book
Published in: Apr 2021
Publisher: Packt
ISBN-13: 9781800565319
Edition: 1st Edition
Author: Dennis Michael Sawyers

Dennis Michael Sawyers is a senior cloud solutions architect (CSA) at Microsoft, specializing in data and AI. In his role as a CSA, he helps Fortune 500 companies leverage Microsoft Azure cloud technology to build top-class machine learning and AI solutions. Prior to his role at Microsoft, he was a data scientist at Ford Motor Company in Global Data Insight and Analytics (GDIA) and a researcher in anomaly detection at the highly regarded Carnegie Mellon Auton Lab. He received a master's degree in data analytics from Carnegie Mellon's Heinz College and a bachelor's degree from the University of Michigan. More than anything, Dennis is passionate about democratizing AI solutions through automated machine learning technology.

Chapter 5: Building an AutoML Classification Solution

After building your AutoML regression solution with Python in Chapter 4, Building an AutoML Regression Solution, you should be feeling confident in your coding abilities. In this chapter, you will build a classification solution. Unlike regression, classification is used to predict the category of the object of interest. For example, if you're trying to predict who is likely to become a homeowner in the next five years, classification is the right machine learning approach.

Binary classification is when you are trying to predict two classes, such as homeowner or not, while multiclass classification involves trying to predict three or more classes, such as homeowner, renter, or lives with family. You can utilize both of these techniques with Azure AutoML, and this chapter will teach you how to train both kinds of models using different datasets.
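
As a rough illustration of the distinction above, the following sketch counts the distinct values in a target column to tell you which kind of problem you have. It is only a sanity check and not part of the AutoML API; the function name `classification_kind` and the sample labels are hypothetical, chosen for this example.

```python
def classification_kind(labels):
    """Return 'binary' for two distinct classes, 'multiclass' for three or more."""
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("A classification target needs at least two classes.")
    return "binary" if len(distinct) == 2 else "multiclass"


# Hypothetical target columns based on the homeowner example above.
two_class = ["homeowner", "not homeowner", "homeowner"]
three_class = ["homeowner", "renter", "lives with family", "renter"]

print(classification_kind(two_class))
print(classification_kind(three_class))
```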

In this chapter, you will begin by navigating directly to the Jupyter environment...

Technical requirements

For this chapter, you will be building models with Python code in Jupyter notebooks through Azure Machine Learning (AML) studio. Furthermore, you will be using datasets and Azure resources that you should have created in previous chapters. As such, the full list of requirements is as follows:

  • Access to the internet
  • A web browser, preferably Google Chrome or Microsoft Edge Chromium
  • A Microsoft Azure account
  • An Azure Machine Learning workspace
  • The titanic-compute-instance compute instance created in Chapter 2, Getting Started with Azure Machine Learning
  • The compute-cluster compute cluster created in Chapter 2, Getting Started with Azure Machine Learning
  • The Titanic Training Data dataset from Chapter 3, Training your First AutoML Model
  • An understanding of how to navigate to the Jupyter environment from an Azure compute instance as demonstrated in Chapter 4, Building an AutoML Regression Solution

Prepping data for AutoML classification

Classification, or predicting the category of something based on its attributes, is one of the key techniques of machine learning. Just as with regression, you first need to prep your data before training a model on it with AutoML. In this section, you will first navigate to your Jupyter notebook, load in your data, and transform it for use with AutoML.

Just as you loaded in your Diabetes Sample dataset via Jupyter notebooks for regression, you will do the same with the Titanic Training Data dataset. However, this time around you will do much more extensive data transformation before training your AutoML model. This is to build upon your learning; classification datasets do not necessarily require more transformation than their regression counterparts. As in the previous chapter, you will begin by opening up a Jupyter notebook from your compute instance.

Navigating to your Jupyter environment

Similar to Chapter 4, Building an AutoML Regression...

Training an AutoML classification model

Training an AutoML classification model is very similar to training an AutoML regression model, but there are a few key differences. In Chapter 4, Building an AutoML Regression Solution, you began by setting a name for your experiment. After that, you set your target column and subsequently set your AutoML configurations. Finally, you used AutoML to train a model, performed a data guardrails check, and produced results.

All of the steps in this section are nearly the same. However, pay close attention to the data guardrails check and results, as they are substantially different when training classification models:

  1. Set your experiment and give it a name:
    experiment_name = 'Titanic-Transformed-Classification'
    exp = Experiment(workspace=ws, name=experiment_name) 
  2. Set your dataset to your transformed Titanic data:
    dataset_name = "Titanic Transformed"
    dataset = Dataset.get_by_name(ws, dataset_name, version='...
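
The remaining steps follow the pattern described at the start of this section: set a target column, set your AutoML configurations, and submit the run. The sketch below is a minimal illustration under stated assumptions, not the book's exact code. It assumes the Azure ML Python SDK v1 (`azureml-train-automl`); the setting values, the `build_classification_config` helper, and the target column name you pass in are placeholders chosen for this example.

```python
# Illustrative settings only; the values are assumptions, not taken from the chapter.
AUTOML_SETTINGS = {
    "task": "classification",          # covers both binary and multiclass problems
    "primary_metric": "accuracy",      # the metric AutoML tries to optimize
    "n_cross_validations": 5,
    "experiment_timeout_minutes": 15,
    "enable_early_stopping": True,
}


def build_classification_config(training_data, compute_target, label_column_name):
    """Build an AutoMLConfig for a classification run (hypothetical helper)."""
    # Imported lazily so the settings above can be inspected without the SDK installed.
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        training_data=training_data,
        compute_target=compute_target,
        label_column_name=label_column_name,
        **AUTOML_SETTINGS,
    )


# Usage inside the notebook (assumes `dataset`, `compute_target`, and `exp` exist,
# and that the target column of your transformed data is named 'Survived'):
# config = build_classification_config(dataset, compute_target, "Survived")
# run = exp.submit(config, show_output=True)
```

Keeping the settings in a dictionary makes it easy to reuse the same configuration for a binary and a multiclass run, changing only the dataset and target column.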

Registering your trained classification model

The code to register classification models is identical to the code you used in Chapter 4, Building an AutoML Regression Solution, to register your regression model. Always register new models, as you will use them to score new data using either real-time scoring endpoints or batch execution inference pipelines depending on your use case. This will be explained in Chapter 9, Implementing a Batch Scoring Solution, and Chapter 11, Implementing a Real-Time Scoring Solution. Likewise, when registering your models, always add tags and descriptions for easier tracking:

  1. First, give your model a name, a description, and some tags:
    description = 'Best AutoML Classification Run using Transformed Titanic Data.' 
    tags = {'project' : "Titanic", "creator" : "your name"} 
    model_name = 'Titanic-Transformed-Classification-AutoML' 

    Tags let you easily search for models, so think carefully...
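
As a sketch of how these values might be used, the helper below passes them to the `register_model` method of a completed AutoML run. It assumes `automl_run` is the run object returned when you submitted your experiment with the Azure ML Python SDK v1. `register_best_model` is a hypothetical helper name, and the `FakeRun` class exists only so the example runs without an Azure workspace.

```python
def register_best_model(automl_run, model_name, description, tags):
    """Register the model from an AutoML run under the given name."""
    # With no iteration or metric specified, an AutoML run registers its best model.
    return automl_run.register_model(
        model_name=model_name,
        description=description,
        tags=tags,
    )


class FakeRun:
    """Stand-in for an AutoML run so the sketch can be exercised locally."""

    def register_model(self, model_name=None, description=None, tags=None):
        return {"name": model_name, "description": description, "tags": tags}


registered = register_best_model(
    FakeRun(),
    model_name="Titanic-Transformed-Classification-AutoML",
    description="Best AutoML Classification Run using Transformed Titanic Data.",
    tags={"project": "Titanic", "creator": "your name"},
)
print(registered["name"])
```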

Training an AutoML multiclass model

Multiclass classification involves predicting three or more classes instead of the standard binary classification. Using custom machine learning, training multiclass models is often a messy, complicated affair where you have to carefully consider the number of classes you are trying to predict, how unbalanced those classes are relative to each other, whether you should combine classes together, and how you should present your results. Luckily, AutoML takes care of all these considerations for you and makes training a multiclass model as simple as training a binary classification model.
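
One of the considerations named above, how unbalanced the classes are relative to each other, is easy to inspect before training. The following is a minimal sketch using only the Python standard library; `class_balance` is a hypothetical helper and the label list is made-up sample data.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the rows, largest class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


# Made-up target column with three classes of unequal size.
labels = ["renter"] * 6 + ["homeowner"] * 3 + ["lives with family"] * 1

for label, share in class_balance(labels):
    print(f"{label}: {share:.0%}")
```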

In this section, you will load in data using the publicly available Iris dataset. You will then set your AutoML configurations for multiclass classification, train and register a model, and examine your results. You will notice that much of the code is identical to the last section. By understanding the differences between binary and multiclass classification in AutoML...

Fine-tuning your AutoML classification model

In this section, you will first review tips and tricks for improving your AutoML classification models and then review the algorithms used by AutoML for both binary and multiclass classification.

Improving AutoML classification models

Keeping in mind the tips and tricks from Chapter 4, Building an AutoML Regression Solution, here are new ones that are specific to classification:

  • Unlike regression problems, nearly all classification problems in the real world require you to weight your target column. The reason is that, for most business problems, one class is nearly always more important than the others.

    For example, imagine you are running a business and you are trying to predict which customers will stop doing business with you and leave you for a competitor. This is a common problem called customer churn or customer turnover. If you misidentify a customer as being likely to churn, all you waste is an unnecessary phone call...
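
One way to express that one class matters more than another is to give each row a weight based on its class. The sketch below is an illustration under stated assumptions rather than the chapter's own code: the `add_weight_column` helper, the column names, and the 5-to-1 weighting are all hypothetical. In the Azure ML Python SDK v1, `AutoMLConfig` accepts a `weight_column_name` argument that names such a column in the training data.

```python
def add_weight_column(rows, label_key, class_weights, weight_key="weight"):
    """Return copies of the rows with a per-row weight looked up from the row's class."""
    return [{**row, weight_key: class_weights[row[label_key]]} for row in rows]


# Made-up churn data: missing a churner is treated as five times as costly
# as flagging a loyal customer by mistake.
customers = [
    {"customer_id": 1, "churned": "yes"},
    {"customer_id": 2, "churned": "no"},
    {"customer_id": 3, "churned": "no"},
]
weighted = add_weight_column(customers, "churned", {"yes": 5.0, "no": 1.0})

for row in weighted:
    print(row)
```

The helper returns new rows instead of modifying the originals, so you can try several weighting schemes against the same data.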

Summary

You have added to your repertoire by successfully training a classification model using the AML Python SDK. You have loaded in data, heavily transformed it using pandas and NumPy, and built a toy AutoML model. You then registered that model to your AML workspace.

You can now start building classification models with your own data. You can easily solve both binary and multiclass classification problems, and you can present results to the business in a way they understand with confusion matrices. Many of the most common business problems, such as customer churn, are classification problems, and with the knowledge you learned in this chapter, you can solve those problems and earn trust and respect in your organization.

The next chapter, Chapter 6, Building an AutoML Forecasting Solution, will be vastly different from the previous two chapters. Forecasting problems have many more settings to use and understand compared to classification and regression problems, and they...
