You're reading from Azure Data Scientist Associate Certification Guide

Product type: Book
Published in: Dec 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800565005
Edition: 1st Edition
Authors (2):
Andreas Botsikas

Andreas Botsikas is an experienced advisor working in the software industry. He has worked in the finance sector, leading highly efficient DevOps teams, and architecting and building high-volume transactional systems. He then traveled the world, building AI-infused solutions with a group of engineers and data scientists. Currently, he works as a trusted advisor for customers onboarding into Azure, de-risking and accelerating their cloud journey. He is a strong engineering professional with a Doctor of Philosophy (Ph.D.) in resource optimization with artificial intelligence from the National Technical University of Athens.

Michael Hlobil

Michael Hlobil is an experienced architect focused on quickly understanding customers' business needs. He has over 25 years of experience in IT, spanning both pitfalls and successful projects, and is dedicated to creating solutions based on the Microsoft Platform. He has an MBA in Computer Science and Economics (from the Technical University and the University of Vienna) and an MSc (from the ESBA) in Systemic Coaching. Over the last decade, he has worked on advanced analytics projects, including massively parallel systems and machine learning systems. He enjoys working with customers and supporting their journey to the cloud.

Chapter 9: Optimizing the ML Model

In this chapter, you will learn about two techniques you can use to discover the optimal model for your dataset. You will start by exploring the HyperDrive package of the AzureML SDK. This package allows you to fine-tune the model's performance by tweaking the parameters it exposes, a process also known as hyperparameter tuning. You will then explore the Automated ML (AutoML) package of the AzureML SDK, which allows you to automate the model selection, training, and optimization process through code.

In this chapter, we are going to cover the following main topics:

  • Hyperparameter tuning using HyperDrive
  • Running AutoML experiments with code

Technical requirements

You will need to have access to an Azure subscription. Within that subscription, you will need a resource group named packt-azureml-rg. You will need to have either a Contributor or Owner Access control (IAM) role on the resource group level. Within that resource group, you should have already deployed a machine learning resource named packt-learning-mlw, as described in Chapter 2, Deploying Azure Machine Learning Workspace Resources.

You will also need to have a basic understanding of the Python language. The code snippets target Python version 3.6 or newer. You should also be familiar with working in the notebook experience within AzureML Studio, something that was covered in Chapter 8, Experimenting with Python Code.

This chapter assumes you have registered the scikit-learn diabetes dataset in your AzureML workspace and that you have created a compute cluster named cpu-sm-cluster, as described in the sections Defining datastores, Working with datasets...

Hyperparameter tuning using HyperDrive

In Chapter 8, Experimenting with Python Code, you trained a LassoLars model that accepted the alpha parameter. To avoid overfitting to the training dataset, the LassoLars model uses a technique called regularization, which introduces a penalty term into the model's optimization formula. You can think of the model as a normal linear function fitted by minimizing the least-squares error plus this penalty term. The alpha parameter specifies how important this penalty term is, which directly impacts the training outcome. Parameters that affect the training process are referred to as hyperparameters. To better understand what a hyperparameter is, we are going to explore the hyperparameters of a decision tree. In a decision tree classifier model, like the DecisionTreeClassifier class located in the scikit-learn library, you can define...
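To make the role of alpha concrete, the following is a minimal sketch in plain Python, not the LassoLars implementation itself. It assumes a single feature with no intercept and the usual lasso objective, (1 / (2n)) * sum((y - w * x)^2) + alpha * |w|, which has a closed-form solution in that simplified case. The function names and the tiny dataset are made up for illustration. The sketch then tries a handful of alpha values and keeps the one with the lowest error on held-out data, which is the essence of a hyperparameter sweep.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_1d(xs, ys, alpha):
    """Return w minimizing (1 / (2n)) * sum((y - w * x)**2) + alpha * abs(w)."""
    n = len(xs)
    covariance = sum(x * y for x, y in zip(xs, ys)) / n
    scale = sum(x * x for x in xs) / n
    return soft_threshold(covariance, alpha) / scale


def mean_squared_error(xs, ys, weight):
    """Average squared error of the predictions weight * x."""
    return sum((y - weight * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: the training targets are noisy, the validation targets follow y = 2x.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.5, 4.4, 6.3, 8.6]
valid_x, valid_y = [5.0, 6.0], [10.0, 12.0]

# A small grid of candidate values for the alpha hyperparameter.
alpha_candidates = [0.0, 0.1, 1.0, 20.0]

# Train once per candidate and score each model on the held-out data.
scores = {}
for alpha in alpha_candidates:
    weight = fit_lasso_1d(train_x, train_y, alpha)
    scores[alpha] = mean_squared_error(valid_x, valid_y, weight)

best_alpha = min(scores, key=scores.get)
print("validation error per alpha:", scores)
print("best alpha:", best_alpha)
```

With no penalty the model follows the noise in the training data, while a very large alpha pushes the coefficient all the way to zero; the sweep selects a value in between. Each candidate is trained independently of the others, which is why such sweeps can be run in parallel.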

Running AutoML experiments with code

So far in this chapter, you have been fine-tuning a LassoLars model, performing a hyperparameter tuning process to identify the best value for the alpha parameter based on the training data. In this section, you will use AutoML in the AzureML SDK to automatically select the best combination of data preprocessing, model, and hyperparameter settings for your training dataset.

To configure an AutoML experiment through the AzureML SDK, you will need to configure an AutoMLConfig object. You will need to define the Task type, the Metric, the Training data, and the Compute budget you want to invest. The output of this process is a list of models from which you can select the best run and the best model associated with that run, as shown in Figure 9.11:
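The configuration described above might look like the following sketch. It assumes version 1 of the AzureML SDK with the AutoML client package installed, and the specific values (the metric, the 15-minute timeout, the number of cross-validation folds, and a label column named target) are illustrative assumptions rather than the chapter's own settings. The helper function names are hypothetical; the caller is expected to pass in the registered dataset and the compute cluster.

```python
# Illustrative settings covering the task type, the metric, and the compute budget.
automl_settings = {
    "task": "regression",
    "primary_metric": "normalized_root_mean_squared_error",
    "experiment_timeout_hours": 0.25,  # assumed compute budget
    "n_cross_validations": 5,          # assumed validation strategy
    "label_column_name": "target",     # assumed name of the column to predict
}


def build_automl_config(training_data, compute_target):
    """Create an AutoMLConfig from the settings above (hypothetical helper)."""
    # Imported here so the settings can be inspected without the SDK installed.
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        training_data=training_data,
        compute_target=compute_target,
        **automl_settings,
    )


def best_run_and_model(automl_run):
    """Return the best child run and its fitted model from a completed AutoML run."""
    best_run, fitted_model = automl_run.get_output()
    return best_run, fitted_model
```

Once an experiment has been submitted with this configuration and has finished, the run's get_output method returns both the best run and the model trained in it, which is why the second helper unpacks two values.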

Figure 9.11 – AutoML process

Depending on the type of problem you are trying to model, you must set the task parameter to either classification, regression, or...

Summary

In this chapter, you explored the most commonly used approaches for optimizing a specific model to perform well against a dataset, and you saw how you can even automate the process of model selection. You started by performing parallelized hyperparameter tuning using the HyperDriveConfig class to optimize the alpha parameter of the LassoLars model you have been training against the diabetes dataset. Then, you automated the model selection, using AutoML to detect the best combination of algorithms and parameters for predicting the target column of the diabetes dataset.

In the next chapter, you will build on top of this knowledge, learning how to use the AzureML SDK to interpret the model results.

Questions

  1. You want to get the best model trained by an AutoML run. Which code is correct?

    a. model = run.get_output()[0]

    b. model = run.get_output()[1]

    c. model = run.get_outputs()[0]

    d. model = run.get_outputs()[1]

  2. You want to run a forecasting AutoML experiment on top of data you receive from a sensor. You receive one record every day from the sensor. You want to be able to predict the values for 5 days. Which of the following parameters should you pass to the ForecastingParameters class?

    a. forecast_horizon = 5 * 1

    b. forecast_horizon = 5 * 24

    c. forecast_horizon = 5 * 12

Further reading

This section offers a list of helpful web resources that will help you augment your knowledge of the AzureML SDK and the various code snippets used in this chapter:
