
You're reading from Mastering Azure Machine Learning

Product type: Book
Published in: Apr 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789807554
Edition: 1st Edition
Authors (2):
Christoph Körner

Christoph Körner previously worked as a cloud solution architect for Microsoft, specializing in Azure-based big data and machine learning solutions, where he was responsible for designing end-to-end machine learning and data science platforms. He currently works for a large cloud provider on highly scalable distributed in-memory database services. Christoph has authored four books: Deep Learning in the Browser for Bleeding Edge Press, as well as Mastering Azure Machine Learning (first edition), Learning Responsive Data Visualization, and Data Visualization with D3 and AngularJS for Packt Publishing.

Kaijisse Waaijer

Kaijisse Waaijer is an experienced technologist specializing in data platforms, machine learning, and the Internet of Things. Kaijisse currently works for Microsoft EMEA as a data platform consultant specializing in data science, machine learning, and big data. She works constantly with customers across multiple industries as their trusted tech advisor, helping them optimize their organizational data to create better outcomes and business insights that drive value using Microsoft technologies. Her true passion lies in automating trading systems and in applying deep learning and neural networks to achieve advanced levels of prediction and automation.


10. Distributed machine learning on Azure

In the previous chapter, we learned about hyperparameter tuning through search and optimization using HyperDrive, and about Azure Automated Machine Learning as a special case of hyperparameter optimization that involves feature engineering, model selection, and model stacking. Automated machine learning is machine learning as a service (MLaaS), where the only inputs are your data, an ML task, and an error metric. It's hard to imagine running all the experiments and parameter combinations for Azure Automated Machine Learning on a single machine or a single CPU/GPU, so we are looking into ways to speed up the training process through parallelization and distributed computing.

In this chapter, we will look at distributed and parallel computing algorithms and frameworks for efficiently training ML models in parallel. The goal of this chapter is to build an environment in Azure where you can speed up the training process of...

Exploring methods for distributed ML

The journey of implementing ML pipelines is very similar for a lot of users, and it often follows the steps described in the previous chapters. When users start switching from experimentation to real-world data, or from small examples to larger models, they often run into the same issue: training large parametric models on large amounts of data, especially DL models, takes a very long time. Sometimes, epochs last hours and training takes days to converge.

Waiting hours or even days for a model to converge means precious time wasted for many engineers, as it makes it a lot harder to interactively tune the training process. Therefore, many ML engineers need to speed up their training process by leveraging various distributed computing techniques. The idea of distributed ML is as simple as speeding up a training process by adding more compute resources. In the best case, the training performance improves linearly by adding more...
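To make the idea of linear scaling concrete, the following minimal sketch computes the speedup and the scaling efficiency from wall-clock training times. The function names and all timing values are hypothetical and chosen only for illustration; they are not measurements from the book.

```python
def speedup(single_worker_hours, multi_worker_hours):
    """How many times faster training runs compared to a single worker."""
    return single_worker_hours / multi_worker_hours


def scaling_efficiency(single_worker_hours, multi_worker_hours, num_workers):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup(single_worker_hours, multi_worker_hours) / num_workers


# Hypothetical timings (assumption): hours per training run by worker count.
baseline_hours = 8.0
timings = {2: 4.0, 4: 2.5, 8: 1.6}

for workers, hours in sorted(timings.items()):
    s = speedup(baseline_hours, hours)
    e = scaling_efficiency(baseline_hours, hours, workers)
    print(f"{workers} workers: speedup {s:.2f}x, efficiency {e:.1%}")
```

An efficiency of 1.0 corresponds to the best case of linear scaling; values below 1.0 indicate that part of the added compute is lost to overhead such as communication and synchronization.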

Using distributed ML in Azure

The Exploring methods for distributed ML section covered a large number of parallelization scenarios, various communication backends for collective algorithms, and code examples using different ML frameworks and even execution engines. The number of available ML frameworks is quite large, and making an educated decision is not easy. This choice gets even more complicated when some frameworks are supported out of the box in Azure Machine Learning while others have to be installed, configured, and managed by the user.

In this section, we will go through the most common scenarios, learn how to choose the correct combination of frameworks, and implement a distributed ML pipeline in Azure.

In general, you have three choices for running distributed ML in Azure:

  • The first obvious choice is using Azure Machine Learning, the Notebook environment, the Azure Machine Learning SDK, and Azure Machine Learning compute clusters...

Summary

Distributed ML is a great approach to scaling out your training infrastructure in order to gain speed in your training process. It is applied in many real-world scenarios and is very easy to use with Horovod and Azure Machine Learning.

Parallel execution is similar to hyperparameter search, where the individual runs are independent of each other, while distributed execution is similar to Bayesian optimization, which we discussed in detail in the previous chapter, where each run depends on results from other runs. Distributed executions therefore need methods to perform communication (such as one-to-one, one-to-many, many-to-one, and many-to-many) and synchronization (such as barrier synchronization) efficiently. These so-called collective algorithms are provided by communication backends (MPI, Gloo, and NCCL) and allow efficient GPU-to-GPU communication.
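The communication patterns named above can be illustrated with a small single-process simulation. This is only a sketch of the semantics, not how MPI, Gloo, or NCCL implement them: the workers are plain Python lists and threads, and all function names (`broadcast`, `reduce_sum`, `all_reduce_sum`, `run_with_barrier`) are hypothetical helpers introduced for this example.

```python
import threading


def broadcast(values, root=0):
    """One-to-many: every worker receives a copy of the root worker's vector."""
    return [list(values[root]) for _ in values]


def reduce_sum(values):
    """Many-to-one: element-wise sum of all workers' vectors, held by one worker."""
    return [sum(column) for column in zip(*values)]


def all_reduce_sum(values):
    """Many-to-many: every worker ends up with the element-wise sum."""
    total = reduce_sum(values)
    return [list(total) for _ in values]


def run_with_barrier(num_workers):
    """Barrier synchronization: no worker enters phase 2 before all finish phase 1."""
    barrier = threading.Barrier(num_workers)
    lock = threading.Lock()
    log = []

    def worker(rank):
        with lock:
            log.append(("phase1", rank))
        barrier.wait()  # blocks until every worker has reached this point
        with lock:
            log.append(("phase2", rank))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


# Three simulated workers, each holding a local two-element vector.
worker_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
summed = all_reduce_sum(worker_vectors)
barrier_log = run_with_barrier(3)
```

In this simulation, all-reduce is built naively as a reduce followed by a broadcast. Real backends typically use more bandwidth-efficient schemes, but the result each worker sees is the same.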

DL frameworks build higher-level abstractions on top of communication backends to perform model-parallel and data-parallel training. In data-parallel training, we partition the input data to compute multiple independent parts of the...
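As a rough illustration of data-parallel training, the sketch below partitions a toy dataset across simulated workers, lets each worker compute the gradient on its own shard, and averages the results. It assumes a synchronous scheme with equally sized shards and a simple linear model; the helper names and the data are hypothetical, and a real framework would perform the averaging step with an all-reduce over a communication backend.

```python
def mse_gradient(w, b, xs, ys):
    """Gradient of the mean squared error of the model y = w * x + b on one batch."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def shard(data, num_workers):
    """Partition the data into equally sized contiguous shards (one per worker)."""
    if len(data) % num_workers != 0:
        raise ValueError("this sketch assumes the data divides evenly across workers")
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_gradient(w, b, xs, ys, num_workers):
    """Each worker computes a local gradient; the results are then averaged."""
    local_grads = [
        mse_gradient(w, b, shard_x, shard_y)
        for shard_x, shard_y in zip(shard(xs, num_workers), shard(ys, num_workers))
    ]
    grad_w = sum(g[0] for g in local_grads) / num_workers
    grad_b = sum(g[1] for g in local_grads) / num_workers
    return grad_w, grad_b


# Toy data (assumption): points on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x + 1.0 for x in xs]

full_grad = mse_gradient(0.0, 0.0, xs, ys)
parallel_grad = data_parallel_gradient(0.0, 0.0, xs, ys, num_workers=4)
```

With equally sized shards, the averaged per-worker gradient matches the gradient computed on the full batch (up to floating-point rounding), which is why the workers can process their partitions independently and only need to communicate once per step.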

