
You're reading from Mastering Azure Machine Learning

Product type: Book
Published in: Apr 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781789807554
Edition: 1st
Authors (2):

Christoph Körner

Christoph Körner previously worked as a cloud solution architect for Microsoft, specializing in Azure-based big data and machine learning solutions, where he was responsible for designing end-to-end machine learning and data science platforms. He currently works for a large cloud provider on highly scalable distributed in-memory database services. Christoph has authored four books: Deep Learning in the Browser for Bleeding Edge Press, as well as Mastering Azure Machine Learning (first edition), Learning Responsive Data Visualization, and Data Visualization with D3 and AngularJS for Packt Publishing.
Read more about Christoph Körner

Kaijisse Waaijer

Kaijisse Waaijer is an experienced technologist specializing in data platforms, machine learning, and the Internet of Things. Kaijisse currently works for Microsoft EMEA as a data platform consultant specializing in data science, machine learning, and big data. She constantly works with customers across multiple industries as their trusted tech advisor, helping them optimize their organizational data to create better outcomes and business insights that drive value using Microsoft technologies. Her true passion lies in trading systems automation and in applying deep learning and neural networks to achieve advanced levels of prediction and automation.
Read more about Kaijisse Waaijer


5. Azure Machine Learning pipelines

In the previous chapters, we learned about many extract, transform, load (ETL), preprocessing, and feature-engineering approaches within Azure Machine Learning using Dataset, Datastore, and DataPrep. In this chapter, you will learn how to use these transformation techniques to build reusable machine learning (ML) pipelines.

First, you will learn about the benefits of splitting your code into individual steps and wrapping those steps into a pipeline. Not only can you make your code blocks reusable through modularization and parameters, but you can also control the compute target for each individual step. This helps you scale your computations optimally, save costs, and improve performance at the same time. Lastly, you can parameterize your pipelines and trigger them through an HTTP endpoint or through a recurring or reactive schedule.

After that, we'll build a complex Azure Machine Learning pipeline in a couple of steps. We start with a simple...

Benefits of pipelines for ML workflows

Separating your workflow into reusable, configurable steps and combining these steps into an end-to-end pipeline provides many benefits when implementing ML processes. Multiple teams can own and iterate on individual steps to improve the pipeline over time, while others can easily integrate each version of the pipeline into their current setup.

The pipeline itself doesn't only split code from execution; it also splits execution from orchestration. Hence, you can configure individual compute targets that optimize your execution and provide parallel execution, all without having to touch the ML code.
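To make this concrete, here is a minimal sketch of assigning each step its own compute target using the Azure Machine Learning Python SDK. The cluster names cpu-cluster and gpu-cluster, the script names, and the code directory are assumptions for this example:

from azureml.core import Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Each step runs on its own compute target; the cluster names below
# are placeholders for clusters that already exist in the workspace.
preprocess_step = PythonScriptStep(
    name='preprocess',
    script_name='preprocess.py',
    source_directory='code',
    compute_target=ws.compute_targets['cpu-cluster'])

train_step = PythonScriptStep(
    name='train',
    script_name='train.py',
    source_directory='code',
    compute_target=ws.compute_targets['gpu-cluster'])

pipeline = Pipeline(workspace=ws, steps=[preprocess_step, train_step])

Steps with no data dependency between them can run in parallel; once you connect them through pipeline data, they run in sequence.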

We will take a quick look at Azure Machine Learning pipelines and why they should be your tool of choice when implementing ML workflows in Azure. In the following section, Building and publishing an ML pipeline, we will dive a lot deeper and explore the individual features by building...

Building and publishing an ML pipeline

Let's go ahead and use our knowledge from the previous chapters to build a pipeline for data processing. We will use the Azure Machine Learning Python SDK to define all pipeline steps as Python code in an authoring script, so the pipeline can be easily managed, reviewed, and checked into version control.

We will define a pipeline as a linear sequence of steps. Each step will have inputs and outputs defined as pipeline data sources and sinks. Each step will be associated with a compute target that defines both the execution environment and the compute resource to run on. We will set up the execution environment as a Docker container with all the required Python libraries and run the pipeline steps on a training cluster in Azure Machine Learning.
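The following sketch shows what such a setup could look like. The cluster name cpu-cluster, the script names, and the package list are assumptions for this example:

from azureml.core import Workspace
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
compute = ws.compute_targets['cpu-cluster']  # assumed existing training cluster

# Execution environment: a Docker container with the required libraries
run_config = RunConfiguration()
run_config.environment.docker.enabled = True
run_config.environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=['pandas', 'scikit-learn'])

# PipelineData is the sink of the first step and the source of the second
processed_data = PipelineData('processed_data',
                              datastore=ws.get_default_datastore())

preprocess_step = PythonScriptStep(
    name='preprocess',
    script_name='preprocess.py',
    source_directory='code',
    arguments=['--output', processed_data],
    outputs=[processed_data],
    compute_target=compute,
    runconfig=run_config)

train_step = PythonScriptStep(
    name='train',
    script_name='train.py',
    source_directory='code',
    arguments=['--input', processed_data],
    inputs=[processed_data],
    compute_target=compute,
    runconfig=run_config)

pipeline = Pipeline(workspace=ws, steps=[preprocess_step, train_step])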

A pipeline runs as an experiment in your Azure Machine Learning workspace. We can either submit the pipeline as part of the authoring script, or deploy it as a web service and trigger it through...
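Continuing the sketch above, both options could look as follows; the experiment and pipeline names are examples:

from azureml.core import Experiment

# Option 1: submit the pipeline as an experiment from the authoring script
experiment = Experiment(ws, 'pipeline-demo')
run = experiment.submit(pipeline)
run.wait_for_completion()

# Option 2: publish the pipeline as a web service with a REST endpoint
published = pipeline.publish(
    name='data-processing-pipeline',
    description='Preprocess data and train a model',
    version='1.0')
print(published.endpoint)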

Integrating pipelines with other Azure services

It's rare for users to rely on only a single service to manage data flows, experimentation, training, deployment, and CI/CD in the cloud. Other services provide specific benefits that make them a better fit for certain tasks, such as Azure Data Factory for loading data into Azure, or Azure Pipelines for CI/CD and running automated tasks in Azure DevOps.

The strongest argument for betting on a single cloud provider is the tight integration between its individual services. In this section, we will see how Azure Machine Learning pipelines integrate with other Azure services. The list would be a lot longer if we were to cover every possible integration. As we learned in this chapter, you can trigger a published pipeline by calling a REST endpoint, and you can submit a pipeline using standard Python code. This means you can integrate pipelines anywhere you can call HTTP endpoints or run Python code.
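As a minimal sketch of the REST integration, assuming the published pipeline from the previous section and an example experiment name (any authentication class from azureml.core.authentication that can issue a token works here):

import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Obtain an AAD bearer token for the request header
auth = InteractiveLoginAuthentication()
headers = auth.get_authentication_header()

# Trigger the published pipeline from any service that can call HTTP endpoints
response = requests.post(
    published.endpoint,  # REST endpoint of the PublishedPipeline
    headers=headers,
    json={'ExperimentName': 'pipeline-demo'})
print(response.json().get('Id'))  # run ID of the triggered pipeline run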

We will...

Summary

In this chapter, you have learned how to use and configure Azure Machine Learning pipelines to split an ML workflow into multiple steps, and how to use pipelines and pipeline steps for estimators, Python execution, and parallel execution. You configured pipeline inputs and outputs using Dataset and PipelineData, and you learned how to control the execution flow of a pipeline.

As another milestone, you deployed the pipeline as a PublishedPipeline instance to an HTTP endpoint. This lets you configure and trigger pipeline execution with a simple HTTP call. After that, you implemented automatic scheduling based on time frequency, and you used reactive scheduling based on changes in the underlying dataset. Now the pipeline can rerun your workflow when the input data changes without any manual intervention.
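As a sketch of both scheduling variants, assuming the published pipeline from earlier and example names for the schedules and the datastore path:

from azureml.pipeline.core import Schedule, ScheduleRecurrence

# Time-based schedule: rerun the published pipeline once a day
recurrence = ScheduleRecurrence(frequency='Day', interval=1)
daily = Schedule.create(
    ws, name='daily-run',
    pipeline_id=published.id,
    experiment_name='pipeline-demo',
    recurrence=recurrence)

# Reactive schedule: rerun whenever files change on the datastore path
reactive = Schedule.create(
    ws, name='on-data-change',
    pipeline_id=published.id,
    experiment_name='pipeline-demo',
    datastore=ws.get_default_datastore(),
    path_on_datastore='data/')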

Finally, we also modularized and versioned a pipeline step, so it can be reused in other projects. We used InputPortDef and OutputPortDef to create virtual bindings for data sources...

