You're reading from Azure Data Scientist Associate Certification Guide

Product type: Book
Published in: Dec 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800565005
Edition: 1st Edition
Authors (2):
Andreas Botsikas

Andreas Botsikas is an experienced advisor working in the software industry. He has worked in the finance sector, leading highly efficient DevOps teams, and architecting and building high-volume transactional systems. He then traveled the world, building AI-infused solutions with a group of engineers and data scientists. Currently, he works as a trusted advisor for customers onboarding into Azure, de-risking and accelerating their cloud journey. He is a strong engineering professional with a Doctor of Philosophy (Ph.D.) in resource optimization with artificial intelligence from the National Technical University of Athens.

Michael Hlobil

Michael Hlobil is an experienced architect focused on quickly understanding customers' business needs. With over 25 years of experience in IT, covering both pitfalls and successful projects, he is dedicated to creating solutions based on the Microsoft Platform. He has an MBA in Computer Science and Economics (from the Technical University and the University of Vienna) and an MSc in Systemic Coaching (from the ESBA). Over the last decade, he has been working on advanced analytics projects, including massively parallel systems and machine learning systems. He enjoys working with customers and supporting their journey to the cloud.

Chapter 11: Working with Pipelines

In this chapter, you will learn how to author repeatable processes by defining pipelines that consist of multiple steps. You can use these pipelines to author training pipelines that transform your data and then train models, or to perform batch inference using pre-trained models. Once you register one of these pipelines, you can invoke it through an HTTP endpoint or the SDK, or even configure it to execute on a schedule. With this knowledge, you will be able to implement and consume pipelines using the Azure Machine Learning (AzureML) SDK.

In this chapter, we are going to cover the following main topics:

  • Understanding AzureML pipelines
  • Authoring a pipeline
  • Publishing a pipeline to expose it as an endpoint
  • Scheduling a recurring pipeline

Technical requirements

You will need access to an Azure subscription. Within that subscription, you will need a resource group named packt-azureml-rg, and you will need either the Contributor or the Owner role assigned at the resource group level through Access control (IAM). Within that resource group, you should have already deployed an Azure Machine Learning workspace named packt-learning-mlw, as described in Chapter 2, Deploying Azure Machine Learning Workspace Resources.

You will also need to have a basic understanding of the Python language. The code snippets target Python version 3.6 or newer. You should also be familiar with working in the notebook experience within AzureML studio, something that was covered in Chapter 8, Experimenting with Python Code.

This chapter assumes you have registered the loans dataset you generated in Chapter 10, Understanding Model Results. It is also assumed that you have created a compute cluster named cpu-sm-cluster, as described in the Working with compute...

Understanding AzureML pipelines

In Chapter 6, Visual Model Training and Publishing, you saw how you can design a training process using building blocks. Similar to those workflows, the AzureML SDK allows you to author Pipelines that orchestrate multiple steps. For example, in this chapter, you will author a Pipeline that consists of two steps. The first step pre-processes the loans dataset, which serves as the raw training data, and stores the result in a temporary location. The second step then reads that data and trains a machine learning model, which is stored in a blob store location. In this example, each step is nothing more than a Python script file that is executed on a specific compute target using a predefined Environment.
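To give you a sense of what such a pipeline looks like in code, the following is a minimal sketch of a two-step pipeline. It assumes you already have a reference to your workspace in the ws variable, the cpu-sm-cluster compute cluster, and a predefined Environment stored in the my_environment variable; the script file names and argument names are illustrative placeholders, not necessarily the exact names used later in this chapter:

from azureml.core import Experiment
from azureml.core.runconfig import RunConfiguration
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Intermediate location where the first step writes and the second step reads
prepared_data = OutputFileDatasetConfig(name="prepared_loans_data")

# Wrap the predefined Environment (assumed to already exist) in a run configuration
run_config = RunConfiguration()
run_config.environment = my_environment

step_01 = PythonScriptStep(
    name="prepare-data",
    source_directory="step01",
    script_name="prepare_data.py",  # illustrative file name
    arguments=["--output-path", prepared_data],
    compute_target="cpu-sm-cluster",
    runconfig=run_config,
)

step_02 = PythonScriptStep(
    name="train-model",
    source_directory="step02",
    script_name="train_model.py",  # illustrative file name
    arguments=["--input-path", prepared_data.as_input()],
    compute_target="cpu-sm-cluster",
    runconfig=run_config,
)

pipeline = Pipeline(workspace=ws, steps=[step_01, step_02])
run = Experiment(ws, "loans-pipeline").submit(pipeline)

Note that it is the data dependency through prepared_data, and not the order of the steps list, that makes AzureML execute step_01 before step_02.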

Important note

Do not confuse AzureML Pipelines with the sklearn Pipelines you read about in Chapter 10, Understanding Model Results. The sklearn ones allow you to chain various transformations and feature engineering methods to transform the data...

Authoring a pipeline

Let's assume that you need to create a repeatable workflow that has two steps:

  1. It loads the data from a registered dataset and splits it into training and test datasets. These datasets are converted into a special construct needed by the LightGBM tree-based algorithm. The converted constructs are stored to be used by the next step. In our case, you will use the loans dataset that you registered in Chapter 10, Understanding Model Results. You will be writing the code for this step within a folder named step01.
  2. It loads the pre-processed data and trains a LightGBM model that is then stored in the /models/loans/ folder of the default datastore attached to the AzureML workspace. You will be writing the code for this step within a folder named step02.

    Each step will be a separate Python file, taking some arguments to specify where to read the data from and where to write the data to, as sketched below. These scripts will utilize the same mechanics as the scripts you authored...
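For illustration, the first step's script could follow a pattern similar to the following sketch, reading the output location from its command-line arguments. The file name, the argument name, the registered dataset name (loans), and the target column name (approved_loan) are assumptions for the sake of the example:

# step01/prepare_data.py - hypothetical sketch of the first step
import argparse
import os

import lightgbm as lgb
from azureml.core import Run
from sklearn.model_selection import train_test_split

parser = argparse.ArgumentParser()
parser.add_argument("--output-path", dest="output_path", type=str)
args = parser.parse_args()

# Get a reference to the workspace through the run context
run = Run.get_context()
ws = run.experiment.workspace

# Load the registered loans dataset and split it into train and test sets
loans_df = ws.datasets["loans"].to_pandas_dataframe()
train_df, test_df = train_test_split(loans_df, test_size=0.2, random_state=42)

# Convert to the LightGBM Dataset construct and store it for the next step
os.makedirs(args.output_path, exist_ok=True)
train_data = lgb.Dataset(
    train_df.drop(columns=["approved_loan"]), label=train_df["approved_loan"]
)
test_data = lgb.Dataset(
    test_df.drop(columns=["approved_loan"]),
    label=test_df["approved_loan"],
    reference=train_data,
)
train_data.save_binary(os.path.join(args.output_path, "train.bin"))
test_data.save_binary(os.path.join(args.output_path, "test.bin"))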

Publishing a pipeline to expose it as an endpoint

So far, you have defined a pipeline using the AzureML SDK. If you had to restart the kernel of your Jupyter notebook, you would lose the reference to the pipeline you defined, and you would have to rerun all the cells to recreate the pipeline object. The AzureML SDK allows you to publish a pipeline that effectively registers it as a versioned object within the workspace. Once a pipeline is published, it can be submitted without the Python code that constructed it.

In a new cell in your notebook, add the following code:

published_pipeline = pipeline.publish(
    "Loans training pipeline", 
    description="A pipeline to train a LightGBM model")

This code publishes the pipeline and returns a PublishedPipeline object, the versioned object registered within the workspace. The most interesting attribute of that object is the endpoint, which returns the REST endpoint URL...
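Having the endpoint URL, any client that can acquire an Azure Active Directory token is able to trigger the published pipeline over HTTP. As a sketch, the following code invokes it from Python using the requests library; the experiment name passed in the payload is an illustrative choice:

import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Acquire an Azure Active Directory token to authenticate the request
auth = InteractiveLoginAuthentication()
auth_header = auth.get_authentication_header()

response = requests.post(
    published_pipeline.endpoint,
    headers=auth_header,
    json={"ExperimentName": "loans-pipeline-via-rest"},
)
response.raise_for_status()
print("Submitted pipeline run:", response.json()["Id"])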

Scheduling a recurring pipeline

Being able to invoke a pipeline through the published REST endpoint is great when you have third-party systems that need to invoke a training process after a specific event has occurred. For example, suppose you are using Azure Data Factory to copy data from your on-premises databases. You could use the Machine Learning Execute Pipeline activity and trigger a published pipeline, as shown in Figure 11.9:

Figure 11.9 – Sample Azure Data Factory pipeline triggering an AzureML published pipeline following a copy activity

If you wanted to schedule the pipeline to be triggered monthly, you would need to publish the pipeline as you did in the previous section, get the published pipeline ID, create a ScheduleRecurrence, and then create the Schedule. Return to your notebook where you already have a reference to published_pipeline. Add a new cell with the following code:

from azureml.pipeline.core.schedule import ScheduleRecurrence...
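Based on that description, a minimal sketch of the scheduling code could look as follows; the schedule name and the experiment name are illustrative:

from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

# Trigger the published pipeline once a month
recurrence = ScheduleRecurrence(frequency="Month", interval=1)

schedule = Schedule.create(
    workspace=ws,
    name="loans-monthly-training",  # illustrative schedule name
    pipeline_id=published_pipeline.id,
    experiment_name="loans-pipeline-schedule",
    recurrence=recurrence,
    description="Monthly retraining of the loans model",
)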

Summary

In this chapter, you learned how to define AzureML pipelines using the AzureML SDK. These pipelines allow you to orchestrate various steps in a repeatable manner. You started by defining a training pipeline consisting of two steps. You then learned how to trigger the pipeline and how to troubleshoot potential code issues. After that, you published the pipeline to register it within the AzureML workspace and acquired an HTTP endpoint that third-party software systems can use to trigger pipeline executions. In the last section, you learned how to schedule a published pipeline to run on a recurring basis.

In the next chapter, you will learn how to operationalize the models you have been training so far in this book. In that context, you will use the knowledge you acquired in this chapter to author batch inference pipelines, which you can then publish and trigger over HTTP, or schedule to run on a recurring basis.

Questions

In each chapter, you will find a couple of questions to validate your understanding of the topics discussed in this chapter.

  1. What affects the execution order of the pipeline steps?

    a. The order in which the steps were defined when constructing the Pipeline object.

    b. The data dependencies between the steps.

    c. All steps execute in parallel, and you cannot affect the execution order.

  2. True or false: All steps within a pipeline need to execute within the same compute target and Environment.
  3. True or false: PythonScriptStep, by default, reuses the previous execution results if nothing has changed in the parameters or the code files.
  4. You are trying to debug a child run execution issue. Which of the following methods should you call in the StepRun object?

    a. get_file_names

    b. get_details_with_logs

    c. get_metrics

    d. get_details

  5. You have just defined a pipeline in Python code. What steps do you need to take to schedule a daily execution of that pipeline?
...

Further reading

This section offers a list of web resources that can help you augment your knowledge of the AzureML SDK and the various code snippets used in this chapter.
