Chapter 4: Machine Learning Pipelines

In this chapter, we will explore and implement machine learning (ML) pipelines through hands-on examples using the MLOps approach. We will learn by continuing to solve the business problem we worked on in Chapter 3, Code Meets Data. This combined theoretical and practical approach ensures that you gain comprehensive knowledge of architecting and implementing ML pipelines for your own or your company's problems. An ML pipeline has modular scripts or code that perform all the traditional steps in ML, such as data preprocessing, feature engineering, and feature scaling, before training or retraining any model.

We begin this chapter by ingesting the preprocessed data we worked on in the last chapter, then performing feature engineering and scaling to get it into shape for ML training. We will discover the principles of ML pipelines and implement them on the business problem. Going ahead, we'll look...

Going through the basics of ML pipelines

Before we jump into the implementation of the ML pipeline, let's get the basics right. We will reflect on ML pipelines, set up the resources needed to implement one, and then get started with data ingestion. Let's demystify ML pipelines by revisiting the ML pipeline we discussed in Figure 14 of Chapter 1, Fundamentals of MLOps Workflow.

Figure 4.1 – Machine learning pipeline

As shown in Figure 4.1, a comprehensive ML pipeline consists of the following steps:

  1. Data ingestion
  2. Model training
  3. Model testing
  4. Model packaging
  5. Model registering

We will implement all these pipeline steps using the Azure ML service (cloud-based) and MLflow (open source) in parallel, for the sake of a diverse perspective. Azure ML and MLflow are a power couple for MLOps: they exhibit the features shown in Table 4.1. They are also unique in their capabilities...
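
To make the dual setup concrete, here is a minimal sketch (not the book's exact code) of pointing MLflow's tracking API at an Azure ML workspace so that both tools record the same experiment runs. The experiment name and logged values are placeholders, and it assumes the azureml-mlflow plugin is installed:

    import mlflow
    from azureml.core import Workspace

    # Load the Azure ML workspace from a local config.json
    workspace = Workspace.from_config()

    # Route MLflow tracking to the Azure ML workspace
    mlflow.set_tracking_uri(workspace.get_mlflow_tracking_uri())
    mlflow.set_experiment('mlops-pipeline-experiment')  # placeholder name

    with mlflow.start_run():
        mlflow.log_param('example_param', 42)      # placeholder parameter
        mlflow.log_metric('example_metric', 0.95)  # placeholder metric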

Data ingestion and feature engineering

Data is essential to train ML models; without data, there is no ML. Data ingestion is a trigger step for the ML pipeline. It deals with the volume, velocity, veracity, and variety of data by extracting data from various data sources and ingesting the needed data for model training.

The ML pipeline is initiated by ingesting the right data for training the ML models. We will start by accessing the preprocessed data we registered in the previous chapter. Follow these steps to access and import the preprocessed data and get it ready for ML training:

  1. Using the Workspace() class from the Azure ML SDK, connect to the ML workspace and access the data from its datastore as follows:
    from azureml.core import Workspace, Dataset
    # Replace these with your own subscription, resource group, and workspace
    subscription_id = 'xxxxxx-xxxxxx-xxxxxxx-xxxxxxx'
    resource_group = 'Learn_MLOps'
    workspace_name = 'MLOps_WS'
    workspace = Workspace(subscription_id, resource_group, workspace_name)

    Note

    Insert your own...
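
Continuing from the workspace connection above, the registered dataset can be pulled into memory and split for training and testing. The following is a minimal sketch: the dataset name ('processed_data') and the label column ('Target') are placeholders for illustration, not necessarily the names used for the business problem:

    from sklearn.model_selection import train_test_split

    # Retrieve the dataset registered in the previous chapter (placeholder name)
    dataset = Dataset.get_by_name(workspace, name='processed_data')
    df = dataset.to_pandas_dataframe()

    # Split features and label ('Target' is a hypothetical column name)
    X = df.drop(columns=['Target'])
    y = df['Target']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)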

Machine learning training and hyperparameter optimization

We are all set to do the fun part: training ML models! This step has modular scripts or code that perform all the traditional steps in ML training, such as fitting and transforming data to train the model, and hyperparameter tuning to converge on the best model. The output of this step is a trained ML model.

To solve the business problem, we will train two well-known models: a Support Vector Machine classifier and a Random Forest classifier. These were chosen for their popularity and consistency of results; you are free to choose models of your choice – there are no limitations in this step. First, we will train the Support Vector Machine classifier, and then the Random Forest classifier.

Support Vector Machine

Support Vector Machine (SVM) is a popular supervised learning algorithm (used for classification and regression). The data points are classified using hyperplanes in...
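
As a hedged sketch of this training step (the hyperparameter grid and cross-validation settings are illustrative, not the book's exact configuration), the SVM classifier can be fitted and tuned with scikit-learn as follows, reusing the train split from the ingestion sketch; the Random Forest classifier can be trained analogously with RandomForestClassifier:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Illustrative hyperparameter grid for the SVM classifier
    param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

    # 5-fold cross-validated grid search over the parameter grid
    svc_search = GridSearchCV(SVC(probability=True), param_grid, cv=5)
    svc_search.fit(X_train, y_train)

    svc = svc_search.best_estimator_  # best tuned model
    print('Best SVM parameters:', svc_search.best_params_)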

Model testing and defining metrics

In this step, we evaluate the trained model's performance on a separate set of data points, called the test data (which was split and versioned earlier, in the data ingestion step). The trained model's predictions are evaluated against the metrics selected for the use case. The output of this step is a report on the trained model's performance.

To gain a comprehensive analysis of the model's performance, we will measure the accuracy, precision, recall, and F-score. This is what they mean in practice in the context of the business problem:

  • Accuracy: The number of correct predictions divided by the total number of predictions on the test data samples.
  • Precision: The proportion of predicted positives that were actually positive. Precision = True Positives / (True Positives + False Positives)
  • Recall: The proportion of actual positives that were identified correctly. Recall = True Positives / (True Positives + False Negatives)
  • F-score: The harmonic mean of precision and recall. F-score = (2 × Precision × Recall) / (Precision + Recall)
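
These metrics can be computed in a few lines with scikit-learn. The following is a minimal sketch on the held-out test split, assuming a binary classification problem and reusing the tuned svc classifier and test split from the earlier sketches:

    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score)

    # Predict on the test split and score against the ground truth
    y_pred = svc.predict(X_test)
    print('Accuracy :', accuracy_score(y_test, y_pred))
    print('Precision:', precision_score(y_test, y_pred))
    print('Recall   :', recall_score(y_test, y_pred))
    print('F-score  :', f1_score(y_test, y_pred))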

Model packaging

After the trained model has been tested in the previous step, it can be serialized into a file to be exported to the test or production environment. Serialization, if not done right, comes with compatibility challenges, such as a lack of model interoperability. Model interoperability is a challenge especially when models are trained using different frameworks: for example, if model 1 is trained using sklearn and model 2 is trained using TensorFlow, then model 1 cannot be imported or exported using TensorFlow for further fine-tuning or inference.

To avoid this problem, ONNX (Open Neural Network Exchange) offers an open standard for model interoperability. It provides a serialization standard for importing and exporting models. We will use the ONNX format to serialize our models to avoid compatibility and interoperability issues.

The trained model is serialized to ONNX using the skl2onnx library. The model is serialized as the...
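
For reference, a typical skl2onnx conversion looks like the following sketch, reusing the tuned svc model and training features from the earlier sketches; the input name 'float_input' and the output path are common conventions rather than the book's exact code:

    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    # Describe the model's input: a float tensor with one row per sample
    initial_types = [('float_input', FloatTensorType([None, X_train.shape[1]]))]

    # Convert the trained sklearn model to an ONNX graph
    onnx_model = convert_sklearn(svc, initial_types=initial_types)

    # Write the serialized model to disk
    with open('./outputs/svc.onnx', 'wb') as f:
        f.write(onnx_model.SerializeToString())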

Registering models and production artifacts

In this step, the model that was serialized or containerized in the previous step is registered and stored in the model registry. A registered model is a logical container for one or more files that together function as a model. For instance, a model made up of multiple files can be registered as a single model in the registry, and downloading the registered model retrieves all of its files. The registered model can be deployed and used for inference on demand.

Let's register the models we serialized in the previous section by using the Model.register() function from the Azure ML SDK. This function registers the serialized ONNX file to the workspace for further use and for deployment to the test and production environments. Let's register the serialized SVM classifier model (svc.onnx):

# Register Model on AzureML WS
model = Model.register(model_path = './outputs/svc.onnx', # this points to...
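
The call above is truncated; a complete registration might look like the following sketch, where the model name, tags, and description are illustrative placeholders rather than the book's exact values:

    from azureml.core.model import Model

    model = Model.register(
        workspace=workspace,
        model_path='./outputs/svc.onnx',  # local path to the serialized model
        model_name='svc-classifier',      # placeholder registry name
        tags={'format': 'onnx', 'algorithm': 'svm'},  # placeholder tags
        description='SVM classifier serialized to ONNX')
    print(model.name, model.version)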

Summary

In this chapter, we went through the theory of ML pipelines and put it into practice by building an ML pipeline for a business problem. We set up the tools, resources, and development environment needed to train the ML models. We started with the data ingestion step, followed by the model training, testing, and packaging steps, and finally, we completed the registering step. Congrats! So far, you have implemented a critical building block of the MLOps workflow.

In the next chapter, we will look into evaluating and packaging production models.
