Chapter 4: Machine Learning Pipelines

In this chapter, we will explore and implement machine learning (ML) pipelines through hands-on examples using the MLOps approach. We will learn by continuing to solve the business problem we worked on in Chapter 3, Code Meets Data. This combined theoretical and practical approach ensures that you gain comprehensive knowledge of architecting and implementing ML pipelines for your own or your company's problems. An ML pipeline has modular scripts or code that perform all the traditional steps in ML, such as data preprocessing, feature engineering, and feature scaling, before training or retraining any model.

We begin this chapter by ingesting the preprocessed data we worked on in the last chapter, then performing feature engineering and scaling to get it into shape for ML training. We will discover the principles of ML pipelines and implement them on the business problem. Going ahead, we'll look...

Going through the basics of ML pipelines

Before we jump into the implementation of the ML pipeline, let's get the basics right. We will reflect on ML pipelines, set up the resources needed to implement one, and then get started with data ingestion. Let's demystify ML pipelines by revisiting the ML pipeline we discussed in Figure 14 of Chapter 1, Fundamentals of MLOps Workflow.

Figure 4.1 – Machine learning pipeline

As shown in Figure 4.1, a comprehensive ML pipeline consists of the following steps:

  1. Data ingestion
  2. Model training
  3. Model testing
  4. Model packaging
  5. Model registering

We will implement all these pipeline steps using the Azure ML service (cloud-based) and MLflow (open source) in parallel, for the sake of a diverse perspective. Azure ML and MLflow are a power couple for MLOps: they exhibit the features shown in Table 4.1. They are also unique in their capabilities...
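
To make the dual setup concrete, here is a minimal sketch (not the book's exact code) of pointing MLflow's tracking API at an Azure ML workspace so that both tools record the same experiment runs. The experiment name and logged values are placeholders, and it assumes the azureml-mlflow plugin is installed:

    import mlflow
    from azureml.core import Workspace

    # Load the Azure ML workspace from a local config.json
    workspace = Workspace.from_config()

    # Route MLflow tracking to the Azure ML workspace
    mlflow.set_tracking_uri(workspace.get_mlflow_tracking_uri())
    mlflow.set_experiment('mlops-pipeline-experiment')  # placeholder name

    with mlflow.start_run():
        mlflow.log_param('example_param', 42)      # placeholder parameter
        mlflow.log_metric('example_metric', 0.95)  # placeholder metric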

Data ingestion and feature engineering

Data is essential to train ML models; without data, there is no ML. Data ingestion is a trigger step for the ML pipeline. It deals with the volume, velocity, veracity, and variety of data by extracting data from various data sources and ingesting the needed data for model training.

The ML pipeline is initiated by ingesting the right data for training the ML models. We will start by accessing the preprocessed data we registered in the previous chapter. Follow these steps to access and import the preprocessed data and get it ready for ML training:

  1. Using the Workspace() class from the Azure ML SDK, connect to the ML workspace and access the data from its datastore as follows:
    from azureml.core import Workspace, Dataset
    # Replace these with your own subscription, resource group, and workspace
    subscription_id = 'xxxxxx-xxxxxx-xxxxxxx-xxxxxxx'
    resource_group = 'Learn_MLOps'
    workspace_name = 'MLOps_WS'
    workspace = Workspace(subscription_id, resource_group, workspace_name)

    Note

    Insert your own...
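
Continuing from the workspace connection above, the registered dataset can be pulled into memory and split for training and testing. The following is a minimal sketch: the dataset name ('processed_data') and the label column ('Target') are placeholders for illustration, not necessarily the names used for the business problem:

    from sklearn.model_selection import train_test_split

    # Retrieve the dataset registered in the previous chapter (placeholder name)
    dataset = Dataset.get_by_name(workspace, name='processed_data')
    df = dataset.to_pandas_dataframe()

    # Split features and label ('Target' is a hypothetical column name)
    X = df.drop(columns=['Target'])
    y = df['Target']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)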

Machine learning training and hyperparameter optimization

We are all set to do the fun part: training ML models! This step has modular scripts or code that perform all the traditional steps in ML training, such as fitting and transforming data to train the model, and hyperparameter tuning to converge on the best model. The output of this step is a trained ML model.

To solve the business problem, we will train two well-known models: a Support Vector Machine classifier and a Random Forest classifier. These were chosen for their popularity and consistency of results; you are free to choose models of your choice – there are no limitations in this step. First, we will train the Support Vector Machine classifier, and then the Random Forest classifier.

Support Vector Machine

Support Vector Machine (SVM) is a popular supervised learning algorithm (used for classification and regression). The data points are classified using hyperplanes in...
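
As a hedged sketch of this training step (the hyperparameter grid and cross-validation settings are illustrative, not the book's exact configuration), the SVM classifier can be fitted and tuned with scikit-learn as follows, reusing the train split from the ingestion sketch; the Random Forest classifier can be trained analogously with RandomForestClassifier:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Illustrative hyperparameter grid for the SVM classifier
    param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

    # 5-fold cross-validated grid search over the parameter grid
    svc_search = GridSearchCV(SVC(probability=True), param_grid, cv=5)
    svc_search.fit(X_train, y_train)

    svc = svc_search.best_estimator_  # best tuned model
    print('Best SVM parameters:', svc_search.best_params_)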

Model testing and defining metrics

In this step, we evaluate the trained model's performance on a separate set of data points, called the test data (which was split and versioned earlier, in the data ingestion step). The trained model's predictions are evaluated against the metrics selected for the use case. The output of this step is a report on the trained model's performance.

To gain a comprehensive analysis of the model's performance, we will measure the accuracy, precision, recall, and F-score. This is what they mean in practice in the context of the business problem:

  • Accuracy: The number of correct predictions divided by the total number of predictions on the test data samples.
  • Precision: The proportion of predicted positives that were actually positive. Precision = True Positives / (True Positives + False Positives)
  • Recall: The proportion of actual positives that were identified correctly. Recall = True Positives / (True Positives + False Negatives)
  • F-score: The harmonic mean of precision and recall. F-score = (2 × Precision × Recall) / (Precision + Recall)
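
These metrics can be computed in a few lines with scikit-learn. The following is a minimal sketch on the held-out test split, assuming a binary classification problem and reusing the tuned svc classifier and test split from the earlier sketches:

    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score)

    # Predict on the test split and score against the ground truth
    y_pred = svc.predict(X_test)
    print('Accuracy :', accuracy_score(y_test, y_pred))
    print('Precision:', precision_score(y_test, y_pred))
    print('Recall   :', recall_score(y_test, y_pred))
    print('F-score  :', f1_score(y_test, y_pred))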

Model packaging

After the trained model has been tested in the previous step, it can be serialized into a file to be exported to the test or production environment. Serialization, if not done right, comes with compatibility challenges, such as a lack of model interoperability. Model interoperability is a challenge especially when models are trained using different frameworks: for example, if model 1 is trained using sklearn and model 2 is trained using TensorFlow, then model 1 cannot be imported or exported using TensorFlow for further fine-tuning or inference.

To avoid this problem, ONNX (Open Neural Network Exchange) offers an open standard for model interoperability. It provides a serialization standard for importing and exporting models. We will use the ONNX format to serialize our models to avoid compatibility and interoperability issues.

The trained model is serialized to ONNX using the skl2onnx library. The model is serialized as the...
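
For reference, a typical skl2onnx conversion looks like the following sketch, reusing the tuned svc model and training features from the earlier sketches; the input name 'float_input' and the output path are common conventions rather than the book's exact code:

    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    # Describe the model's input: a float tensor with one row per sample
    initial_types = [('float_input', FloatTensorType([None, X_train.shape[1]]))]

    # Convert the trained sklearn model to an ONNX graph
    onnx_model = convert_sklearn(svc, initial_types=initial_types)

    # Write the serialized model to disk
    with open('./outputs/svc.onnx', 'wb') as f:
        f.write(onnx_model.SerializeToString())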

Registering models and production artifacts

In this step, the model that was serialized or containerized in the previous step is registered and stored in the model registry. A registered model is a logical container for one or more files that together function as a model. For instance, a model made up of multiple files can be registered as a single model in the registry, and downloading the registered model retrieves all of its files. The registered model can be deployed and used for inference on demand.

Let's register the models we serialized in the previous section by using the Model.register() function from the Azure ML SDK. This function registers the serialized ONNX file to the workspace for further use and for deployment to the test and production environments. Let's register the serialized SVM classifier model (svc.onnx):

# Register Model on AzureML WS
model = Model.register(model_path = './outputs/svc.onnx', # this points to...
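
The call above is truncated; a complete registration might look like the following sketch, where the model name, tags, and description are illustrative placeholders rather than the book's exact values:

    from azureml.core.model import Model

    model = Model.register(
        workspace=workspace,
        model_path='./outputs/svc.onnx',  # local path to the serialized model
        model_name='svc-classifier',      # placeholder registry name
        tags={'format': 'onnx', 'algorithm': 'svm'},  # placeholder tags
        description='SVM classifier serialized to ONNX')
    print(model.name, model.version)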

Summary

In this chapter, we went through the theory of ML pipelines and put it into practice by building an ML pipeline for a business problem. We set up the tools, resources, and development environment needed to train the ML models. We started with the data ingestion step, followed by the model training, testing, and packaging steps, and finally, we completed the registering step. Congrats! So far, you have implemented a critical building block of the MLOps workflow.

In the next chapter, we will look into evaluating and packaging production models.
