You're reading from  MLOps with Red Hat OpenShift

Product type: Book
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781805120230
Edition: 1st Edition
Authors (2):
Ross Brigoli

Ross Brigoli is a consulting architect at Red Hat, where he focuses on designing and delivering solutions around microservices architecture, DevOps, and MLOps with Red Hat OpenShift for various industries. He has two decades of experience in software development and architecture.

Faisal Masood

Faisal Masood is a cloud transformation architect at AWS. His focus is to assist customers in refining and executing strategic business goals. His main interests are evolutionary architectures, software development, the ML lifecycle, CD, and IaC. He has over two decades of experience in software architecture and development.

Managing a Model Training Workflow

In the previous chapter, you created a data science project and a workbench in OpenShift Data Science (ODS). In this chapter, you will learn how to build a model training pipeline. You will see how you can version your data using the partner software available in Red Hat OpenShift and build automated pipelines to retrain your model as new data becomes available. You will use the Jupyter notebook that you have configured in your workbench and write Python code to build a simple machine learning (ML) model.

It is important to understand how to manually embed a model into an application before we introduce you to the concept of model serving. We will take you through the following sections in this chapter:

  • Configuring Pachyderm
  • Versioning your data with Pachyderm
  • Training a model using Red Hat ODS
  • Building a model training pipeline

Technical requirements

In this chapter, you need the accompanying GitHub repository of this book, which can be found at https://github.com/PacktPublishing/MLOps-with-Red-Hat-OpenShift. The files you will need are in the directory named chapter4. You will also write basic Python code to build a simple model and a model training pipeline.

Configuring Pachyderm

Let’s start by configuring Pachyderm. Pachyderm is a platform that assists data scientists in creating complete ML workflows covering all the stages from data ingestion and model training up to deploying into production. Think of it as a version control system (VCS) for your model development workflow.

In traditional software engineering, you may use Git to version control your code. In ML projects, you need to version control your data, and you want a reproducible flow for training your model. Pachyderm provides such capabilities for you. You will see how Red Hat OpenShift enables you to use Pachyderm. Refer to Chapter 3 for instructions on installing the Pachyderm operator.

Follow these steps to configure Pachyderm. Pachyderm needs a relational database management system (RDBMS) to store metadata, and the operator takes care of the Pachyderm and related database components. Pachyderm requires Simple Storage Service (S3) storage to store the Pachyderm...

Versioning your data with Pachyderm

Data is the fundamental component for building your models. Without a retrievable version of the dataset the model was trained on, you cannot replicate the model training activity you did in the past and expect the same results. Data versioning enables dataset comparisons and prevents confusion that may occur due to data changes. This allows us to build a reproducible model training workflow. To learn more about Pachyderm in depth, refer to the Pachyderm documentation at https://docs.pachyderm.com/.
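The idea behind data versioning can be sketched in a few lines of standard-library Python. This is only an illustration of the concept, not Pachyderm's API or its storage model: the `dataset_version` helper, the in-memory `history` dictionary, and the two tiny CSV snapshots are all hypothetical. Each snapshot gets an ID derived from its content, so a changed dataset gets a new ID while the earlier snapshot stays retrievable.

```python
import hashlib


def dataset_version(data: bytes) -> str:
    """Return a short, content-derived ID for a dataset snapshot (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()[:12]


# Hypothetical in-memory stand-ins for two snapshots of a CSV file.
v1 = b"alcohol,quality\n9.4,5\n9.8,5\n"
v2 = v1 + b"10.5,7\n"  # the same data with one extra row appended

# A toy substitute for a versioned data repository: version ID -> snapshot.
history = {}
for snapshot in (v1, v2):
    history[dataset_version(snapshot)] = snapshot

id_v1 = dataset_version(v1)
id_v2 = dataset_version(v2)

print(id_v1 != id_v2)        # changed data receives a different ID
print(history[id_v1] == v1)  # the older snapshot can still be retrieved
```

Recording the ID alongside a trained model is what makes a training run repeatable: the same ID always points to the same bytes.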

To work with Pachyderm, you can use either the Pachyderm command-line tool, pachctl, or the Pachyderm Python library. We will use the Python library in this book.

Before we start, let’s create a new bucket in your MinIO server. We will use this to store the datasets. Let’s call this bucket raw-data. Then, upload the wine.csv file available in the Git repository of this book into this bucket. For the purpose of this exercise, set the raw-data bucket...

Training a model using Red Hat ODS

Let’s build a simple model using Red Hat ODS. Recall Chapter 3, Building Machine Learning models with OpenShift, and create a new data science project named wines. Create a workbench named wines inside the project using the Standard Data Science notebook image and a Small container size. Create new persistent storage named wines with 20 GB of storage. There is no need to create a data connection at this stage. Once you create this project, you will see the following screen for your project:

Figure 4.8 – Red Hat data science project

  1. Now, launch the notebook and clone the accompanying Git repository of this book. Use the chapter4/wine-data-version.ipynb file to create a version of the wines.csv file in the Pachyderm repo. Note the commit ID while running this notebook.
  2. Once you have executed this notebook, open chapter4/wine-training.ipynb to train a simple linear regression model. Let’...
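The training notebook itself is not reproduced here. As a rough illustration of what fitting a simple linear regression model involves, the following sketch implements ordinary least squares for a single feature in plain Python. It is an assumption-laden toy, not the code from wine-training.ipynb: the `fit_line` helper is hypothetical, and the data is synthetic (generated from y = 2x + 1), not taken from the wine dataset.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic toy data generated from y = 2x + 1; not the wine dataset.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Use the fitted line to predict the target for an unseen input.
prediction = slope * 5.0 + intercept
print(slope, intercept, prediction)
```

In practice, a notebook would more likely rely on a library such as scikit-learn and several input features; the arithmetic above only shows what "training" means for the simplest case.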

Building a model training pipeline

Red Hat OpenShift pipelines automate training and deployment workflows. They are based on the Kubeflow pipeline domain-specific language (DSL) and backed by the Tekton engine. In this section, you will build a pipeline from the notebook you created earlier. In the next chapters, you will add more stages to this pipeline.
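Conceptually, a training pipeline is an ordered set of stages, each consuming the outputs of the previous one. The sketch below shows that idea in plain Python. It deliberately does not use the Kubeflow DSL or Tekton: the stage names, the `run_pipeline` runner, and the toy data are all hypothetical, and the "model" is just a mean-value baseline.

```python
from typing import Any, Callable, Dict, List

# A stage takes the shared context and returns an updated copy of it.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def fetch_data(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stage: stands in for reading a versioned dataset.
    return {**ctx, "rows": [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]}


def train_model(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stage: the "model" is the mean of the target values.
    targets = [y for _, y in ctx["rows"]]
    return {**ctx, "model": sum(targets) / len(targets)}


def evaluate_model(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stage: mean absolute error of the baseline prediction.
    errors = [abs(y - ctx["model"]) for _, y in ctx["rows"]]
    return {**ctx, "mae": sum(errors) / len(errors)}


def run_pipeline(stages: List[Stage]) -> Dict[str, Any]:
    """Run the stages in order, passing the context from one to the next."""
    ctx: Dict[str, Any] = {}
    executed = []
    for stage in stages:
        ctx = stage(ctx)
        executed.append(stage.__name__)
    ctx["executed"] = executed
    return ctx


result = run_pipeline([fetch_data, train_model, evaluate_model])
print(result["executed"], result["model"], result["mae"])
```

A real pipeline engine adds what this sketch leaves out: each stage runs in its own container, artifacts are stored between stages, and runs can be scheduled or triggered when new data arrives.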

Installing Red Hat OpenShift Pipelines

This is a familiar process where you log in to OpenShift, select the right operator, and perform an install. Follow the next steps to install the pipeline operator:

  1. Log in to the OpenShift console and search for OpenShift Pipelines from OperatorHub, as shown in Figure 4.9. Click on the Red Hat OpenShift Pipelines tile:
Figure 4.9 – OpenShift Pipelines operator

  2. Using all the default options, click on the Install button, as shown in Figure 4.10:

Note

The version of the operator may have already changed by the time you are reading...

Summary

In this chapter, you have seen how Red Hat ODS integrates with third-party software to further simplify your MLOps journey. OpenShift makes it a breeze to use the data versioning capabilities of the Pachyderm software.

You have seen how Red Hat OpenShift Pipelines enables your data science team to automate the model training workflow by providing drag-and-drop capabilities and using the same code you have used to manually train the model.

You will use the core pipeline capabilities in the next few chapters, where you'll further automate your workflow. In the next chapter, you will use what you have learned about creating pipelines to deploy the model as a service.
