You're reading from MLOps with Red Hat OpenShift

Product type: Book

Published in: Jan 2024

Publisher: Packt

ISBN-13: 9781805120230

Edition: 1st Edition

Concepts

Machine Learning

Authors (2):

Ross Brigoli

Faisal Masood

Managing a Model Training Workflow

In the previous chapter, you created a data science project and a workbench in OpenShift Data Science (ODS). In this chapter, you will learn how to build a model training pipeline. You will see how to version your data using partner software available in Red Hat OpenShift and build automated pipelines that retrain your model as new data becomes available. You will use the Jupyter notebook that you configured in your workbench and write Python code to build a simple machine learning (ML) model.

It is important to understand how to manually embed a model into an application before we introduce you to the concept of model serving. We will take you through the following sections in this chapter:

Configuring Pachyderm
Versioning your data with Pachyderm
Training a model using Red Hat ODS
Building a model training pipeline

Technical requirements

In this chapter, you need the accompanying GitHub repository of this book, which can be found at https://github.com/PacktPublishing/MLOps-with-Red-Hat-OpenShift. The files you will need are in the directory named chapter4. You will also write basic Python code to build a simple model and a model training pipeline.

Configuring Pachyderm

Let’s start by configuring Pachyderm. Pachyderm is a platform that assists data scientists in creating complete ML workflows covering all the stages from data ingestion and model training up to deploying into production. Think of it as a version control system (VCS) for your model development workflow.

In traditional software engineering, you may use Git to version control your code. In ML projects, you need to version control your data, and you want a reproducible flow for training your model. Pachyderm provides such capabilities for you. You will see how Red Hat OpenShift enables you to use Pachyderm. Refer to Chapter 3 for instructions on installing the Pachyderm operator.

Follow these steps to configure Pachyderm. Pachyderm needs a relational database management system (RDBMS) to store metadata, and the operator takes care of the Pachyderm and related database components. Pachyderm requires Simple Storage Service (S3) storage to store the Pachyderm...

Versioning your data with Pachyderm

Data is the fundamental component for building your models. Without a retrievable version of the dataset the model was trained on, you cannot replicate the model training activity you did in the past and expect the same results. Data versioning enables dataset comparisons and prevents confusion that may occur due to data changes. This allows us to build a reproducible model training workflow. To learn more about Pachyderm in depth, refer to the Pachyderm documentation at https://docs.pachyderm.com/.
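To make the idea of a retrievable dataset version concrete, here is a minimal sketch in plain Python. It is not Pachyderm's API and does not reflect how Pachyderm stores data internally; the DatasetStore class, its method names, and the sample column names are hypothetical and exist only for illustration. It shows the property that matters for reproducibility: every committed dataset gets an ID, and that ID always returns exactly the same bytes, even after newer data has been committed.

```python
import hashlib


class DatasetStore:
    """Toy in-memory store that keeps every committed version of a dataset.

    Hypothetical helper for illustration only; this is not Pachyderm's API.
    """

    def __init__(self):
        self._versions = {}  # commit ID -> dataset bytes
        self._history = []   # commit IDs in the order they were first committed

    def commit(self, data: bytes) -> str:
        """Store the data and return a commit ID derived from its content."""
        commit_id = hashlib.sha256(data).hexdigest()[:12]
        if commit_id not in self._versions:
            self._versions[commit_id] = data
            self._history.append(commit_id)
        return commit_id

    def checkout(self, commit_id: str) -> bytes:
        """Return exactly the bytes that were committed under this ID."""
        return self._versions[commit_id]

    def history(self):
        """Return the commit IDs, oldest first."""
        return list(self._history)


# Made-up sample rows; the real dataset's columns may differ.
store = DatasetStore()
v1 = store.commit(b"alcohol,quality\n9.4,5\n9.8,5\n")
v2 = store.commit(b"alcohol,quality\n9.4,5\n9.8,5\n10.5,7\n")

# The first version is still retrievable after new data arrives,
# so a training run can be pinned to the commit it was built from.
training_data = store.checkout(v1)
```

Recording the commit ID alongside a trained model is what lets you rerun the same training later against the same data.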

To work with Pachyderm, you can use either the Pachyderm command-line tool, pachctl, or the Pachyderm Python library. We will use the Python library in this book.

Before we start, let’s create a new bucket in your MinIO server. We will use this to store the datasets. Let’s call this bucket raw-data. Then, upload the wine.csv file available in the Git repository of this book into this bucket. For the purpose of this exercise, set the raw-data bucket...

Training a model using Red Hat ODS

Let’s build a simple model using Red Hat ODS. Recall Chapter 3, Building Machine Learning models with OpenShift, and create a new data science project named wines. Create a workbench named wines inside the project using the Standard Data Science notebook image and a Small container size. Create new persistent storage named wines with 20 GB of storage. There is no need to create a data connection at this stage. Once you create this project, you will see the following screen for your project:

Figure 4.8 – Red Hat data science project

Now, launch the notebook and clone the accompanying Git repository of this book. Use the chapter4/wine-data-version.ipynb file to create a version of the wines.csv file in the Pachyderm repo. Note the commit ID while running this notebook.

Once you have executed this notebook, open chapter4/wine-training.ipynb to train a simple linear regression model. Let’...
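The training notebook itself is not reproduced here, so the following is only a dependency-free sketch of what fitting a simple linear regression involves, assuming a single input feature and ordinary least squares. The function names fit_line and predict and the sample values are hypothetical; the notebook most likely relies on an ML library and the real wine dataset instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training points that lie on the line y = 2x + 1.
features = [1.0, 2.0, 3.0, 4.0]
targets = [3.0, 5.0, 7.0, 9.0]

model = fit_line(features, targets)
estimate = predict(model, 5.0)
```

Because the sample points lie exactly on a line, the fitted slope and intercept recover that line; real data would leave a residual error that you would measure on a held-out set.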

Building a model training pipeline

Red Hat OpenShift Pipelines automates training and deployment workflows. The pipelines are based on the Kubeflow Pipelines domain-specific language (DSL) and backed by the Tekton engine. In this section, you will build a pipeline from the notebook you created earlier. In the next chapters, you will add more stages to this pipeline.
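Conceptually, a training pipeline is an ordered list of stages in which each stage consumes what the previous one produced. The sketch below shows that shape in plain Python. It deliberately uses neither the Kubeflow Pipelines DSL nor any Tekton resource; the stage names, the context dictionary, and the sample rows are all hypothetical, and the "model" is just a mean predictor so the example stays short.

```python
from typing import Callable, Dict, List

# A stage takes the shared context and returns an updated copy of it.
Stage = Callable[[Dict], Dict]


def run_pipeline(stages: List[Stage], context: Dict) -> Dict:
    """Run each stage in order, passing the context from one to the next."""
    for stage in stages:
        context = stage(context)
    return context


def fetch_data(ctx: Dict) -> Dict:
    # Placeholder: a real stage would read a specific dataset version.
    rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # (feature, target) pairs
    return {**ctx, "rows": rows}


def train(ctx: Dict) -> Dict:
    # Baseline "model": always predict the mean of the training targets.
    targets = [target for _, target in ctx["rows"]]
    return {**ctx, "model": sum(targets) / len(targets)}


def evaluate(ctx: Dict) -> Dict:
    # Mean absolute error of the baseline on the same rows.
    errors = [abs(target - ctx["model"]) for _, target in ctx["rows"]]
    return {**ctx, "mae": sum(errors) / len(errors)}


result = run_pipeline([fetch_data, train, evaluate], {"commit": "example"})
```

Keeping each stage as a separate function is what makes it possible to rerun the whole sequence unchanged whenever a new dataset version appears, and to append further stages later.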

Installing Red Hat OpenShift Pipelines

This is a familiar process: you log in to OpenShift, select the right operator, and install it. Follow these steps to install the pipeline operator:

Log in to the OpenShift console and search for OpenShift Pipelines from OperatorHub, as shown in Figure 4.9. Click on the Red Hat OpenShift Pipelines tile:

Figure 4.9 – OpenShift Pipelines operator

Using all the default options, click on the Install button, as shown in Figure 4.10:

Note

The version of the operator may have already changed by the time you are reading...

Summary

In this chapter, you have seen how Red Hat ODS integrates with third-party software to further simplify your MLOps journey. OpenShift makes it a breeze to use the data versioning capabilities of the Pachyderm software.

You have seen how Red Hat OpenShift Pipelines enables your data science team to automate the model training workflow through drag-and-drop capabilities while reusing the same code you used to train the model manually.

You will use the core pipeline capabilities in the next few chapters, where you'll further automate your workflow. In the next chapter, you will use what you have learned about creating pipelines to deploy the model as a service.

Authors (2)

Ross Brigoli

Ross Brigoli is a consulting architect at Red Hat, where he focuses on designing and delivering solutions around microservices architecture, DevOps, and MLOps with Red Hat OpenShift for various industries. He has two decades of experience in software development and architecture.

Faisal Masood

Faisal Masood is a cloud transformation architect at AWS. Faisal's focus is to assist customers in refining and executing strategic business goals. Faisal's main interests are evolutionary architectures, software development, the ML lifecycle, CD, and IaC. Faisal has over two decades of experience in software architecture and development.