
You're reading from  MLOps with Red Hat OpenShift

Product type: Book
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781805120230
Edition: 1st
Authors (2):
Ross Brigoli

Ross Brigoli is a consulting architect at Red Hat, where he focuses on designing and delivering solutions around microservices architecture, DevOps, and MLOps with Red Hat OpenShift for various industries. He has two decades of experience in software development and architecture.

Faisal Masood

Faisal Masood is a cloud transformation architect at AWS, where he helps customers refine and execute strategic business goals. His main interests are evolutionary architectures, software development, the ML life cycle, CD, and IaC. Faisal has over two decades of experience in software architecture and development.


Building Machine Learning Models with OpenShift

In the previous chapter, you installed and configured OpenShift to power your machine learning (ML) project life cycle. In this chapter, you will configure the platform components required for model development. You will learn what the OpenShift platform offers for building ML models and how your team can leverage it. Please ensure that you have completed the setup from the previous chapter before starting this one.

This is the first stage of the ML development life cycle, which we presented in Chapter 2. In this chapter, you will see how easy it is for you and your team to start building with the technology provided by Red Hat OpenShift Data Science (RHODS).

We will cover the following topics:

  • Using Jupyter Notebooks in OpenShift
  • Using ML frameworks in OpenShift
  • Using GPU acceleration for model training
  • Building custom notebooks

Technical requirements

In this chapter, you’ll need to use this book’s GitHub repository. This can be found at https://github.com/PacktPublishing/MLOps-with-Red-Hat-OpenShift. The files that you will need in this chapter are located in the chapter3 directory. You will also write basic Python code to validate the deployments and configurations.

Using Jupyter Notebooks in OpenShift

Jupyter Notebook is the de facto standard environment for data scientists and data engineers to analyze data and build ML models. Since the notebooks provided by the platform run as containers, your team can start quickly and consistently by adopting the platform. The platform provides a rapid way to develop, train, and test ML models and deploy them to production. In RHODS, Jupyter Notebook environments are referred to as workbenches. You will learn about workbenches later in this section, but first, we need to learn how to create these environments.

We’ll start by provisioning S3 object storage for you to access the data required for the model training process. This is part of the platform setup, and data scientists will not have to execute these steps for their day-to-day work.

Provisioning an S3 store

ML loves data. A lot of data! S3-compatible object storage is becoming the de facto standard for storing and retrieving unstructured data at scale and is available from all major cloud vendors. You can leverage Kubernetes-native open source tools such as MinIO to provision an S3-compatible object store on your OpenShift cluster. MinIO is a high-performance, S3-compatible object store that can be deployed on OpenShift, which means you can use it both on-premises and in the cloud.

Red Hat also provides an integrated storage component for the OpenShift platform, named OpenShift Data Foundation, that exposes an S3-compatible API. Any standard S3-compatible object storage product will work with RHODS. For this book, we’ve chosen MinIO for simplicity. So, let’s start by installing MinIO on the OpenShift platform.

From the code repository for this book, go to the chapter3 folder and find the minio-complete.yaml file. The following steps show how...
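Once MinIO is running, RHODS reaches it through a data connection, which stores the S3 credentials in an OpenShift secret as base64-encoded values. The following is a minimal sketch of that encoding step; the credential values and endpoint shown here are placeholders, not values from the book's minio-complete.yaml:

```python
import base64

def to_secret_data(creds: dict) -> dict:
    """Base64-encode values the way OpenShift expects in a Secret's `data` field."""
    return {k: base64.b64encode(v.encode()).decode() for k, v in creds.items()}

# Placeholder credentials for illustration only
creds = {
    "AWS_ACCESS_KEY_ID": "minio",
    "AWS_SECRET_ACCESS_KEY": "minio123",
    "AWS_S3_ENDPOINT": "http://minio-service.svc:9000",
}
print(to_secret_data(creds))
```

In practice, the RHODS dashboard creates this secret for you when you fill in the data connection form; the snippet simply shows what ends up stored in the cluster.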

Using ML frameworks in OpenShift

So far, you have seen how easy it is to spin up environments with your chosen configuration. Red Hat provides a set of pre-built images with popular frameworks to speed up your development workflow. We all know how troublesome it is to maintain multiple runtimes and frameworks, each with its own library dependencies. Say you want to start a new environment with TensorFlow: you just select the right container image, as shown in the following screenshot. The View package information option shows you which versions and libraries are available in the container image. The list of available container images is always growing; later, you will learn how to provide custom container images if required:

Figure 3.18 – RHODS – workbench with TensorFlow image

You may have multiple workbenches with different hardware and software. All these environments are listed under your data science project. You can...
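From inside any workbench, you can confirm which framework versions the container image actually provides. A small sketch using only the standard library (the package list here is just a sample, not the full contents of any RHODS image):

```python
from importlib import metadata

def framework_versions(packages=("tensorflow", "torch", "scikit-learn")):
    """Report the installed version of each package, or 'not installed'."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = "not installed"
    return versions

print(framework_versions())
```

Running this in a notebook cell is a quick way to cross-check the View package information listing against what is really inside the running container.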

Using GPU acceleration for model training

In the previous section, you customized software components that your team needs to build models. In this section, you will see how RHODS makes it easy for you to use specific hardware for your workbench.

Imagine that you are working on a simple supervised learning model and do not need any special hardware, such as a GPU, to complete your work. If you work on a laptop, the hardware is fixed at purchase time: you cannot change it dynamically, and it would be expensive for organizations to give every data scientist specialized GPU hardware. It’s even worse when a new GPU model is released after you have already bought an older version for your team. RHODS enables you to provision hardware on demand, so if one team member needs a GPU, they can simply select it from the UI and start using it. When their work is done, the GPU is released back to the hardware pool. This dynamic nature not only reduces costs but...

Enabling GPU support

First, you need to provision nodes with a GPU. Like the MinIO setup, this is a one-time activity that will be executed by the platform engineering team. The entire process of enabling GPU support can be automated for your OpenShift clusters. Let’s learn how to provision machines with a GPU in our cluster.

OpenShift enables you to use machine sets to provision nodes – the machines on which your containers run. To enable GPU support for the Jupyter environment that you created earlier, you need to provision nodes with a GPU. Once these nodes have been provisioned, RHODS will automatically detect them and allow you to use the Jupyter environment with GPU support.

For a Red Hat OpenShift Service on AWS (ROSA) cluster, you can use the Red Hat cloud console to provision new machines. OpenShift can scale out machines so that hardware is provisioned as needed. You can also choose to run these machines on spot instances to further reduce your bill. Let’s see how to...
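Once a GPU node is up and your workbench is scheduled on it, you can confirm the GPU is visible from inside the notebook. The following is a best-effort sketch; the checks and their fallback order are illustrative assumptions on my part, not steps from the book:

```python
import shutil
import subprocess

def detect_gpu() -> str:
    """Best-effort GPU detection from inside a notebook container."""
    # Prefer a framework-level check if PyTorch happens to be installed.
    try:
        import torch
        if torch.cuda.is_available():
            return f"{torch.cuda.device_count()} CUDA device(s) visible to PyTorch"
    except ImportError:
        pass
    # Fall back to the NVIDIA driver utility, if the image exposes it.
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
        return out.stdout.strip() or "nvidia-smi present, no devices listed"
    return "no GPU detected"

print(detect_gpu())
```

A "no GPU detected" result usually means the pod was not scheduled on a GPU node or the GPU was not requested for the workbench.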

Building custom notebooks

Though RHODS comes with several built-in notebook images that you can use, you may require a different library version or dependency, or you may want to add your organization’s certificates to the notebook image. In short, there are many reasons why the provided notebook images may require some tuning.

In this section, you will learn how to tune the existing notebook images, import third-party notebook images, and create your custom notebook images.

RHODS allows you to bring notebook images into the platform, either by importing an existing container image from a registry such as Docker Hub, Quay, or any other container registry, or by customizing an existing notebook image. Let’s look at how to create custom notebook images and import them into RHODS.

Creating a custom notebook image

Creating custom notebook images follows a standard container image build process. This involves creating a Dockerfile that describes how the container image is to be built...
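The build-and-push flow can also be scripted. The following sketch assembles the container commands in Python; the image name, the choice of podman, and the helper names are illustrative assumptions, not prescribed by the book:

```python
import subprocess

def build_commands(image: str, dockerfile: str = "Dockerfile", context: str = "."):
    """Return the container commands that build and push a custom notebook image."""
    return [
        ["podman", "build", "-t", image, "-f", dockerfile, context],
        ["podman", "push", image],
    ]

def build_and_push(image: str, **kwargs) -> None:
    """Execute the build and push steps, failing fast on any error."""
    for cmd in build_commands(image, **kwargs):
        subprocess.run(cmd, check=True)
```

After the image is pushed to your registry, it can be imported into RHODS as a custom notebook image and selected when creating a workbench.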

Summary

In this chapter, you learned how to use the core features of RHODS. You learned how to create and manage data science projects, workbenches, storage, and data connections.

You also saw how RHODS does the heavy lifting for hardware and software provisioning for your model development workflow. This includes learning how to take advantage of GPUs through machine pools. This dynamic model development environment enables your team to be more agile and focus on model building instead of managing the libraries.

Finally, you learned how to extend the base images to create environments better suited to your needs, and how to create and use custom notebook images in RHODS. This allows you to further tailor the experience of your data science team.

In the next chapter, you will learn how to build and package ML models for consumption.

