You're reading from  MLOps with Red Hat OpenShift

Product type: Book
Published in: Jan 2024
Publisher: Packt
ISBN-13: 9781805120230
Edition: 1st Edition
Authors (2):
Ross Brigoli

Ross Brigoli is a consulting architect at Red Hat, where he focuses on designing and delivering solutions around microservices architecture, DevOps, and MLOps with Red Hat OpenShift for various industries. He has two decades of experience in software development and architecture.

Faisal Masood

Faisal Masood is a cloud transformation architect at AWS. Faisal's focus is to assist customers in refining and executing strategic business goals. Faisal's main interests are evolutionary architectures, software development, the ML lifecycle, CD, and IaC. Faisal has over two decades of experience in software architecture and development.


Operating ML Workloads

In the previous chapter, you learned how to automate model deployments through OpenShift Data Science (ODS) pipelines. This chapter focuses on the operational tasks of MLOps, including monitoring and logging with the built-in tools of Red Hat OpenShift Data Science. We will not cover common OpenShift operation and administration tasks, as they are beyond the scope of this book. However, we will talk about some of the OpenShift concepts you need to know to understand the topics in this chapter.

The exercises in this chapter require a basic understanding of OpenShift and/or Kubernetes as well as basic knowledge of Prometheus time-series databases and Grafana visualization dashboards. The following topics will be covered in this chapter:

  • Monitoring ML models
  • Logging model inference
  • Cost optimization

The materials required for this chapter can be found in the GitHub repository of this book. The files that you will...

Monitoring ML models

Observability is a concept primarily used in the context of systems engineering, computer science, and monitoring complex systems. It refers to the ability to understand and infer the internal state and behavior of a system by examining its external outputs or observables. In simpler terms, it’s about gaining insight into how a system operates and performs by observing its outputs or responses.

Monitoring is one part of observability. It tracks and measures predefined metrics against thresholds to ensure that systems and services are running within expected parameters. It is also referred to as telemetry, akin to how real-time metrics data is collected in mission-critical operations such as launching a rocket to the moon. Unlike logging, which collects event data for later auditing and troubleshooting, monitoring deals with real-time metrics. For example, logging data...
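To make the idea of predefined metrics and thresholds concrete, here is a minimal Python sketch. It is not taken from the book: the metric names, the limits, and the `Threshold` and `breached` helpers are all hypothetical, and in practice this kind of check is usually expressed as alerting rules in the monitoring system rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A predefined upper limit for one metric (names are illustrative only)."""
    metric: str
    max_value: float


def breached(samples: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return the metrics whose latest sample exceeds its threshold."""
    return [
        t.metric
        for t in thresholds
        if t.metric in samples and samples[t.metric] > t.max_value
    ]


# Hypothetical latest samples for a served model
latest = {"p95_latency_ms": 420.0, "error_rate": 0.01}
limits = [Threshold("p95_latency_ms", 300.0), Threshold("error_rate", 0.05)]
alerts = breached(latest, limits)
```

Here only the latency metric is above its limit, so it is the only entry in `alerts`.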

Installing and configuring Prometheus and Grafana

To get started with implementing monitoring, we need Prometheus and Grafana. The OpenShift Data Science operator comes with a built-in Prometheus cluster, and the model-serving component of ODS already exposes metrics by default. Prometheus is preinstalled and preconfigured in your OpenShift cluster when you install the ODS operator. Grafana, however, we will install from OperatorHub.

The following steps will guide you through the process of installing and configuring Prometheus and Grafana for your Red Hat OpenShift Data Science cluster:

  1. Verify that the Prometheus cluster is installed and is running on your cluster. In your OpenShift web console, navigate to Workloads | Pods.
  2. Select the redhat-ods-monitoring project. You should see that the Prometheus Pods have a Running status. If you do not see this, you may need to re-install the Red Hat OpenShift Data Science operator.
  3. Navigate to Networking | Routes...
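Once Prometheus is reachable through its route, its metrics can also be read programmatically through the standard Prometheus HTTP API. The following Python sketch only builds an instant-query URL for the `/api/v1/query` endpoint and parses the JSON response format; it does not make a network call. The host name and the metric name are placeholders rather than values from the book, and a real cluster route will most likely also require an authentication token.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

# Placeholders: substitute the route host from your cluster and a metric
# that your model server actually exposes.
PROMETHEUS_URL = "https://prometheus.example.com"
EXAMPLE_QUERY = "sum(rate(example_inference_requests_total[5m]))"


def build_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample looks like {"metric": {labels}, "value": [timestamp, "value"]}
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


url = build_query_url(PROMETHEUS_URL, EXAMPLE_QUERY)
```

The response body passed to `parse_instant_vector` would come from an HTTP GET on `url`, made with whichever HTTP client and credentials suit your environment.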

Logging inference calls

Logging is an essential part of any software architecture. We use logs to recall and investigate what happened in the system in the past. Unlike monitoring, which tracks real-time metrics, logging records the events that occurred in the system so that we can look back on or audit them later.

Logging in MLOps is no different. However, there are a few aspects of logging that are more common in ML model inference than in traditional software. Here are some of the properties that we need to look out for in ML model inference logging:

  • Unstructured data: In some cases, the data you input into the inference call may not always be simple JSON-formatted text; it could be an image, video, or audio as well. This kind of unstructured data may require a different kind of storage system for logs.
  • Non-deterministic behavior: Some models, depending on the algorithm used, may not always return the same output for the same...
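One way to account for the first two properties is sketched below in Python. It is an illustration under stated assumptions, not the book's implementation: the field names and the `build_inference_log` helper are hypothetical. The record stores a digest and size of the raw payload instead of the payload itself, so images or audio can live in separate storage, and it stores the prediction that was actually returned, so the log does not rely on replaying the input to a model that may answer differently.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone


def build_inference_log(model_name: str, model_version: str, payload: bytes,
                        content_type: str, prediction) -> str:
    """Return one JSON log line describing a single inference call."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "version": model_version,
        "content_type": content_type,
        # The digest lets the log entry be matched to a payload kept elsewhere
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "payload_bytes": len(payload),
        # Record the output that was actually served for this request
        "prediction": prediction,
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical call: the bytes stand in for an uploaded image
line = build_inference_log("example-model", "1", b"\x89PNG...", "image/png", [0.93])
```

Each call produces a single JSON line, which suits log collectors that work on line-oriented output.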

Optimizing cost

When it comes to managing an OpenShift cluster, it’s not just about making sure your applications run smoothly; it’s also about keeping a close eye on your cloud infrastructure costs. OpenShift is powerful, but without care it can lead to unnecessary overspending. In this section, we’ll cover some practical strategies for tuning your OpenShift cluster so that you balance the resources you need against your budget. They range from optimizing how you allocate resources to scaling your cluster intelligently:

  • Rightsize resources: Take a closer look at the resource requirements of your applications running in Pods. Adjust the allocated CPU and memory to match the actual needs of each application. Avoid overallocating resources, which can lead to unnecessary costs...
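As a rough illustration of rightsizing, the Python sketch below derives a suggested resource request from observed usage samples by taking a high percentile and adding headroom. This is a simple heuristic chosen for the example, not a method prescribed by the book or by OpenShift; the 95th percentile, the 20% headroom, and the sample values are all assumptions.

```python
from __future__ import annotations

import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list, with 0 < pct <= 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_request(usage_samples: list[float], pct: float = 95.0,
                    headroom: float = 0.2) -> float:
    """Suggest a request: the chosen usage percentile plus fractional headroom."""
    return percentile(usage_samples, pct) * (1 + headroom)


# Hypothetical CPU usage samples for one Pod, in millicores
cpu_millicores = [100, 120, 110, 400, 130, 125, 115, 105, 135, 140]
suggested = suggest_request(cpu_millicores)
```

A high percentile makes the suggestion sensitive to spikes such as the 400-millicore sample above; a lower percentile gives a tighter request at the risk of throttling during bursts, so the right setting depends on how the workload behaves.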

Summary

This chapter focused on the operational tasks related to running and serving ML models on OpenShift and OpenShift Data Science. You have learned that Red Hat OpenShift Data Science comes with a Prometheus instance. You have also learned how to set up Grafana to visualize the Prometheus data.

We have talked about the importance of logging and how it is different from monitoring and traditional software application logging. You have also learned how to enable the ModelMesh payload processors to achieve payload logging.

We have also learned that the current version of ODS does not yet contain a feature for configuring the logging dimension of model servers through the web console.

As part of your learning, we encourage you to experiment with the configurations beyond what was described in the book. There is a lot more to learn about Grafana and Prometheus. You can explore other metrics in Prometheus and create custom dashboards in Grafana. We also encourage you to experiment...
