The Kubernetes Operator Framework Book

By Michael Dame
    What do you get with a Packt Subscription?

  • Instant access to this title and 7,500+ eBooks & Videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Free Chapter
    Chapter 1: Introducing the Operator Framework
About this book

From incomplete collections of knowledge and varying design approaches to technical knowledge barriers, Kubernetes users face various challenges when developing their own operators. Knowing how to write, deploy, and pack operators makes cluster management automation much easier – and that's what this book is here to teach you.

Beginning with operators and Operator Framework fundamentals, the book delves into how the different components of Operator Framework (such as the Operator SDK, Operator Lifecycle Manager, and are used to build operators. You’ll learn how to write a basic operator, interact with a Kubernetes cluster in code, and distribute that operator to users. As you advance, you’ll be able to develop a sample operator in the Go programming language using Operator SDK tools before running it locally with Operator Lifecycle Manager, and also learn how to package an operator bundle for distribution. The book covers best practices as well as sample applications and case studies based on real-world operators to help you implement the concepts you’ve learned.

By the end of this Kubernetes book, you’ll be able to build and add application-specific operational logic to a Kubernetes cluster, making it easier to automate complex applications and augment the platform.

Publication date:
July 2022


Chapter 1: Introducing the Operator Framework

Managing a Kubernetes cluster is hard. This is partly due to the fact that any microservice architecture is going to be inherently based on the interactions of many small components, each introducing its own potential point of failure. There are, of course, many benefits to this type of system design, such as graceful error handling thanks to the separation of responsibilities. However, diagnosing and reconciling such errors requires significant engineering resources and a keen familiarity with an application's design. This is a major pain point for project teams who migrate to the Kubernetes platform.

The Operator Framework was introduced to the Kubernetes ecosystem to address these problems. This chapter will go over a few general topics to give a broad overview of the Operator Framework. The intent is to provide a brief introduction to the Operator Framework, the problems it solves, how it solves them, and the tools and patterns it provides to users. This will highlight key takeaways for the goals and benefits of using Operators to help administrate a Kubernetes cluster. These topics include the following:

  • Managing clusters without Operators
  • Introducing the Operator Framework
  • Developing with the Operator software development kit (SDK)
  • Managing Operators with the Operator Lifecycle Manager (OLM)
  • Distributing Operators on
  • Defining Operator functions with the Capability Model
  • Using Operators to manage applications

Technical requirements

This chapter does not have any technical requirements because we will only be covering general topics. In later chapters, we will discuss these various topics in depth and include technical prerequisites for following along with them.

The Code in Action video for this chapter can be viewed at:


Managing clusters without Operators

Kubernetes is a powerful microservice container orchestration platform. It provides many different controllers, resources, and design patterns to cover almost any use case, and it is constantly growing. Because of this, applications designed to be deployed on Kubernetes can be very complex.

When designing an application to use microservices, there are a number of concepts to be familiar with. In Kubernetes, these are mainly the native application programming interface (API) resource objects included in the core platform. Throughout this book, we will assume a foundational familiarity with the common Kubernetes resources and their functions.

These objects include Pods, Replicas, Deployments, Services, Volumes, and more. The orchestration of any microservice-based cloud application on Kubernetes relies on integrating these different concepts to weave a coherent whole. This orchestration is what creates a complexity that many application developers struggle to manage.

Demonstrating on a sample application

Take, for example, a simple web application that accepts, processes, and stores user input (such as a message board or chat server). A good, containerized design for an application such as this would be to have one Pod presenting the frontend to the user and a second backend Pod that accepts the user's input and sends it to a database for storage.

Of course, you will then need a Pod running the database software and a Persistent Volume to be mounted by the database Pod. These three Pods will benefit from Services to communicate with each other, and they will also need to share some common environment variables (such as access credentials for the database and environment variables to tweak different application settings).

Here is a diagram of what a sample application of this sort could look like. There are three Pods (frontend, backend, and database), as well as a Persistent Volume:

Figure 1.1 – Simple application diagram with three Pods and a Persistent Volume

Figure 1.1 – Simple application diagram with three Pods and a Persistent Volume

This is just a small example, but it's already evident how even a simple application can quickly involve tedious coordination between several moving parts. In theory, these discrete components will all continue to function cohesively as long as each individual component does not fail. But what about when a failure does occur somewhere in the application's distributed design? It is never wise to assume that an application's valid state will consistently remain that way.

Reacting to changing cluster states

There are a number of reasons a cluster state can change. Some may not even technically be considered a failure, but they are still changes of which the running application must be aware. For example, if your database access credentials change, then that update needs to be propagated to all the Pods that interact with it. Or, a new feature is available in your application that requires tactful rollout and updated settings for the running workloads. This requires manual effort (and, more importantly, time), along with a keen understanding of the application architecture.

Time and effort are even more critical in the case of an unexpected failure. These are the kinds of problems that the Operator Framework addresses automatically. If one of the Pods that make up this application hits an exception or the application's performance begins to degrade, these scenarios require intervention. That means a human engineer must not only know the details of the deployment, but they must also be on-call to maintain uptime at any hour.

There are additional components that can help administrators monitor the health and performance of their applications, such as metrics aggregation servers. However, these components are essentially additional applications that must also be regularly monitored to make sure they are working, so adding them to a cluster can reintroduce the same issues of managing an application manually.


Introducing the Operator Framework

The concept of Kubernetes Operators was introduced in a blog post in 2016 by CoreOS. CoreOS created their own container-native Linux operating system that was optimized for the needs of cloud architecture. Red Hat acquired the company in 2018, and while the CoreOS operating system's official support ended in 2020, their Operator Framework has thrived.

The principal idea behind an Operator is to automate cluster and application management tasks that would normally be done manually by a human. This role can be thought of as an automated extension of support engineers or development-operations (DevOps) teams.

Most Kubernetes users will already be familiar with some of the design patterns of Operators, even if they have never used the Operator Framework before. This is because Operators are a seemingly complicated topic, but ultimately, they are not functionally much different than many of the core components that already automate most of a Kubernetes cluster by default. These components are called controllers, and at its core, any Operator is essentially just a controller.

Exploring Kubernetes controllers

Kubernetes itself is made up of many default controllers. These controllers maintain the desired state of the cluster, as set by users and administrators. Deployments, ReplicaSets, and Endpoints are just a few examples of cluster resources that are managed by their own controllers. Each of these resources involves an administrator declaring the desired cluster state, and it is then the controller's job to maintain that state. If there is any deviation, the controller must act to resolve what they control.

These controllers work by monitoring the current state of the cluster and comparing it to the desired state. One example is a ReplicaSet with a specification to maintain three replicas of a Pod. Should one of the replicas fail, the ReplicaSet quickly identifies that there are now only two running replicas. It then creates a new Pod to bring stasis back to the cluster.

In addition, these core controllers are collectively managed by the Kube Controller Manager, which is another type of controller. It monitors the state of controllers and attempts to recover from errors if one fails or reports the error for human intervention if it cannot automatically recover. So, it is even possible to have controllers that manage other controllers.

In the same way, Kubernetes Operators put the development of operational controllers in the hands of users. This provides administrators with the flexibility to write a controller that can manage any aspect of a Kubernetes cluster or custom application. With the ability to define more specific logic, developers can extend the main benefits of Kubernetes to the unique needs of their own applications.

The Operators that are written following the guidelines of the Operator Framework are designed to function very similarly to native controllers. They do this by also monitoring the current state of the cluster and acting to reconcile it with the desired state. Specifically, an Operator is tailored to a unique workload or component. The Operator then knows how to interact with that component in various ways.

Knowing key terms for Operators

The component that is managed by an Operator is its Operand. An Operand is any kind of application or workload whose state is reconciled by an Operator. Operators can have many Operands, though most Operators manage—at most—just a few (usually just one). The key distinction is that Operators exist to manage Operands, where the Operator is a meta-application in the architectural design of the system.

Operands can be almost any type of workload. While some Operators manage application deployments, many others deploy additional, optional cluster components offering meta-functionality such as database backup and restoration. Some Operators even make core native Kubernetes components their Operands, such as etcd. So, an Operator doesn't even need to be managing your own workloads; they can help with any part of a cluster.

No matter what the Operator is managing, it must provide a way for cluster administrators to interact with it and configure settings for their application. An Operator exposes its configuration options through a Custom Resource.

Custom Resources are created as API objects following the constraints of a matching CustomResourceDefinition (CRD). CRDs are themselves a type of native Kubernetes object that allows users and administrators to extend the Kubernetes platform with their own resource objects beyond what is defined in the core API. In other words, while a Pod is a built-in native API object in Kubernetes, CRDs allow cluster administrators to define MyOperator as another API object and interact with it the same way as native objects.

Putting it all together

The Operator Framework strives to define an entire ecosystem for Operator development and distribution. This ecosystem comprises three pillars that cover the coding, deployment, and publishing of Operators. They are the Operator SDK, OLM, and OperatorHub.

These three pillars are what have made the Operator Framework so successful. They transform the framework from just development patterns to an encompassing, iterative process that spans the entire lifecycle of an Operator. This helps support the contract between Operator developers and users to provide consistent industry standards for their software.

The lifecycle of an Operator begins with development. To help with this, the Operator SDK exists to guide developers in the first steps of creating an Operator. Technically, an Operator does not have to be written with the Operator SDK, but the Operator SDK provides development patterns to significantly reduce the effort needed to bootstrap and maintain an Operator's source code.

While coding and development are certainly important parts of creating an Operator, any project's timeline does not end once the code is compiled. The Operator Framework community recognized that a coherent ecosystem of projects must offer guidance beyond just the initial development stage. Projects need consistent methods for installation, and as software evolves, there is a need to publish and distribute new versions. OLM and OperatorHub help users to install and manage Operators in their cluster, as well as share their Operators in the community.

Finally, the Operator Framework provides a scale of Operator functionality called the Capability Model. The Capability Model provides developers with a way to classify the functional abilities of their Operator by answering quantifiable questions. An Operator's classification, along with the Capability Model, gives users information about what they can expect from the Operator.

Together, these three pillars establish the basis of the Operator Framework and form the design patterns and community standards that distinguish Operators as a concept. Along with the Capability Model, this standardized framework has led to an explosion in the adoption of Operators in Kubernetes.

At this point, we have discussed a brief introduction to the core concepts of the Operator Framework. In contrast with a Kubernetes application managed without an Operator, the pillars of the Operator Framework address problems met by application developers. This understanding of the core pillars of the Operator Framework will set us up for exploring each of them in more depth.


Developing with the Operator SDK

The first pillar of the Operator Framework is the Operator SDK. As with any other software development toolkit, the Operator SDK provides packaged functionality and design patterns as code. These include predefined APIs, abstracted common functions, code generators, and project scaffolding tools to easily start an Operator project from scratch.

The Operator SDK is primarily written in Go, but its tooling allows Operators to be written using Go code, Ansible, or Helm. This gives developers the ability to write their Operators from the ground up by coding the CRDs and reconciliation logic themselves, or by taking advantage of automated deployment tools provided by Ansible and Helm to generate their APIs and reconciliation logic depending on their needs.

Developers interact with the Operator SDK through its operator-sdk command-line binary. The binary is available on Homebrew for Mac and is also available directly from the Operator Framework GitHub repository ( as a release, where it can also be compiled from source.

Whether you are planning to develop an Operator with Go, Ansible, or Helm, the Operator SDK binary provides commands to initialize the boilerplate project source tree. These commands include operator-sdk init and operator-sdk create api. The first command initializes a project's source directory with boilerplate Go code, dependencies, hack scripts, and even a Dockerfile and Makefile for compiling the project.

Creating an API for your Operator is necessary to define the CRD required to interact with the Operator once it is deployed in a Kubernetes cluster. This is because CRDs are backed by API type definitions written in Go code. The CRD is generated from these code definitions, and the Operator has logic built in to translate between CRD and Go representations of the object. Essentially, CRDs are how users interact with Operators, and Go code is how the Operator understands the settings. CRDs also add benefits such as structural validation schemas to automatically validate inputs.

The Operator SDK binary has flags to specify the name and version of the API. It then generates the API types as Go code and corresponding YAML Ain't Markup Language (YAML) files based on best-practice standard definitions. However, you are free to modify the definitions of your API in whichever way you choose.

If we were to initialize a basic Operator for an application such as the one first demonstrated at the start of this chapter, the steps would be relatively simple. They would look like this:

$ mkdir sample-app
$ cd sample-app/
$ operator-sdk init --domain --repo
$ operator-sdk create api --group myapp --version v1alpha1 --kind WebApp --resource –controller
$ ls
total 112K
drwxr-xr-x   15 mdame staff  480 Nov 15 17:00 .
drwxr-xr-x+ 270 mdame staff 8.5K Nov 15 16:48 ..
drwx------    3 mdame staff   96 Nov 15 17:00 api
drwxr-xr-x    3 mdame staff   96 Nov 15 17:00 bin
drwx------   10 mdame staff  320 Nov 15 17:00 config
drwx------    4 mdame staff  128 Nov 15 17:00 controllers
drwx------    3 mdame staff   96 Nov 15 16:50 hack
-rw-------    1 mdame staff  129 Nov 15 16:50 .dockerignore
-rw-------    1 mdame staff  367 Nov 15 16:50 .gitignore
-rw-------    1 mdame staff  776 Nov 15 16:50 Dockerfile
-rw-------    1 mdame staff 8.7K Nov 15 16:51 Makefile
-rw-------    1 mdame staff  422 Nov 15 17:00 PROJECT
-rw-------    1 mdame staff  218 Nov 15 17:00 go.mod
-rw-r--r--    1 mdame staff  76K Nov 15 16:51 go.sum
-rw-------    1 mdame staff 3.1K Nov 15 17:00 main.go

After this, you would go on to develop the logic of the Operator based on the method you choose. If that's to write Go code directly, it would start by modifying the *.go files in the project tree. For Ansible and Helm deployments, you would begin working on the Ansible roles or Helm chart for your project.

Finally, the Operator SDK binary provides a set of commands to interact with OLM. These include the ability to install OLM in a running cluster, but also install and manage specific Operators within OLM.


Managing Operators with OLM

OLM is the second pillar of the Operator Framework. Its purpose is to facilitate the deployment and management of Operators in a Kubernetes cluster. It is a component that runs within a Kubernetes cluster and provides several commands and features for interacting with Operators.

OLM is primarily used for the installation and upgrade of Operators—this includes fetching and installing any dependencies for those Operators. Users interact with OLM via commands provided by the Operator SDK binary, the Kubernetes command-line tool (kubectl), and declarative YAML.

To get started, OLM can be initialized in a cluster with the following command:

$ operator-sdk olm install

Besides installing Operators, OLM can also make Operators that are currently installed discoverable to users on the cluster. This provides a catalog of already installed Operators available to cluster users. Also, by managing all the known Operators in the cluster, OLM can watch for conflicting Operator APIs and settings that would destabilize the cluster.

Once an Operator's Go code is compiled into an image, it is ready to be installed into a cluster with OLM running. Technically, OLM is not required to run an Operator in any cluster. For example, it is completely possible to deploy an Operator manually in the cluster, just as with any other container-based application. However, due to the advantages and security measures described previously (including its ability to install Operators and its awareness of other installed Operators), it is highly recommended to use OLM to manage cluster Operators.

When developing an Operator, the image is compiled into a bundle, and that bundle is installed via OLM. The bundle consists of several YAML files that describe the Operator, its CRD, and its dependencies. OLM knows how to process this bundle in its standardized format to properly manage the Operator in a cluster.

Compiling an Operator's code and deploying it can be done with commands such as the ones shown next. The first command shown in the following code snippet builds the bundle of YAML manifests that describe the Operator. Then, it passes that information to OLM to run the Operator in your cluster:

$ make bundle ...
$ operator-sdk run bundle ...

Later chapters will demonstrate exactly how to use these commands and what they do, but the general idea is that these commands first compile the Operator's Go code into an image and a deployable format that's understandable by OLM. But OLM isn't the only part of the Operator Framework that consumes an Operator's bundle—much of the same information is used by OperatorHub to provide information on an Operator.

Once an Operator has been compiled into its image, OperatorHub exists as a platform to share and distribute those images to other users.


Distributing Operators on

The final core component of the Operator Framework is As a major open source project, the Operator Framework ecosystem is built on the open sharing and distribution of projects. Therefore, OperatorHub powers the growth of Operators as a Kubernetes concept.

OperatorHub is an open catalog of Operators published and managed by the Kubernetes community. It serves as a central index of freely available Operators, each contributed by developers and organizations. You can see an overview of the home page in the following screenshot:

Figure 1.2 – Screenshot of the home page, showing some of the most popular Operators

Figure 1.2 – Screenshot of the home page, showing some of the most popular Operators

The process for submitting an Operator to OperatorHub for indexing has been standardized to ensure the consistency and compatibility of Operators with OLM. New Operators are reviewed by automated tooling for compliance with this standard definition of an Operator. The process is mainly handled through the open source GitHub repository that provides the backend of OperatorHub. However, OperatorHub does not provide any assistance with the ongoing maintenance of an Operator, which is why it is important for Operator developers to share links to their own open source repositories and contact information where users can report bugs and contribute themselves.

Preparing an Operator for submission to OperatorHub involves generating its bundle and associated manifests. The submission process primarily relies on the Operator's Cluster Service Version (CSV). The CSV is a YAML file that provides most of the metadata to OLM and OperatorHub about your Operator. It includes general information such as the Operator's name, version, and keywords. However, it also defines installation requirements (such as role-based access control (RBAC) permissions), CRDs, APIs, and additional cluster resource objects owned by the Operator.

The specific sections of an Operator's CSV include the following:

  • The Operator's name and version number, as well as a description of the Operator and its display icon in Base64-encoded image format
  • Annotations for the Operator
  • Contact information for the maintainers of the Operator and the open source repository where its code is located
  • How the Operator should be installed in the cluster
  • Example configurations for the Operator's CRD
  • Required CRDs and other resources and dependencies that the Operator needs to run

Because of all the information that it covers, the Operator CSV is usually very long and takes time to prepare properly. However, a well-defined CSV helps an Operator reach a much wider audience. Details of Operator CSVs will be covered in a later chapter.


Defining Operator functions with the Capability Model

The Operator Framework defines a Capability Model ( that categorizes Operators based on their functionality and design. This model helps to break down Operators based on their maturity, and also describes the extent of an Operator's interoperability with OLM and the capabilities users can expect when using the Operator.

The Capability Model is divided into five hierarchical levels. Operators can be published at any one of these levels and, as they grow, may evolve and graduate from one level to the next as features and functionality are added. In addition, the levels are cumulative, with each level generally encompassing all features of the levels below it.

The current level of an Operator is part of the CSV, and this level is displayed on its OperatorHub listing. The level is based on somewhat subjective yet guided criteria and is purely an informational metric.

Each level has specific functionalities that define it. These functionalities are broken down into Basic Install, Seamless Upgrades, Full Lifecycle, Deep Insights, and Auto Pilot. The specific levels of the Capability Model are outlined here:

  1. Level I—Basic Install: This level represents the most basic of Operator capabilities. At Level I, an Operator is only capable of installing its Operand in the cluster and conveying the status of the workload to cluster administrators. This means that it can set up the basic resources required for an application and report when those resources are ready to be used by the cluster.

At Level I, an Operator also allows for simple configuration of the Operand. This configuration is specified through the Operator's Custom Resource. The Operator is responsible for reconciling the configuration specifications with the running Operand workload. However, it may not be able to react if the Operand reaches a failed state, whether due to malformed configuration or outside influence.

Going back to our example web application from the start of the chapter, a Level I Operator for this application would handle the basic setup of the workloads and nothing else. This is good for a simple application that needs to be quickly set up on many different clusters, or one that should be easily shared with users for them to install themselves.

  1. Level II—Seamless Upgrades: Operators at Level II offer the features of basic installation, with added functionality around upgrades. This includes upgrades for the Operand but also upgrades for the Operator itself.

Upgrades are a critical part of any application. As bug fixes are implemented and more features are added, being able to smoothly transition between versions helps ensure application uptime. An Operator that handles its own upgrades can either upgrade its Operand when it upgrades itself or manually upgrade its Operand by modifying the Operator's Custom Resource.

For seamless upgrades, an Operator must also be able to upgrade older versions of its Operand (which may exist because they were managed by an older version of the Operator). This kind of backward compatibility is essential for both upgrading to newer versions and handling rollbacks (for example, if a new version introduces a high-visibility bug that can't wait for an eventual fix to be published in a patch version).

Our example web application Operator could offer the same set of features. This means that if a new version of the application were released, the Operator could handle upgrading the deployed instances of the application to the newer version. Or, if changes were made to the Operator itself, then it could manage its own upgrades (and later upgrade the application, regardless of version skew between Operator and Operand).

  1. Level III—Full Lifecycle: Level III Operators offer at least one out of a list of Operand lifecycle management features. Being able to offer management during the Operand's lifecycle implies that the Operator is more than just passively operating on a workload in a set and forget fashion. At Level III, Operators are actively contributing to the ongoing function of the Operand.

The features relevant to the lifecycle management of an Operand include the following:

  • The ability to create and/or restore backups of the Operand.
  • Support for more complex configuration options and multistep workflows.
  • Failover and failback mechanisms for disaster recovery (DR). When the Operator encounters an error (either in itself or the Operand), it needs to be able to either re-route to a backup process (fail over) or roll the system back to its last known functioning state (fail back).
  • The ability to manage clustered Operands, and—specifically—support for adding and removing members to and from Operands. The Operator should be capable of considering quorum for Operands that run multiple replicas.
  • Similarly, support for scaling an Operand with worker instances that operate with read-only functionality.

Any Operator that implements one or more of these features can be considered to be at least a Level III Operator. The simple web application Operator could take advantage of a few of these, such as DR and scaling. As the user base grows and resources demands increase, an administrator could instruct the Operator to scale the application with additional replica Pods to handle the increased load.

Should any of the Pods fail during this process, the Operator would be smart enough to know to fail over to a different Pod or cluster zone entirely. Alternatively, if a new version of the web app was released that introduced an unexpected bug, the Operator could be aware of the previous successful version and provide ways to downgrade its Operand workloads if an administrator noticed the error.

  1. Level IV—Deep Insights: While the previous levels focus primarily on Operator features as they relate to functional interaction with the application workload, Level IV emphasizes monitoring and metrics. This means an Operator is capable of providing measurable insights to the status of both itself and its Operand.

Insights may be seen as less important from a development perspective relative to features and bug fixes, but they are just as critical to an application's success. Quantifiable reports about an application's performance can drive ongoing development and highlight areas that need improvement. Having a measurable system to push these efforts allows a way to scientifically prove or disprove which changes have an effect.

Operators most commonly provide their insights in the form of metrics. These metrics are usually compatible with metrics aggregation servers such as Prometheus. (Interestingly enough, Red Hat publishes an Operator for Prometheus that is a Level IV Operator. That Operator is available on OperatorHub at

However, Operators can provide insights through other means as well. These include alerts and Kubernetes Events. Events are built-in cluster resource objects that are used by core Kubernetes objects and controllers.

Another key insight that Level IV Operators report is the performance of the Operator and Operand. Together, these insights help inform administrators about the health of their clusters.

Our simple web application Operator could provide insights about the performance of the Operand. Requests to the app would provide information about the current and historic load on the cluster. Additionally, since the Operator can identify failed states at this point, it could trigger an alert when the application is unhealthy. Many alerts would indicate a reliability issue that would gain the attention of an administrator.

  1. Level V—Auto Pilot: Level V is the most sophisticated level for Operators. It includes Operators that offer the highest capabilities, in addition to the features in all four previous levels. This level is called Auto Pilot because the features that define it focus on being able to run almost entirely autonomously. These capabilities include Auto Scaling, Auto-Healing, Auto-Tuning, and Abnormality Detection.

Auto Scaling is the ability for an Operator to detect the need to scale an application up or down based on demand. By measuring the current load and performance, an Operator can determine whether more or fewer resources are necessary to satisfy the current usage. Advanced Operators can even try to predict the need to scale based on current and past data.

Auto-Healing Operators can react to applications that are reporting unhealthy conditions and work to correct them (or, at least, prevent them from getting any worse). When an Operand is reporting an error, the Operator should take reactive steps to rectify the failure. In addition, Operators can use current metrics to proactively prevent an Operand from transitioning to a failure state.

Auto-Tuning means that an Operator can dynamically modify an Operand for peak performance. This involves tuning the settings of an Operand automatically. It can even include complex operations such as shifting workloads to entirely different nodes that are better suited than their current nodes.

Finally, Abnormality Detection is the capability of an Operator to identify suboptimal or off-pattern behavior in an Operand. By measuring performance, an Operator has a picture of the application's current and historical levels of functioning. That data can be compared to a manually defined minimum expectation or used to dynamically inform the Operator of that expectation.

All of these features are heavily dependent upon the use of metrics to automatically inform the Operator of the need to act upon itself or its Operand. Therefore, a Level V Operator is an inherent progression from Level IV, which is the level at which an Operator exposes advanced metrics.

At Level V, the simple web application Operator would manage most of the aspects of the application for us. It has insights into the current number of requests, so it can scale up copies of the app on demand. If this scaling starts to cause errors (for example, too many concurrent database calls), it can identify the number of failing Pods and prevent further scaling. It would also attempt to modify parameters of the web app (such as request timeouts) to help rectify the situation and allow the auto-scaling to proceed. When the load peak subsided, the Operator would then automatically scale down the application to its baseline service levels.

Levels I and II (Basic Install and Seamless Upgrades) can be used with the three facets of the Operator SDK: Helm, Ansible, and Go. However, Level III and above (Full Lifecycle, Deep Insights, and Auto Pilot) are only possible with Ansible and Go. This is because the functionality at these higher levels requires more intricate logic than what is available through Helm charts alone.

We have now explained the three main pillars of the Operator Framework: Operator SDK, OLM, and OperatorHub. We learned how each contributes different helpful features to the development and usage of Operators. We also learned about the Capability Model, which serves as a reference for the different levels of functionality that Operators can have. In the next section, we'll apply this knowledge to a sample application.


Using Operators to manage applications

Clearly, working with Operators involves more than simply reconciling a cluster state. The Operator Framework is an encompassing platform for Kubernetes developers and users to solve unique problems, which makes Kubernetes so flexible.

Cluster administrators' first step in the Operator Framework is usually either with the Operator SDK, to develop their own Operator if there are no existing Operators that address their needs, or OperatorHub if there are.

Summarizing the Operator Framework

When developing an Operator from scratch, there are three choices for development methods: Go, Ansible, or Helm. However, using Ansible or Helm alone will ultimately limit the Operator's capabilities to the most basic levels of functionality.

If the developer wishes to share their Operator, they will need to package it into the standard manifest bundle for OperatorHub. Following a review, their Operator will be available publicly for other users to download and install in their own clusters.

OLM then makes it easy for users to launch Operators in a cluster. These Operators can be sourced from OperatorHub or written from scratch. Either way, OLM makes Operator installation, upgrades, and management much easier. It also provides several stability benefits when working with many Operators. You can see the relationship between the three services in the following diagram:

Figure 1.3 – The relationship between the Operator SDK, OperatorHub, and OLM

Figure 1.3 – The relationship between the Operator SDK, OperatorHub, and OLM

Each of these pillars provides distinct functions that aid in the development of Operators. Together, they comprise the foundation of the Operator Framework. Utilization of these pillars is the key distinguishing factor between an Operator and a normal Kubernetes controller. To summarize, while every Operator is essentially a controller, not every controller is an Operator.

Applying Operator capabilities

Revisiting the first example in this chapter, the idea of a simple application with three Pods and a Persistent Volume was examined without Operator management. This application relied on optimistic uptime and future-proof design to run continuously. In real-world deployments, these ideas are unfortunately unreasonable. Designs evolve and change, and unforeseeable failures bring applications down. But how could an Operator help this app persist in an unpredictable world?

By defining a single declarative configuration, this Operator could control various settings of the application deployment in one spot. This is the reason Operators are built on CRDs. These custom objects allow developers and users to easily interact with their Operators just as if they were native Kubernetes objects. So, the first step in writing an Operator to manage our simple web application would be to define a basic code structure with a CRD that has all the settings we think we'll need. Once we have done this, the new diagram of our application will look like this:

Figure 1.4 – In the new app layout, the cluster administrator only interacts with the Operator; the Operator then manages the workload

Figure 1.4 – In the new app layout, the cluster administrator only interacts with the Operator; the Operator then manages the workload

This shows how the details of the Operand deployment have been abstracted away from requiring manual administrator control, and the great part about CRDs is that more settings can be added in later versions of the Operator as our app grows. A few examples of settings to start with could be these:

  • Database access information
  • Application behavior settings
  • Log level

While writing our Operator code, we'll also want to write logic for things such as metrics, error handling, and reporting. The Operator can also start to bidirectionally communicate with the Operand. This means that not only can it install and update the Operand, but it can receive communication back from the Operand about its status and report that as well.



In this chapter, we introduced the fundamental concepts of the Operator Framework. These include the Operator SDK, OLM, and OperatorHub. In addition to the development and distribution pillars of the Operator Framework, the Capability Model provides an additional tool for measuring the functionality of an Operator. Throughout this book, we will be exploring these components in deeper detail to get a hands-on understanding of how they actually work.

We began this chapter by examining some of the problems that arise when manually managing applications and clusters without Operators. This was done through the lens of a simple generic web application based on a couple of Pods and a Persistent Volume. The main difficulties in managing something such as this include the time and resources required to debug applications. This is especially important in cloud applications, where high availability (HA) and consistent uptime are top priorities.

We then looked at how each pillar of the Operator Framework addresses the biggest difficulties of application management. These pillars begin with the Operator SDK, which streamlines Operator development. This allows developers to begin iterating on automated reconciliation logic to get their Operators written quickly. It also provides commands to interact with OLM, which is the next pillar of the framework.

OLM exists to help administrators install and curate Operators within a cluster. It provides dependency management and notifies administrators of conflicting APIs to promote cluster stability. It also serves as a local catalog of installed Operators, which is useful for users on the cluster.

Next, we examined OperatorHub and its role in the broader open source Kubernetes community. As an open index of freely available Operators, OperatorHub serves to promote the adoption and maintenance of Operators. It consumes the same manifests as OLM to provide a standardized set of metadata about each Operator to users.

Finally, the Capability Model summarizes the maturity of an Operator based on the functionality it provides. This is helpful to users, but it also serves as a convenient roadmap for developers to plan features for their Operators.

To summarize each of these components, we revisited the original application example presented in the first section. We showed that, with an Operator in place to manage the application, cluster administrators do not need to be keenly aware of the architectural details of the app to keep it running. Instead, this information and controls are abstracted away behind the Operator's interface.

With all of this in mind, we move on to the next chapters to explore each of these topics in depth. We will also be following detailed examples to build our own sample Operator. In the next chapter, we'll begin looking at the important concepts of designing an Operator based on its interactions with a Kubernetes cluster.

About the Author
  • Michael Dame

    Michael Dame is a software engineer who has several years of experience working with Kubernetes and Operators. He is an active contributor to the Kubernetes project, especially on the Scheduler and Descheduler projects. With a degree in Computer Science from Rensselaer Polytechnic Institute, he has worked on the Kubernetes control plane and Operators as a senior engineer at Red Hat. He is currently working with Google on Google Cloud Platform.

    Browse publications by this author
The Kubernetes Operator Framework Book
Unlock this book and the full library FREE for 7 days
Start now