DevOps with Kubernetes. - Second Edition

Product type Book
Published in Jan 2019
Publisher Packt
ISBN-13 9781789533996
Pages 484 pages
Edition 2nd Edition
Authors (3): Hideto Saito, Hui-Chuan Chloe Lee, Cheng-Yang Wu

Table of Contents (14 chapters)

Preface
1. Introduction to DevOps
2. DevOps with Containers
3. Getting Started with Kubernetes
4. Managing Stateful Workloads
5. Cluster Administration and Extension
6. Kubernetes Network
7. Monitoring and Logging
8. Resource Management and Scaling
9. Continuous Delivery
10. Kubernetes on AWS
11. Kubernetes on GCP
12. Kubernetes on Azure
13. Other Books You May Enjoy

Resource Management and Scaling

Although our monitoring system now gives us a comprehensive view of our applications and the cluster, we still lack the ability to manage capacity in terms of computational resources and the cluster itself. In this chapter, we'll discuss resources, covering the following topics:

  • Kubernetes scheduling mechanisms
  • Affinities between resources and workloads
  • Scaling smoothly with Kubernetes
  • Arranging cluster resources
  • Node administration

Scheduling workloads

The term scheduling refers to assigning resources to a task that needs to be carried out. Kubernetes does far more than keep our containers running; it proactively watches the resource usage of the cluster and carefully schedules pods onto the available resources. This scheduler-based infrastructure is the key to running workloads more efficiently than on a classical infrastructure.
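The information the scheduler works with comes from each container's resource requests: a pod is only placed on a node whose unreserved capacity covers what the pod asks for. A minimal sketch (the pod name and image here are placeholders, not from the book):

```yaml
# The scheduler will only bind this pod to a node with at least
# 100m of CPU and 128Mi of memory left unreserved.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25     # any image works; nginx is just an example
    resources:
      requests:
        cpu: 100m         # 0.1 of a CPU core
        memory: 128Mi
```

If no node can satisfy the requests, the pod stays in the Pending state until capacity frees up or is added.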

Optimizing resource utilization

Unsurprisingly, the way in which Kubernetes allocates pods to nodes is based on the supply and demand of resources. If a node can provide a sufficient quantity of resources, the node is eligible to run the pod. Hence, the smaller the difference between the cluster capacity and the...
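Utilization, in this sense, is governed by the gap between what nodes can allocate and what pods request. Setting requests close to real usage, with limits as a ceiling, helps the scheduler pack pods tightly; a hedged sketch of the two fields on a container (the numbers are illustrative):

```yaml
resources:
  requests:          # what the scheduler reserves on the node
    cpu: 200m
    memory: 256Mi
  limits:            # hard ceiling enforced at runtime
    cpu: 500m
    memory: 512Mi
```

Requests drive placement decisions; limits only take effect once the container is running.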

Elastically scaling

When an application reaches its capacity, the most intuitive way to tackle the problem is to add more power to it. However, we also want to avoid over-provisioning, so that any excess resources can be reclaimed for other applications. For most applications, scaling out is recommended over scaling up as a way of resolving insufficient resources, due to physical hardware limitations. In Kubernetes, from a service owner's point of view, scaling in/out can be as easy as increasing or decreasing the number of pods in a deployment, and Kubernetes has built-in support for performing such operations automatically, namely, the Horizontal Pod Autoscaler (HPA).
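An HPA can be expressed declaratively. The sketch below (the deployment name and thresholds are hypothetical) uses the `autoscaling/v2` API of current Kubernetes; clusters contemporary with this book would use an `autoscaling/v2beta*` variant with a similar shape:

```yaml
# Keep average CPU utilization of the "web" deployment near 70%,
# scaling between 2 and 10 replicas.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Note that CPU utilization here is measured against the pods' resource requests, which is another reason to set requests realistically.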

Depending on the infrastructure you're using, you can scale the capacity of the cluster in many different ways. There's an add-on...

Managing cluster resources

As resource utilization increases, our cluster becomes more likely to run out of capacity. Additionally, when many pods scale in and out dynamically and independently, predicting the right time to add more resources to the cluster can be extremely difficult. To prevent our cluster from being paralyzed, there are various things we can do.

Resource quotas of namespaces

By default, pods in Kubernetes are resource-unbounded: running pods might use up all of the computing or storage resources in a cluster. ResourceQuota is a resource object that allows us to restrict the aggregate resource consumption of a namespace. By setting up resource limits, we can reduce the noisy neighbor...
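As a sketch of how such a restriction looks, the following hypothetical quota (namespace and figures are illustrative) caps the total requests, limits, and pod count of one namespace; pod creation that would exceed any of the `hard` values is rejected:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a       # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"     # total CPU all pods may request
    requests.memory: 8Gi
    limits.cpu: "8"       # total CPU limits across all pods
    limits.memory: 16Gi
    pods: "20"            # at most 20 pods in the namespace
```

One caveat: once a quota on CPU or memory is active in a namespace, every pod there must declare requests/limits (or inherit them from a LimitRange), or it will be refused.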

Summary

In this chapter, we explored how Kubernetes manages cluster resources and schedules our workloads. With concepts such as Quality of Service classes, pod priority, and out-of-resource handling on nodes in mind, we can optimize resource utilization while keeping our workloads stable. Meanwhile, ResourceQuota and LimitRange add additional layers of protection for workloads running in a multi-tenant, resource-sharing environment. With all of this protection in place, we can confidently rely on Kubernetes to scale our workloads with autoscalers and maximize resource utilization.

In Chapter 9, Continuous Delivery, we're moving on and setting up a pipeline to deliver our product continuously in Kubernetes.
