Kubernetes for Developers

By Joseph Heck

About this book

Kubernetes is documented and typically approached from the perspective of someone running software that has already been built. Kubernetes may also be used to enhance the development process, enabling more consistent testing and analysis of code to help developers verify not only its correctness, but also its efficiency. This book introduces key Kubernetes concepts, coupled with examples of how to deploy and use them with a bit of Node.js and Python example code, so that you can quickly replicate and use that knowledge.

You will begin by setting up Kubernetes to help you develop and package your code. We walk you through the setup and installation process before working with Kubernetes in the development environment. We then delve into concepts such as automating your build process, autonomic computing, debugging, and integration testing. This book covers all the concepts required for a developer to work with Kubernetes.

By the end of this book, you will be in a position to use Kubernetes in development ecosystems.

Publication date:
April 2018
Publisher
Packt
Pages
374
ISBN
9781788834759

 

Chapter 1. Setting Up Kubernetes for Development

Welcome to Kubernetes for Developers! This chapter starts off by helping you get the tools installed that will allow you to take advantage of Kubernetes in your development. Once installed, we will interact with those tools a bit to verify that they are functional. Then, we will review some of the basic concepts that you will want to understand to effectively use Kubernetes as a developer. We will cover the following key resources in Kubernetes:

  • Container
  • Pod
  • Node
  • Deployment
  • ReplicaSet
 

What you need for development


In addition to your usual editing and programming tools, you will want to install the software to leverage Kubernetes. The focus of this book is to let you do everything on your local development machine, while also allowing you to expand and leverage a remote Kubernetes cluster in the future if you need more resources. One of Kubernetes' benefits is how it treats one or one hundred computers in the same fashion, allowing you to take advantage of the resources you need for your software, and do it consistently, regardless of where they're located.

The examples in this book will use command-line tools in a Terminal on your local machine. The primary one will be kubectl, which communicates with a Kubernetes cluster. We will use a tiny Kubernetes cluster of a single machine running on your own development system with Minikube. I recommend installing the community edition of Docker, which makes it easy to build containers for use within Kubernetes:

  • kubectl: kubectl (how to pronounce that is an amusing diversion within the Kubernetes community) is the primary command-line tool that is used to work with a Kubernetes cluster. To install kubectl, go to the page https://kubernetes.io/docs/tasks/tools/install-kubectl/ and follow the instructions relevant to your platform.

Optional tools

In addition to kubectl, minikube, and docker, you may want to take advantage of additional helpful libraries and command-line tools.

jq is a command-line JSON processor that makes it easy to parse results in more complex data structures. I would describe it as grep's cousin that's better at dealing with JSON results. You can install jq by following the instructions at https://stedolan.github.io/jq/download/. More details on what jq does and how to use it can also be found at https://stedolan.github.io/jq/manual/.

 

Getting a local cluster up and running


Once Minikube and kubectl are installed, get a cluster up and running. It is worthwhile to know the versions of the tools you're using, as Kubernetes is a fairly fast-moving project, and if you need to get assistance from the community, knowing which versions of these common tools you have will be important.

The versions of Minikube and kubectl I used while writing this are:

  • Minikube: version 0.22.3
  • kubectl: version 1.8.0

You can check the version of your copy with the following commands:

minikube version

This will output a version:

minikube version: v0.22.3

If you haven't already done so while following the installation instructions, start a Kubernetes cluster with Minikube. The simplest way is using the following command:

minikube start

This will download a virtual machine image and start it, and Kubernetes on it, as a single-machine cluster. The output will look something like the following:

Downloading Minikube ISO
 106.36 MB / 106.36 MB [============================================] 100.00% 0s
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.

Minikube will automatically create the files needed for kubectl to access the cluster and control it. Once this is complete, you can get information about the cluster to verify it is up and running.

First, you can ask minikube about its status directly:

minikube status
minikube: Running
cluster: Running
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.64.2

And if we ask kubectl about its version, it will report both the version of the client and the version of the cluster that it is communicating with:

kubectl version

The first output is the version of the kubectl client:

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T19:32:26Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}

Immediately after, it will communicate and report the version of Kubernetes on your cluster:

Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-09-11T21:52:19Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
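If you want to compare the two versions in a script rather than by eye, a minimal Python sketch follows. It assumes the version.Info{...} output format shown above (other kubectl releases may print the versions differently), and the function names parse_git_version and minor_skew are hypothetical helpers, not part of any Kubernetes tooling:

```python
import re

# Matches the GitVersion field in one line of `kubectl version` output,
# for example: GitVersion:"v1.7.5"
GIT_VERSION = re.compile(r'GitVersion:"v(\d+)\.(\d+)\.(\d+)')


def parse_git_version(line):
    """Return (major, minor, patch) from one line of `kubectl version` output."""
    match = GIT_VERSION.search(line)
    if match is None:
        raise ValueError("no GitVersion field found")
    return tuple(int(part) for part in match.groups())


def minor_skew(client_line, server_line):
    """Return how many minor versions apart the client and the server are."""
    client_version = parse_git_version(client_line)
    server_version = parse_git_version(server_line)
    if client_version[0] != server_version[0]:
        raise ValueError("major versions differ")
    return abs(client_version[1] - server_version[1])


# Shortened copies of the two lines shown above.
client = 'Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5"}'
server = 'Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5"}'

print(parse_git_version(client))
print(minor_skew(client, server))
```

A skew of zero means that the client and the cluster are on the same minor release, which is the simplest situation to be in when asking the community for help.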

And we can use kubectl to ask for information about the cluster as well:

kubectl cluster-info

And see something akin to the following:

Kubernetes master is running at https://192.168.64.2:8443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

This command primarily lets you know the API server that you're communicating with is up and running. We can ask for the specific status of the key internal components using an additional command:

kubectl get componentstatuses
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
etcd-0               Healthy   {"health": "true"}
controller-manager   Healthy   ok

Kubernetes also reports and stores a number of events that you can request to see. These show what is happening within the cluster:

kubectl get events
LASTSEEN   FIRSTSEEN   COUNT     NAME       KIND      SUBOBJECT   TYPE      REASON                    SOURCE                 MESSAGE
2m         2m          1         minikube   Node                  Normal    Starting                  kubelet, minikube      Starting kubelet.
2m         2m          2         minikube   Node                  Normal    NodeHasSufficientDisk     kubelet, minikube      Node minikube status is now: NodeHasSufficientDisk
2m         2m          2         minikube   Node                  Normal    NodeHasSufficientMemory   kubelet, minikube      Node minikube status is now: NodeHasSufficientMemory
2m         2m          2         minikube   Node                  Normal    NodeHasNoDiskPressure     kubelet, minikube      Node minikube status is now: NodeHasNoDiskPressure
2m         2m          1         minikube   Node                  Normal    NodeAllocatableEnforced   kubelet, minikube      Updated Node Allocatable limit across pods
2m         2m          1         minikube   Node                  Normal    Starting                  kube-proxy, minikube   Starting kube-proxy.
2m         2m          1         minikube   Node                  Normal    RegisteredNode            controllermanager      Node minikube event: Registered Node minikube in NodeController
 

Resetting and restarting your cluster


If you want to wipe out your local Minikube cluster and restart, it is very easy to do so. Issuing a command to delete and then start Minikube will wipe out the environment and reset it to a blank slate:

minikube delete
Deleting local Kubernetes cluster...
Machine deleted.

minikube start
Starting local Kubernetes v1.7.5 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
 

Looking at what's built-in and included with Minikube


With Minikube, you can bring up a web-based dashboard for the Kubernetes cluster with a single command:

minikube dashboard

This will open a browser and show you a web interface to the Kubernetes cluster. If you look at the URL address in the browser window, you'll see that it's pointing to the same IP address that was returned from the kubectl cluster-info command earlier, running on port 30000. The dashboard is running inside Kubernetes, and it is not the only thing that is.

Kubernetes is self-hosting, in that supporting pieces for Kubernetes to function such as the dashboard, DNS, and more, are all run within Kubernetes. You can see the state of all these components by asking about the state of all Pods in the cluster:

kubectl get pods --all-namespaces
NAMESPACE     NAME                          READY     STATUS    RESTARTS   AGE
kube-system   kube-addon-manager-minikube   1/1       Running   0          6m
kube-system   kube-dns-910330662-6pctd      3/3       Running   0          6m
kube-system   kubernetes-dashboard-91nmv    1/1       Running   0          6m

Notice that we used the --all-namespaces option in this command. By default, kubectl will only show you Kubernetes resources that are in the default namespace. Since we haven't run anything ourselves, if we invoked kubectl get pods we would just get an empty list. Pods aren't the only Kubernetes resources though; you can ask about quite a number of different resources, some of which I'll describe later in this chapter, and more in further chapters.

For the moment, invoke one more command to get the list of services:

kubectl get services --all-namespaces

This will output all the services:

NAMESPACE     NAME                   CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       kubernetes             10.0.0.1     <none>        443/TCP         3m
kube-system   kube-dns               10.0.0.10    <none>        53/UDP,53/TCP   2m
kube-system   kubernetes-dashboard   10.0.0.147   <nodes>       80:30000/TCP    2m

Note the service named kubernetes-dashboard has a Cluster-IP value, and the ports 80:30000. That port configuration is indicating that within the Pods that are backing the kubernetes-dashboard service, it will forward any requests from port 30000 to port 80 within the container. You may have noticed that the IP address for the Cluster IP is very different from the IP address reported for the Kubernetes master that we saw previously in the kubectl cluster-info command.
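To make the PORT(S) column a little more concrete, here is a small Python sketch that splits an entry such as 80:30000/TCP into its parts. It assumes the column layout shown in the preceding output, where an entry reads port/protocol or port:nodePort/protocol; the function names and dictionary keys are hypothetical:

```python
def parse_port_spec(spec):
    """Split one PORT(S) entry such as '80:30000/TCP' into its parts."""
    ports, _, protocol = spec.partition("/")
    service_port, _, node_port = ports.partition(":")
    return {
        "service_port": int(service_port),
        # Only services exposed on the Node carry a second port number.
        "node_port": int(node_port) if node_port else None,
        "protocol": protocol or "TCP",
    }


def parse_ports_column(column):
    """Parse a whole PORT(S) cell, which may list several entries."""
    return [parse_port_spec(entry) for entry in column.split(",")]


print(parse_port_spec("80:30000/TCP"))
print(parse_ports_column("53/UDP,53/TCP"))
```

For the kubernetes-dashboard service, this yields port 80 for the service and port 30000 for the port opened on the Node, which matches the port in the dashboard URL.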

It is important to know that everything within Kubernetes is run on a private, isolated network that is not normally accessible from outside the cluster. We will get into more detail on this in future chapters. For now, just be aware that minikube has some additional, special configuration within it to expose the dashboard.

 

Verifying Docker


Kubernetes supports multiple ways of running containers, Docker being the most common, and the most convenient. In this book, we will use Docker to help us create images that we will run within Kubernetes.

You can see what version of Docker you have installed and verify it is operational by running the following command:

docker version

Like kubectl, it will report the docker client version as well as the server version, and your output may look something like the following:

Client:
 Version: 17.09.0-ce
 API version: 1.32
 Go version: go1.8.3
 Git commit: afdb6d4
 Built: Tue Sep 26 22:40:09 2017
 OS/Arch: darwin/amd64
Server:
 Version: 17.09.0-ce
 API version: 1.32 (minimum version 1.12)
 Go version: go1.8.3
 Git commit: afdb6d4
 Built: Tue Sep 26 22:45:38 2017
 OS/Arch: linux/amd64
 Experimental: false

By using the docker images command, you can see what container images are available locally, and using the docker pull command, you can request specific images. In our examples in the next chapter, we will be building upon the alpine container image to host our software, so let's go ahead and pull that image to verify that your environment is working:

docker pull alpine

Using default tag: latest
latest: Pulling from library/alpine
Digest: sha256:f006ecbb824d87947d0b51ab8488634bf69fe4094959d935c0c103f4820a417d
Status: Image is up to date for alpine:latest

You can then see the images using the following command:

docker images
REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
alpine       latest   76da55c8019d   3 weeks ago   3.97MB

Note

If you get an error when trying to pull the alpine image, it may mean that you are required to work through a proxy, or otherwise have constrained access to the internet to pull images as you need. You may need to review Docker's information on how to set up and use a proxy if you are in this situation.

 

Clearing and cleaning Docker images


Since we will be using Docker to build container images, it will be useful to know how to get rid of images. You have already seen the list of images with the docker images command. There are also intermediate images that are maintained by Docker that are hidden in that output. To see all the images that Docker is storing, use the following command:

docker images -a

If you have only pulled the alpine image as per the preceding text, you likely won't see any additional images, but as you build images in the next chapter, this list will grow.

You can remove images with the docker rmi command followed by the name of the image. By default, Docker will attempt to maintain images that containers have used recently or referenced. Because of this, you may need to force the removal to clean up the images.

If you want to reset and remove all the images and start afresh, there is a handy command that will do that. By tying together docker images and docker rmi, we can ask it to force remove all the images it knows about:

docker rmi -f $(docker images -a -q)
 

Kubernetes concept – container


Kubernetes (and other technologies in this space) are all about managing and orchestrating containers. A container is really a name wrapped around a set of Linux technologies, the two most prominent being the container image format and the way Linux can isolate processes from one another, leveraging cgroups.

For all practical purposes, when someone is speaking of a container, they are generally implying that there is an image with everything needed to run a single process. In this context, a container is not only the image, but also the information about what to invoke and how to run it. Containers also act like they have their own network access. In reality, it's being shared by the Linux operating system that's running the containers.

When we want to write code to run under Kubernetes, we will always be talking about packaging it up and preparing it to run within a container. The more complex examples later in the book will utilize multiple containers all working together.

Note

It is quite possible to run more than a single process inside a container, but that's generally frowned upon as a container is ideally suited to represent a single process and how to invoke it, and shouldn't be considered the same thing as a full virtual machine.

If you usually develop in Python, then you are likely familiar with using something like pip to download libraries and modules that you need, and you invoke your program with a command akin to python your_file. If you're a Node developer, then it is more likely you're familiar with npm or yarn to install the dependencies you need, and you run your code with node your_file.

If you wanted to wrap that all up and run it on another machine, you would likely either redo all the instructions for downloading the libraries and running the code, or perhaps ZIP up the whole directory and move it where you want to run it. A container is a way to collect all the information together into a single image so that it can be easily moved around, installed, and run on a Linux operating system. Originally created by Docker, the specifications are now maintained by the Open Container Initiative (OCI) (https://www.opencontainers.org).

While a container is the smallest building block of what goes into Kubernetes, the smallest unit that Kubernetes works with is a Pod.

 

Kubernetes resource – Pod


A Pod is the smallest unit that Kubernetes manages and is the fundamental unit that the rest of the system is built on. The team that created Kubernetes found it worthwhile to let a developer specify what processes should always be run together on the same OS, and that the combination of processes running together should be the unit that's scheduled, run, and managed.

Earlier in this chapter, you saw that a basic instance of Kubernetes has some of its software running in Pods. Much of Kubernetes is run using these same concepts and abstractions, allowing Kubernetes to self-host its own software. Some of the software to run a Kubernetes cluster is managed outside the cluster itself, but more and more of it leverages the concept of Pods, including the DNS services, the dashboard, and the controller manager, which coordinates all the control operations through Kubernetes.

A Pod is made up of one or more containers and information associated with those containers. When you ask Kubernetes about a Pod, it will return a data structure that includes a list of one or more containers, along with a variety of metadata that Kubernetes uses to coordinate the Pod with other Pods, and policies of how Kubernetes should act and react if the program fails, is asked to be restarted, and so forth. The metadata can also define things such as affinity, which influences where a Pod can be scheduled in a cluster, expectations around how to get the container images, and more. It is important to know that a Pod is not intended to be treated as a durable, long-lived entity.

Pods are created and destroyed, and are essentially meant to be ephemeral. This allows separate logic, contained in controllers, to manage responsibilities such as scale and availability. It is this separation of duties that enables Kubernetes to provide a means for self-healing in the event of failures, and provide some auto-scaling capabilities.
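As a rough illustration of the data structure that describes a Pod, the following Python sketch builds a minimal one as a dictionary and prints it as JSON. The field names (apiVersion, kind, metadata, spec, containers, restartPolicy, imagePullPolicy) are the ones Kubernetes uses, while the Pod name, the label, and the choice of the alpine image are placeholders picked for this example:

```python
import json

# Hypothetical values: "example-pod", "example", and the label are placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "example-pod",
        "namespace": "default",
        "labels": {"app": "example"},
    },
    "spec": {
        # Policy for how Kubernetes reacts when a container exits.
        "restartPolicy": "Always",
        "containers": [
            {
                "name": "example",
                "image": "alpine:latest",
                # Expectation around how to get the container image.
                "imagePullPolicy": "IfNotPresent",
            }
        ],
    },
}


def container_images(pod_object):
    """Return the image of every container listed in a Pod structure."""
    return [container["image"] for container in pod_object["spec"]["containers"]]


print(json.dumps(pod, indent=2))
print(container_images(pod))
```

A Pod returned by a real cluster carries far more metadata and a status section as well; this sketch only shows the overall shape.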

A Pod being run by Kubernetes has a few specific guarantees:

  • All the containers for a Pod will be run on the same Node
  • Any container running within a Pod will share the Node's network with any other containers in the same Pod
  • Containers within a Pod can share files through volumes, attached to the containers
  • A Pod has an explicit life cycle, and will always remain on the Node on which it was started

For all practical purposes, when you want to know what's running on a Kubernetes cluster, you are generally going to want to know about the Pods running within Kubernetes and their state.

Kubernetes maintains and reports on the Pod's status, as well as the state of each of the containers that make up the Pod. The states for a container are Running, Terminated, and Waiting. The life cycle of a Pod is a bit more complicated, consisting of a strictly defined Phase and a set of PodStatus. Phase is one of Pending, Running, Succeeded, Failed, or Unknown, and the specific details of what's included in each Phase are documented at https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase.
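If you handle Pod phases in your own tooling, it can help to treat them as a closed set of values. The Python sketch below models the five phases listed above as an enumeration; the helper names phase_of and is_finished are hypothetical, and treating a missing phase as Unknown is an assumption of this example rather than documented behavior:

```python
from enum import Enum


class PodPhase(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    UNKNOWN = "Unknown"


def phase_of(pod_object):
    """Read the phase from a Pod structure, falling back to Unknown."""
    return PodPhase(pod_object.get("status", {}).get("phase", "Unknown"))


def is_finished(phase):
    """Succeeded and Failed both mean that the Pod's containers have stopped."""
    return phase in (PodPhase.SUCCEEDED, PodPhase.FAILED)


running = {"status": {"phase": "Running"}}
print(phase_of(running))
print(is_finished(phase_of(running)))
```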

A Pod can also contain Probes, which actively check the container for some status information. Two common probes that are deployed and used by Kubernetes controllers are a livenessProbe and a readinessProbe. The livenessProbe defines whether the container is up and running. If it isn't, the infrastructure in Kubernetes kills the relevant container and then applies the restart policy defined for the Pod. The readinessProbe is meant to indicate whether the container is ready to service requests. The results of the readinessProbe are used in conjunction with other Kubernetes mechanisms such as services (which we will detail later) to forward traffic to the relevant container. In general, the probes are set up to allow the software in a container to provide a feedback loop to Kubernetes. You can find more detail on Probes, how to define them, and how they are used at https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes. We will dig into probes in detail in a future chapter.
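To show where probes sit in a container definition, here is a hedged Python sketch that attaches an HTTP livenessProbe and readinessProbe to a container dictionary. The field names (httpGet, path, port, initialDelaySeconds, periodSeconds) come from the Kubernetes Pod specification; the paths /alive and /ready, the timing values, and the helper name with_probes are assumptions made for the example:

```python
def with_probes(container, port, liveness_path="/alive", readiness_path="/ready"):
    """Return a copy of a container definition with HTTP probes attached."""
    probed = dict(container)
    # Checked to decide whether the container needs to be restarted.
    probed["livenessProbe"] = {
        "httpGet": {"path": liveness_path, "port": port},
        "initialDelaySeconds": 5,
        "periodSeconds": 10,
    }
    # Checked to decide whether traffic may be sent to the container.
    probed["readinessProbe"] = {
        "httpGet": {"path": readiness_path, "port": port},
        "initialDelaySeconds": 5,
        "periodSeconds": 10,
    }
    return probed


container = {"name": "example", "image": "alpine:latest"}
print(with_probes(container, 5000))
```

Your own code inside the container would need to answer on those two paths for the probes to succeed, which is the feedback loop described above.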

Namespaces

Pods are collected into namespaces, which are used to group Pods together for a variety of purposes. You already saw one example of namespaces when we asked for the status of all the Pods in the cluster with the --all-namespaces option earlier.

Namespaces can be used to provide quotas and limits around resource usage, have an impact on DNS names that Kubernetes creates internal to the cluster, and in the future may impact access control policies. If no namespace is specified when interacting with Kubernetes through kubectl, the command assumes you are working with the default namespace, named default.
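The following Python sketch illustrates two of these effects. The first helper only builds the argument list for a kubectl get command (it does not run it) and shows how the default namespace applies when none is given. The second assembles the internal DNS name of a service, assuming the cluster uses the common cluster.local domain. Both helper names are hypothetical:

```python
def kubectl_get(resource, namespace=None, all_namespaces=False):
    """Build (but do not run) the argument list for a `kubectl get` command."""
    args = ["kubectl", "get", resource]
    if all_namespaces:
        args.append("--all-namespaces")
    elif namespace is not None:
        args.extend(["--namespace", namespace])
    # With neither option, kubectl works with the namespace named "default".
    return args


def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Assemble the cluster-internal DNS name of a service.

    Assumption: the cluster domain is "cluster.local", which is a common
    default but can be configured differently.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(" ".join(kubectl_get("pods")))
print(" ".join(kubectl_get("pods", namespace="kube-system")))
print(service_dns_name("kube-dns", "kube-system"))
```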

Writing your code for Pods and Containers

One of the keys to successfully using Kubernetes is to consider how you want your code to operate, and to structure it so that it fits cleanly into a structure of Pods and Containers. By structuring your software solutions to break problems down into components that operate with the constraints and guarantees that Kubernetes provides, you can easily take advantage of parallelism and container orchestration to use many machines as seamlessly as you would use a single machine.

The guarantees and abstractions that Kubernetes provides are reflective of years of experience that Google (and others) have had in running their software and services at a massive scale, reliably, and redundantly, leveraging the pattern of horizontal scaling to tackle massive problems.

 

Kubernetes resource – Node


A Node is a machine, typically running Linux, that has been added to the Kubernetes cluster. It can be a physical machine or a virtual machine. In the case of minikube, it is a single virtual machine that is running all the software for Kubernetes. In larger Kubernetes clusters, you may have one or several machines dedicated to just managing the cluster and separate machines where your workloads run. Kubernetes manages its resources across Nodes by tracking their resource usage, scheduling, starting (and if needed, restarting) Pods, as well as coordinating the other mechanisms that connect Pods together or expose them outside the cluster.

Nodes can (and do) have metadata associated with them so that Kubernetes can be aware of relevant differences, and can account for those differences when scheduling and running Pods. Kubernetes can support a wide variety of machines working together, and run software efficiently across all of them, or limit scheduling Pods to only machines that have the required resources (for example, a GPU).
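One way to picture how Node metadata limits scheduling is as a label match. The Python sketch below is a simplified model, not the scheduler's actual implementation: each Node carries a dictionary of labels, and a Pod lists the labels it requires. The node names and the accelerator label are hypothetical:

```python
def matches(node_labels, required_labels):
    """A Node qualifies when it carries every required label and value."""
    return all(node_labels.get(key) == value for key, value in required_labels.items())


def eligible_nodes(nodes, required_labels):
    """Return the names of the Nodes on which a Pod could be scheduled."""
    return [name for name, labels in nodes.items() if matches(labels, required_labels)]


# Hypothetical cluster of two machines, one of which has a GPU.
nodes = {
    "node-a": {"kubernetes.io/hostname": "node-a"},
    "node-b": {"kubernetes.io/hostname": "node-b", "accelerator": "gpu"},
}

print(eligible_nodes(nodes, {"accelerator": "gpu"}))
print(eligible_nodes(nodes, {}))
```

A Pod with no requirements can land on either machine, while one that asks for the GPU label is restricted to the single Node that has it.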

Networks

We previously mentioned that all the containers in a Pod share the Node's network. In addition, all Nodes in a Kubernetes cluster are expected to be connected to each other and share a private cluster-wide network. When Kubernetes runs containers within a Pod, it does so within this isolated network. Kubernetes is responsible for handling IP addresses, creating DNS entries, and making sure that a Pod can communicate with another Pod in the same Kubernetes cluster.

Another resource, Services, which we will dig into later, is what Kubernetes uses to expose Pods to one another over this private network or handle connections in and out of the cluster. By default, a Pod running in this private, isolated network is not exposed outside of the Kubernetes cluster. Depending on how your Kubernetes cluster was created, there are multiple avenues for opening up access to your software from outside the cluster, which we'll detail later with Services that include LoadBalancer, NodePort, and Ingress.

Controllers

Kubernetes is built with the notion that you tell it what you want, and it knows how to do it. When you interact with Kubernetes, you are asserting you want one or more resources to be in a certain state, with specific versions, and so forth. Controllers are where the brains exist for tracking those resources and attempting to run your software as you described. These descriptions can include how many copies of a container image are running, updating the software version running within a Pod, and handling the case of a Node failure where you unexpectedly lose part of your cluster.

There are a variety of controllers used within Kubernetes, and they are mostly hidden behind two key resources that we will dig into further: Deployments and ReplicaSets.

 

Kubernetes resource – ReplicaSet


A ReplicaSet wraps Pods, defining how many need to run in parallel. A ReplicaSet is commonly wrapped in turn by a deployment. ReplicaSets are not often used directly, but are critical to represent horizontal scaling—to represent the number of parallel Pods to run.

A ReplicaSet is associated with a Pod and indicates how many instances of that Pod should be running within the cluster. A ReplicaSet also implies that Kubernetes has a controller that watches the ongoing state and knows how many of your Pod to keep running. This is where Kubernetes really starts to do work for you: if you specify three Pods in a ReplicaSet and one fails, Kubernetes will automatically schedule and run another Pod for you.
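The idea of a controller that keeps the running state in line with the desired state can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (Pods are just names in a list, and the function name reconcile and the web-N naming scheme are hypothetical); it is not how the ReplicaSet controller is implemented:

```python
import itertools


def reconcile(desired_replicas, running_pods, name_source):
    """Return the Pod names that should exist after one reconcile pass."""
    pods = list(running_pods)
    # Too few Pods: start replacements until the desired count is reached.
    while len(pods) < desired_replicas:
        pods.append(next(name_source))
    # Too many Pods: keep only as many as were asked for.
    return pods[:desired_replicas]


names = (f"web-{number}" for number in itertools.count(1))

pods = reconcile(3, [], names)
print(pods)

# Simulate the failure of one Pod, then let the controller react.
pods.remove("web-2")
pods = reconcile(3, pods, names)
print(pods)
```

After the simulated failure, the second pass notices that only two Pods remain and adds a new one, which is the behavior described above for a ReplicaSet of three.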

 

Kubernetes resource – Deployment


The most common and recommended way to run code on Kubernetes is with a deployment, which is managed by a deployment controller. We will explore deployments in the next and further chapters, both specifying them directly and creating them implicitly with commands such as kubectl run.

A Pod by itself is interesting, but limited, specifically because it is intended to be ephemeral. If a Node were to die (or get powered down), all the Pods on that Node would stop running. ReplicaSets provide self-healing capabilities. They work within the cluster to recognize when a Pod is no longer available, and will attempt to schedule another Pod, typically to bring a service back online, or otherwise continue doing work.

The deployment controller wraps around and extends the ReplicaSet controller, and is primarily responsible for rolling out software updates and managing the process of that rollout when you update your deployment resource with new versions of your software. The deployment controller includes metadata settings to know how many Pods to keep running so that you can enable a seamless rolling update of your software by adding new versions of a container, and stopping old versions when you request it.
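To give a feel for what a deployment resource contains, here is a Python sketch of a minimal one as a dictionary, along with a helper that produces an updated copy with a new container image. The field names (replicas, selector, template, strategy, maxSurge, maxUnavailable) are part of the Deployment specification. The apiVersion value apps/v1 is an assumption: it is what current clusters use, while older clusters, including the v1.7.5 one shown in this chapter, expose deployments under a beta API group. The name, labels, and image tags are placeholders:

```python
import copy

# Hypothetical names: "example", the label, and the image tags are placeholders.
deployment = {
    "apiVersion": "apps/v1",  # assumption: older clusters use a beta API group
    "kind": "Deployment",
    "metadata": {"name": "example"},
    "spec": {
        "replicas": 3,
        # The selector has to match the labels on the Pod template below.
        "selector": {"matchLabels": {"app": "example"}},
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
        },
        "template": {
            "metadata": {"labels": {"app": "example"}},
            "spec": {
                "containers": [{"name": "example", "image": "example/app:0.1.0"}]
            },
        },
    },
}


def with_image(manifest, container_name, image):
    """Return a copy of a deployment with one container's image replaced."""
    updated = copy.deepcopy(manifest)
    for container in updated["spec"]["template"]["spec"]["containers"]:
        if container["name"] == container_name:
            container["image"] = image
    return updated


new_version = with_image(deployment, "example", "example/app:0.2.0")
print(new_version["spec"]["template"]["spec"]["containers"][0]["image"])
```

Submitting the updated structure to the cluster is what would trigger a rollout; with the strategy values chosen here, at most one extra Pod is started at a time and no existing Pod is stopped before its replacement is available.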

Representing Kubernetes resources

Kubernetes resources can generally be represented as either a JSON or YAML data structure. Kubernetes is specifically built so that you can save these files, and when you want to run your software, you can use a command such as kubectl apply and provide the definitions you've created previously, and it uses that to run your software. In our next chapter, we will start to show specific examples of these resources and build them up for our use.

As we get into the examples in the next, and future chapters, we will use YAML to describe our resources and request data through kubectl back in JSON format. All of these data structures are formally defined for each version of Kubernetes, along with the REST APIs that Kubernetes provides to manipulate them. The formal definitions of all Kubernetes resources are maintained with OpenAPI (also known as Swagger) in source code control, and can be viewed at https://github.com/kubernetes/kubernetes/tree/master/api/swagger-spec.
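As a small preview of working with JSON returned by kubectl, the following Python sketch reads a listing of Pods and reports the phase of each. The sample text is a hypothetical, heavily trimmed stand-in for what kubectl get pods -o json returns (a List object with an items array); the Pod names and the helper name pod_phases are made up for the example:

```python
import json

# Hypothetical, heavily trimmed sample of `kubectl get pods -o json` output.
SAMPLE = """
{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "kind": "Pod",
      "metadata": {"name": "example-1", "namespace": "default"},
      "status": {"phase": "Running"}
    },
    {
      "kind": "Pod",
      "metadata": {"name": "example-2", "namespace": "default"},
      "status": {"phase": "Pending"}
    }
  ]
}
"""


def pod_phases(raw_json):
    """Map each Pod name in a JSON listing to its reported phase."""
    listing = json.loads(raw_json)
    return {
        item["metadata"]["name"]: item["status"]["phase"]
        for item in listing["items"]
    }


print(pod_phases(SAMPLE))
```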

 

Summary


In this chapter, we installed minikube and kubectl, and used them to start a local Kubernetes cluster and briefly interact with it. We then walked through some of the key concepts that we will be using and exploring more in depth in future chapters, including container, Pod, node, deployment, and ReplicaSet.

In the next chapter, we will dive into what it takes to get your software into a container and tips for how to set that up within your own project. We will walk through an example in Python, and another in Node.js, which you can use as starting points for your own code.

About the Author

  • Joseph Heck

    Joseph Heck has broad development and management experience across start-ups and large companies. He has architected, developed, and deployed a wide variety of solutions, ranging from mobile and desktop applications to cloud-based distributed systems.

    He builds and directs teams and mentors individuals to improve the way they build, validate, deploy, and run software. He also works extensively with and in open source, collaborating across many projects, including Kubernetes.

