Getting Started with Kubernetes

By Jonathan Baier

About this book

Kubernetes is the tool that’s pushing the containerization revolution – largely driven by Docker – to another level. If Docker has paved the way for greater agility and control in the way we organize and manage our infrastructure, Kubernetes goes further, by helping you to orchestrate and automate container deployments on a massive scale. Kubernetes really does think big – and it’s time you did too!

This book will show you how to start doing exactly that, extending the opportunities that containerization innovations have brought about in new and even more effective ways. Get started with the basics - explore the fundamental elements of Kubernetes and find out how to install it on your system, before digging a little deeper into Kubernetes core constructs. Find out how to use Kubernetes pods, services, replication controllers, and labels to manage your clusters effectively, and learn how to handle networking with Kubernetes.

Once you’ve got to grips with these core components, you’ll begin to see how Kubernetes fits into your workflow. From basic updates to integrating Kubernetes with continuous delivery tools such as Jenkins and Gulp, the book demonstrates exactly how Kubernetes will transform the way you work. With further insights on how to install monitoring and security tools, this book provides you with a direct route through Kubernetes – so you can take advantage of it, fast!

Publication date:
December 2015
Publisher
Packt
Pages
186
ISBN
9781784394035

 

Chapter 1. Kubernetes and Container Operations

This chapter will give a brief overview of containers and how they work, as well as why management and orchestration are important to your business and/or project team. The chapter will also give a brief overview of how Kubernetes orchestration can enhance our container management strategy and how we can get a basic Kubernetes cluster up, running, and ready for container deployments.

This chapter will include the following topics:

  • Introducing container operations and management

  • Why container management is important

  • Advantages of Kubernetes

  • Downloading the latest Kubernetes

  • Installing and starting up a new Kubernetes cluster

 

A brief overview of containers


Over the past two years, the popularity of containers has spread like wildfire. You would be hard-pressed to attend an IT conference without finding popular sessions on Docker or containers in general.

Docker lies at the heart of the mass adoption and the excitement in the container space. Just as Malcom McLean revolutionized the physical shipping world in 1957 by creating a standardized shipping container, now used for everything from ice cube trays to automobiles1, Linux containers are revolutionizing the software development world by making application environments portable and consistent across the infrastructure landscape. As an organization, Docker has taken existing container technology to a new level by making it easy to implement and replicate across environments and providers.

What is a container?

At the core of container technology are control groups (cgroups) and namespaces. Additionally, Docker uses union file systems for added benefits to the container development process.

Control groups (cgroups) allow the host to share, and also limit, the resources each process or container can consume. This is important for both resource utilization and security, as it prevents denial-of-service attacks on the host's hardware resources. Several containers can share CPU and memory while staying within their predefined constraints.
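As a quick illustration (Linux only; the exact layout differs between the cgroup v1 controller hierarchies and the unified v2 tree), we can peek at the cgroup machinery directly from a shell:

```shell
# List the cgroup hierarchy the host exposes. On cgroup v1, each
# controller (cpu, memory, and so on) has its own subtree here;
# on cgroup v2 there is a single unified tree.
ls /sys/fs/cgroup

# Show which cgroups the current shell process belongs to.
cat /proc/self/cgroup
```

Docker wires these same knobs up for us; for example, flags such as --memory and --cpu-shares on docker run translate into cgroup limits for that container.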

Namespaces offer another form of isolation, this time for processes. A process can see only the process IDs in its own namespace; namespaces belonging to other system processes are not accessible from a container process. For example, a network namespace isolates access to the network interfaces and configuration, which allows the separation of network interfaces, routes, and firewall rules.
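To make this concrete, a small sketch (Linux only): the kernel exposes each process's namespace memberships as symlinks under /proc/&lt;pid&gt;/ns, and two processes share a namespace exactly when the corresponding links resolve to the same value:

```shell
# List this shell's namespaces (pid, net, mnt, uts, ipc, and so on).
ls /proc/self/ns

# Print the identity of our PID namespace. A process whose link
# resolves to a different value lives in a different PID namespace
# and cannot see our process IDs.
readlink /proc/self/ns/pid
```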

Figure 1.1. Composition of a container

Union file systems are also a key advantage to using Docker containers. The easiest way to understand union file systems is to think of them like a layer cake with each layer baked independently. The Linux kernel is our base layer; then, we might add an OS like Red Hat Linux or Ubuntu. Next, we might add an application like Nginx or Apache. Every change creates a new layer. Finally, as you make changes and new layers are added, you'll always have a top layer (think frosting) that is a writable layer.

Figure 1.2. Layered file system

What makes this truly efficient is that Docker caches the layers the first time we build them. So, let's say that we have an image with Ubuntu and then add Apache and build the image. Next, we build MySQL with Ubuntu as the base. The second build will be much faster because the Ubuntu layer is already cached. Essentially, our chocolate and vanilla layers, from Figure 1.2, are already baked. We simply need to bake the pistachio (MySQL) layer, assemble, and add the icing (writable layer).
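To make the cake analogy concrete, here is a minimal, illustrative Dockerfile (the image and package names are arbitrary examples): each instruction below bakes one layer, and a second image built FROM the same base reuses the cached layers.

```dockerfile
# Base OS layer: pulled once, then cached for every image that uses it.
FROM ubuntu:14.04

# Application layer: installing Apache creates a new layer on top.
RUN apt-get update && apt-get install -y apache2

# Content layer: copying files in adds one more layer.
COPY site/ /var/www/html/

# At runtime, anything the container writes lands in the thin
# writable layer on top (the frosting).
```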

 

Why are containers so cool?


Containers on their own are not a new technology and have in fact been around for many years. What truly sets Docker apart is the tooling and ease of use it has brought to the community.

Advantages to Continuous Integration/Continuous Deployment

Wikipedia defines Continuous Integration as "the practice, in software engineering, of merging all developer working copies to a shared mainline several times a day." By having a continuous process of building and deploying code, organizations are able to instill quality control and testing as part of the everyday work cycle. The result is that updates and bug fixes happen much faster and overall quality improves.

However, there has always been a challenge in setting up development environments to match those of testing and production. Often, inconsistencies in these environments make it difficult to gain the full advantage of continuous delivery.

Using Docker, developers are now able to have truly portable deployments. Containers that are deployed on a developer's laptop are easily deployed on an in-house staging server. They are then easily transferred to the production server running in the cloud. This is because Docker builds containers up with build files that specify parent layers. One advantage of this is that it becomes very easy to ensure OS, package, and application versions are the same across development, staging, and production environments.

Because all the dependencies are packaged into the layer, the same host server can have multiple containers running a variety of OS or package versions. Further, we can have various languages and frameworks on the same host server without the typical dependency clashes we would get in a Virtual Machine (VM) with a single operating system.

Resource utilization

The well-defined isolation and layer filesystem also make containers ideal for running systems with a very small footprint and domain-specific purposes. A streamlined deployment and release process means we can deploy quickly and often. As such, many companies have reduced their deployment time from weeks or months to days and hours in some cases. This development life cycle lends itself extremely well to small, targeted teams working on small chunks of a larger application.

 

Microservices and orchestration


As we break down an application into very specific domains, we need a uniform way to communicate between all the various pieces and domains. Web services have served this purpose for years, but the added isolation and granular focus that containers bring have paved the way for what is now being called microservices.

The definition for microservices can be a bit nebulous, but a definition from Martin Fowler, a respected author and speaker on software development, says2:

"In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies."

As an organization's pivot to containerization and microservices evolves, it will soon need a strategy to maintain many containers and microservices. Some organizations will have hundreds or even thousands of containers running in the years ahead.

Future challenges

Life cycle processes alone are an important piece of operations and management. How will we automatically recover when a container fails? Which upstream services are affected by such an outage? How will we patch our applications with minimal downtime? How will we scale up our containers and services as our traffic grows?

Networking and processing are also important concerns. Some processes are part of the same service and may benefit from proximity on the network. Databases, for example, may send large amounts of data to a particular microservice for processing. How will we place containers near each other in our cluster? Is there common data that needs to be accessed? How will new services be discovered and made available to other systems?

Resource utilization is also key. The small footprint of containers means that we can optimize our infrastructure for greater utilization, extending the savings begun in the elastic cloud world even further by minimizing wasted hardware. How will we schedule workloads most efficiently? How will we ensure that our important applications always have the resources they need? How can we run less important workloads on spare capacity?

Finally, portability is a key factor in moving many organizations to containerization. Docker makes it very easy to deploy a standard container across various operating systems, cloud providers, and on-premise hardware, or even developer laptops. However, we still need tooling to move containers around. How will we move containers between different nodes on our cluster? How will we roll out updates with minimal disruption? What process do we use to perform blue-green deployments or canary releases?

Whether you are starting to build out individual microservices and separating concerns into isolated containers or if you simply want to take full advantage of the portability and immutability in your application development, the need for management and orchestration becomes clear.

 

Advantages of Kubernetes


This is where orchestration tools such as Kubernetes offer the biggest value. Kubernetes (K8s) is an open source project that was released by Google in June 2014. Google released the project as part of an effort to share its own infrastructure and technology advantage with the community at large.

Google launches 2 billion containers a week in its infrastructure and has been using container technology for over a decade. It originally built a system named Borg, and later Omega, to schedule vast quantities of workloads across its ever-expanding data center footprint. Google took many of the lessons learned over the years and rewrote its existing data center management tooling for wide adoption by the rest of the world. The result was the Kubernetes open source project3.

Since its initial release in 2014, K8s has undergone rapid development with contributions from across the open source community, including Red Hat, VMware, and Canonical. The 1.0 release of Kubernetes went live in July 2015. We'll be covering version 1.0 throughout the book. K8s gives organizations a tool to deal with some of the major operations and management concerns. We will explore how Kubernetes helps deal with resource utilization, high availability, updates, patching, networking, service discovery, monitoring, and logging.

 

Our first cluster


Kubernetes is supported on a variety of platforms and OSes. For the examples in this book, I used an Ubuntu 14.04 Linux VirtualBox for my client and Google Compute Engine (GCE) with Debian for the cluster itself. We will also take a brief look at a cluster running on Amazon Web Services (AWS) with Ubuntu.

Tip

Most of the concepts and examples in this book should work on any installation of a Kubernetes cluster. To get more information on other platform setups, check the Kubernetes getting started page on the following GitHub link:

https://github.com/GoogleCloudPlatform/kubernetes/blob/v1.0.0/docs/getting-started-guides/README.md

First, let's make sure that our environment is properly set up before we install Kubernetes.

Start by updating packages:

$ sudo apt-get update

Install Python and curl if they are not present:

$ sudo apt-get install python
$ sudo apt-get install curl

Install the gcloud SDK:

$ curl https://sdk.cloud.google.com | bash

Tip

We will need to start a new shell before gcloud is on our path.

Configure your Google Cloud Platform (GCP) account information. This should automatically open a browser where we can log in to our Google Cloud account and authorize the SDK:

$ gcloud auth login

Tip

If you have problems with login or want to use another browser, you can optionally use the --no-launch-browser flag. Copy and paste the URL into the machine and/or browser of your choice. Log in with your Google Cloud credentials and click on Allow on the permissions page. Finally, you should receive an authorization code that you can copy and paste back into the shell where the prompt is waiting.

A default project should be set, but we can check this with the following:

$ gcloud config list project

We can modify this and set a new default project with the following command. Make sure to use the project ID and not the project name:

$ gcloud config set project <PROJECT ID>

Tip

We can find our project ID in the console at:

https://console.developers.google.com/project

Alternatively, we can list active projects:

$ gcloud alpha projects list

Now that we have our environment set up, installing the latest Kubernetes version is done in a single step as follows:

$ curl -sS https://get.k8s.io | bash

It may take a minute or two to download Kubernetes depending on your connection speed. After this, it will automatically call the kube-up.sh script and start building our cluster. By default, it will use the Google Cloud and GCE.

Tip

If something fails during the cluster setup and you need to start again, you can simply run the kube-up.sh script. Go to the folder where you ran the previous curl command. Then, you can kick off the cluster build with the following command:

$ kubernetes/cluster/kube-up.sh

After Kubernetes is downloaded and the kube-up.sh script has started, we will see quite a few lines roll past. Let's take a look at them one section at a time.

Figure 1.3. GCE prerequisite check

Tip

If your gcloud components are not up to date, you may be prompted to update.

The preceding section (Figure 1.3) shows the prerequisite checks and makes sure that all components are up to date. This is specific to each provider. In the case of GCE, it will check that the SDK is installed and that all components are up to date. If not, you will see a prompt at this point to install or update.

Figure 1.4. Upload cluster packages

Now the script is turning up the cluster. Again, this is specific to the provider. For GCE, it first checks to make sure that the SDK is configured for a default project and zone. If they are set, you'll see those in the output.

Next, it uploads the server binaries to Google Cloud Storage, as seen in the Creating gs://... lines.

Figure 1.5. Master creation

It then checks for any pieces of a cluster already running. Then, we finally start creating the cluster. In the output in Figure 1.5, we see it creating the master server, IP address, and appropriate firewall configurations for the cluster.

Figure 1.6. Minion creation

Finally, it creates the minions, or nodes, for our cluster. This is where our container workloads will actually run. It will continually loop and wait while all the minions start up. By default, the cluster will have four nodes (minions), but K8s supports having upwards of 100 (and soon beyond 1,000). We will come back to scaling the nodes later on in the book.

Figure 1.7. Cluster completion

Now that everything is created, the cluster is initialized and started. Assuming that everything goes well, we will get an IP address for the master. Also, note that the configuration, along with the cluster management credentials, is stored in /home/<Username>/.kube/config.

Figure 1.8. Cluster validation

Then, the script will validate the cluster. At this point, we are no longer running provider-specific code. The validation script will query the cluster via the kubectl.sh script. This is the central script for managing our cluster. In this case, it checks the number of minions found, registered, and in a ready state. It loops, giving the cluster up to 10 minutes to finish initialization.

After a successful startup, a summary of the minions and the cluster component health is printed to the screen:

Figure 1.9. Cluster summary

Finally, a kubectl cluster-info command is run, which outputs the URL for the master services as well as DNS, UI, and monitoring. Let's take a look at some of these components.

Kubernetes UI

Open a browser and try the following URL:

https://<your master ip>/api/v1/proxy/namespaces/kube-system/services/kube-ui

The certificate is self-signed by default, so you'll need to ignore the warnings in your browser before proceeding. After this, we will see a login dialog. This is where we use the credentials listed during the K8s installation. We can find them at any time by simply using the config command:

$ kubectl config view

Now that we have credentials for login, use those, and we should see a dashboard like the following image:

Figure 1.10. Kubernetes UI dashboard

The main dashboard page gives us a summary of the minions (or slave nodes). We can also see the CPU, memory, and used disk space on each minion, as well as the IP address.

The UI has a number of built-in views listed under the Views dropdown menu on the top right of the screen. However, most of them will be empty by default. Once workloads and services are spun up, these views will become a lot more interesting.

Grafana

Another service installed by default is Grafana. This tool will give us a dashboard to view metrics on the cluster nodes. We can access it by using the following syntax in a browser:

https://<your master ip>/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana

Figure 1.11. Kubernetes Grafana dashboard

Here, Kubernetes is actually running a number of services. Heapster is used to collect resource usage on the pods and nodes and stores the information in InfluxDB. The results, like CPU and memory usage, are what we see in the Grafana UI. We will explore this in depth in Chapter 6, Monitoring and Logging.

Swagger

Swagger (http://swagger.io/) is a tool to add a higher level of interaction and easy discovery to an API.

Kubernetes has built a Swagger-enabled API, which can be accessed by using https://<your master ip>/swagger-ui/.

Figure 1.12. Kubernetes Swagger dashboard

Through this interface, you can learn a lot about the Kubernetes RESTful API. The bulk of the interesting endpoints are listed under v1. If we look at /api/v1/nodes, we can see the structure of the JSON response as well as details of possible parameters for the request. In this case, we see that the first parameter is pretty, which toggles whether the JSON is returned with pretty indentation for easier reading.

We can try this out by using https://<your master ip>/api/v1/nodes/.

By default, we'll see a JSON response with pretty indentation enabled. The response should have a list of all the nodes currently in our cluster.

Now, let's try tweaking the pretty request parameter you just learned about. Use https://<your master ip>/api/v1/nodes/?pretty=false.

Now we have the same response output, but with no indentation. This is a great resource for exploring the API and learning how to use various function calls to get more information and interact with your cluster programmatically.

Command line

The kubectl.sh script has commands to explore our cluster and the workloads running on it. We will be using this command throughout the book, so let's take a second to set up our environment. We can do so by making the script executable and putting it on our PATH, in the following manner:

$ cd /home/<Username>/kubernetes/cluster
$ chmod +x kubectl.sh
$ export PATH=$PATH:/home/<Username>/kubernetes/cluster
$ ln -s kubectl.sh kubectl

Tip

You may choose to download the kubernetes folder outside your home folder, so modify the preceding commands as appropriate.

It is also a good idea to make the changes permanent by adding the export command to the end of your .bashrc file in your home directory.
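For example (the path is a placeholder; adjust it to wherever you unpacked the kubernetes folder), appending the export to .bashrc looks like this:

```shell
# Persist the PATH change for future shells; replace <Username> with
# your actual user name (or use $HOME).
echo 'export PATH=$PATH:/home/<Username>/kubernetes/cluster' >> ~/.bashrc

# Reload the file in the current shell so the change takes effect now.
. ~/.bashrc
```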

Now that we have kubectl on our path, we can start working with it. It has quite a few commands. Since we have not spun up any applications yet, most of these commands will not be very interesting. However, we can explore with two commands right away.

First, we have already seen the cluster-info command during initialization, but we can run it again at any time with the following:

$ kubectl cluster-info

Another useful command is get. The get command can be used to see currently running services, pods, replication controllers, and a lot more. Here are three examples that are useful right out of the gate:

  • Listing the nodes in our cluster:

    $ kubectl get nodes
    
  • List cluster events:

    $ kubectl get events
    
  • Finally, we can see any services that are running in the cluster as follows:

    $ kubectl get services
    

To start with, we will only see one service, named kubernetes. This service represents the core API server, along with the monitoring and logging services for the pods and cluster.

Services running on the master

Let's dig a little bit deeper into our new cluster and its core services. By default, machines are named with the kubernetes- prefix. We can modify this by setting the KUBE_GCE_INSTANCE_PREFIX environment variable before a cluster is spun up. For the cluster we just started, the master should be named kubernetes-master. We can use the gcloud command-line utility to SSH into the machine. The following command will start an SSH session with the master node. Be sure to substitute your project ID and zone to match your environment. Note that you can also launch SSH from the Google Cloud console:

$ gcloud compute --project "<Your project ID>" ssh --zone "<your gce zone>" "kubernetes-master"

Once we are logged in, we should get a standard shell prompt. Let's run the familiar sudo docker ps command.

Figure 1.13. Master container listing

Even though we have not deployed any applications on Kubernetes yet, we note that there are several containers already running. The following is a brief description of each container:

  • fluentd-gcp: This container collects and sends the cluster logs file to the Google Cloud Logging service.

  • kube-ui: This is the UI that we saw earlier.

  • kube-controller-manager: The controller manager controls a variety of cluster functions. Ensuring accurate and up-to-date replication is one of its vital roles. Additionally, it monitors, manages, and discovers new nodes. Finally, it manages and updates service endpoints.

  • kube-apiserver: This container runs the API server. As we explored in the Swagger interface, this RESTful API allows us to create, query, update, and remove various components of our Kubernetes cluster.

  • kube-scheduler: The scheduler takes unscheduled pods and binds them to nodes based on the current scheduling algorithm.

  • etcd: This runs the etcd software built by CoreOS. etcd is a distributed and consistent key-value store. This is where the Kubernetes cluster state is stored, updated, and retrieved by various components of K8s.

  • pause: The Pause container is often referred to as the pod infrastructure container and is used to set up and hold the networking namespace and resource limits for each pod.

Note

Figure 2.1 in the next chapter will also show how a few of these services work together.

To exit the SSH session, simply type exit at the prompt.

Services running on the minions

We could SSH to one of the minions, but since Kubernetes schedules workloads across the cluster, we would not see all the containers on a single minion. However, we can look at the pods running on all the minions using the kubectl command:

$ kubectl get pods

Since we have not started any applications on the cluster yet, we don't see any pods. However, there are actually several system pods running pieces of the Kubernetes infrastructure. We can see these pods by specifying the kube-system namespace. We will explore namespaces and their significance later, but for now, the --namespace=kube-system option can be used to look at these K8s system resources as follows:

$ kubectl get pods --namespace=kube-system

We should see something similar to the following:

etcd-server
fluentd-cloud-logging
kube-apiserver
kube-controller-manager
kube-scheduler
kube-ui
kube-dns
monitoring-heapster
monitoring-influx-grafana

The first six should look familiar. These are additional pieces of the services we saw running on the master. The final three are services we have not seen yet. kube-dns provides the DNS and service discovery plumbing. monitoring-heapster is the system used to monitor resource usage across the cluster. monitoring-influx-grafana provides the database and user interface we saw earlier for monitoring the infrastructure.

If we did SSH into a random minion, we would see several containers running across a few of these pods. A sample might look like the following image:

Figure 1.14. Minion container listing

Again, we see a similar lineup of services to those on the master. The services we did not see on the master include the following:

  • skydns: This uses DNS to provide a distributed service discovery utility that works with etcd.

  • kube2sky: This is the connector between skydns and Kubernetes. Services in the API are monitored for changes and updated in skydns appropriately.

  • heapster: This does resource usage and monitoring.

  • exechealthz: This performs health checks on the pods.

Tear down cluster

OK, this is our first cluster on GCE, but let's explore some other providers. To keep things simple, we need to remove the one we just created on GCE. We can tear down the cluster with one simple command:

$ kube-down.sh
 

Working with other providers


By default, Kubernetes uses the GCE provider for Google Cloud. We can override this default by setting the KUBERNETES_PROVIDER environment variable. The following providers are supported with values listed in Table 1.1:

Provider                               KUBERNETES_PROVIDER value   Type
Google Compute Engine                  gce                         Public cloud
Google Container Engine                gke                         Public cloud
Amazon Web Services                    aws                         Public cloud
Microsoft Azure                        azure                       Public cloud
Hashicorp Vagrant                      vagrant                     Virtual development environment
VMware vSphere                         vsphere                     Private cloud / on-premise virtualization
Libvirt running CoreOS                 libvirt-coreos              Virtualization management tool
Canonical Juju (folks behind Ubuntu)   juju                        OS service orchestration tool

Table 1.1. Kubernetes providers

Let's try setting up the cluster on AWS. As a prerequisite, we need to have the AWS Command Line Interface (CLI) installed and configured for our account. AWS CLI Installation and configuration documentation can be found here:

Then, it is a simple environment variable setting as follows:

$ export KUBERNETES_PROVIDER=aws

Again, we can use the kube-up.sh command to spin up the cluster as follows:

$ kube-up.sh

As with GCE, the setup activity will take a few minutes. It will stage files in S3, create the appropriate instances, Virtual Private Cloud (VPC), security groups, and so on in our AWS account. Then, the Kubernetes cluster will be set up and started. Once everything is finished and started, we should see the cluster validation at the end of the output.

Figure 1.15. AWS cluster validation

Once again, we will SSH into the master. This time, we can use the native SSH client. We'll find the key files in /home/<username>/.ssh:

$ ssh -v -i /home/<username>/.ssh/kube_aws_rsa ubuntu@<Your master IP>

We'll use sudo docker ps to explore the running containers. We should see something like the following:

Figure 1.16. Master container listing (AWS)

For the most part, we see the same containers as our GCE cluster had. However, instead of fluentd-gcp service, we see fluentd-elasticsearch.

On the AWS provider, Elasticsearch and Kibana are set up for us. We can find the Kibana UI at the following URL:

https://<your master ip>/api/v1/proxy/namespaces/kube-system/services/kibana-logging/#/discover

Figure 1.17. Kubernetes Kibana dashboard

Resetting the cluster

That is a little taste of running the cluster on AWS. For the remainder of the book, I will be basing my examples on a GCE cluster. For the best experience following along, you can get back to a GCE cluster easily.

Simply tear down the AWS cluster as follows:

$ kube-down.sh

Then, create a GCE cluster again using the following:

$ export KUBERNETES_PROVIDER=gce
$ kube-up.sh
 

Summary


We took a very brief look at how containers work and how they lend themselves to the new architecture patterns in microservices. You should now have a better understanding of how these two forces will require a variety of operations and management tasks and how Kubernetes offers strong features to address these challenges. Finally, we created two different clusters on both GCE and AWS and explored the startup script as well as some of the built-in features of Kubernetes.

In the next chapter, we will explore the core concepts and abstractions K8s provides to manage containers and full application stacks. We will also look at basic scheduling, service discovery, and health checking.

Footnotes

1Malcom McLean entry on Wikipedia: https://en.wikipedia.org/wiki/Malcom_McLean

2Martin Fowler on microservices: http://martinfowler.com/articles/microservices.html

3Kubernetes GitHub project page: https://github.com/kubernetes/kubernetes

About the Author

  • Jonathan Baier

    Jonathan Baier is an emerging technology leader living in Brooklyn, New York. He has had a passion for technology since an early age. When he was 14 years old, he was so interested in the family computer (an IBM PCjr) that he pored over the several hundred pages of BASIC and DOS manuals. Then, he taught himself to code a very poorly-written version of Tic-Tac-Toe. During his teenage years, he started a computer support business. Throughout his life, he has dabbled in entrepreneurship. He currently works as Senior Vice President of Cloud Engineering and Operations for Moody's corporation in New York.

