Docker was originally created as an internal tool by a Platform as a Service company called dotCloud. Later on, in March 2013, it was released as open source. Apart from the Docker, Inc. team, which is the main sponsor, there are other big names contributing to the tool: Red Hat, IBM, Microsoft, Google, and Cisco Systems, to name a few. Software development today needs to be agile and react quickly to changes. We use methodologies such as Scrum, estimate our work in story points, and attend daily stand-ups. But what about preparing our software for shipment and deployment? Let's see how Docker fits into that scenario and how it can help us be agile.
In this chapter, we will cover the following topics:
The basic idea behind Docker
A difference between virtualization and containerization
Benefits of using Docker
Components available to install
We will begin with the basic idea behind this wonderful tool.
The basic idea behind Docker is to pack an application with all of its dependencies (be it binaries, libraries, configuration files, scripts, JARs, and so on) into a single, standardized unit for software development and deployment. Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, and system libraries; anything you can install on a server. This guarantees that it will always run in the same way, no matter what environment it is deployed in. With Docker, you can build a Node.js or Java project (but you are, of course, not limited to those two) without having to install Node.js or Java on your host machine. Once you're done with it, you can just destroy the Docker image, and it's as though nothing ever happened. Docker is not a programming language or a framework; rather, think of it as a tool that helps solve common problems such as installing, distributing, and managing software. It allows programmers and DevOps teams to build, ship, and run their code anywhere.
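As a tiny illustration, a Dockerfile for a hypothetical Node.js application could look like the following sketch (the file names, base image tag, and start command here are assumptions for illustration, not taken from a real project):

```dockerfile
# Base image already bundles Node.js, so nothing is installed on the host
FROM node:8

# Copy the application and install its dependencies inside the image
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .

# The command the container will execute when started
CMD ["node", "server.js"]
```

Anyone with Docker can build and run this image with docker build and docker run; the host machine needs neither Node.js nor npm installed.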
You may think that Docker is a virtualization engine, but it's far from that, as we will explain in a while.
To fully understand what Docker really is, first we need to understand the difference between traditional virtualization and containerization. Let's compare those two technologies now.
A traditional virtual machine, representing hardware-level virtualization, is basically a complete operating system running on top of the host operating system. There are two types of virtualization hypervisors: Type 1 and Type 2. Type 1 hypervisors provide server virtualization on bare-metal hardware; there is no traditional end user operating system underneath. Type 2 hypervisors, on the other hand, are commonly used for desktop virtualization: you run the virtualization engine on top of your own operating system. There are a lot of use cases that take advantage of virtualization; the biggest asset is that you can run many virtual machines with totally different operating systems on a single host.
Virtual machines are fully isolated and hence very secure. But nothing comes without a price. There are many drawbacks: virtual machines contain all the features that an operating system needs to have, such as device drivers and core system libraries. They are heavyweight, usually resource-hungry, and not so easy to set up; they require a full installation and more computing resources to execute. To successfully run an application on a virtual machine, the hypervisor needs to first import the virtual machine and then power it up, and this takes time. Furthermore, their performance is substantially degraded compared to running natively. As a result, only a few virtual machines can be provisioned and made available on a single host machine.
The Docker software runs in an isolated environment called a Docker container. A Docker container is not a virtual machine in the popular sense. It represents operating system-level virtualization. While each virtual machine image runs on an independent guest OS, Docker images run within the same operating system kernel. A container has its own filesystem and environment variables. It's self-sufficient. Because containers run within the same kernel, they utilize fewer system resources. The base container can be, and usually is, very lightweight. It's worth knowing that Docker containers are isolated not only from the underlying operating system, but from each other as well. There is no overhead related to a classic virtualization hypervisor and a guest operating system. This allows achieving almost bare-metal, near-native performance. The boot time of a dockerized application is usually very fast due to the low overhead of containers. This also makes it possible to roll out hundreds of application containers in seconds and to reduce the time taken to provision your software.
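You can see this kernel sharing in practice with a quick experiment: the kernel version reported inside a container is the same as the one on the host. This is a sketch that assumes Docker is installed, the daemon is running, and the small alpine image can be pulled:

```shell
# A sketch, assuming Docker is installed and the daemon is running
command -v docker >/dev/null 2>&1 || { echo "Docker not available"; exit 0; }
docker info >/dev/null 2>&1 || { echo "Docker daemon not running"; exit 0; }

host_kernel=$(uname -r)                              # kernel version on the host
container_kernel=$(docker run --rm alpine uname -r)  # kernel version inside a container

# Both report the same version: the container has no kernel of its own
echo "host:      $host_kernel"
echo "container: $container_kernel"
```

This is exactly the opposite of a virtual machine, where the guest runs its own, usually different, kernel.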
As you can see, Docker is quite different from the traditional virtualization engines. Be aware that containers are not substitutes for virtual machines for all use cases. A thoughtful evaluation is still required to determine what is best for your application. Both solutions have their advantages. On one hand we have the fully isolated, secure virtual machine with average performance, and on the other hand, we have containers that are missing some of the key features (such as total isolation), but are equipped with high performance and can be provisioned swiftly.
Let's see what other benefits you will get when using Docker containerization.
When comparing Docker containers with traditional virtual machines, we have already mentioned some of their advantages. Let's now summarize them in more detail and add some more.
As we have said before, the first visible benefit of using Docker will be very satisfactory performance and short provisioning times. You can create or destroy containers quickly and easily. Docker shares only the kernel, nothing less, nothing more. It also reuses the image layers that a specific image is built upon. Because of that, multiple versions of an application running in containers will be very lightweight. The result is faster deployment, easier migration, and nimbler boot times.
Using Docker enables you to deploy ready-to-run, portable software that is extremely easy to distribute (we will cover the process of creating an image in Chapter 6, Creating Images). Your containerized application simply runs within its container; there's no need for traditional installation. The key advantage of a Docker image is that it is bundled with all the dependencies the containerized application needs to run. Not having to install dependencies is a huge advantage: it eliminates problems such as software and library conflicts, or even driver compatibility issues. Because of Docker's reproducible build environment, it's particularly well suited for testing, especially in a continuous integration flow. You can quickly boot up identical environments to run your tests, and because the container images are identical each time, you can distribute the workload and run tests in parallel without a problem. Developers can run the same image on their own machine that will later be run in production, which again is a huge advantage in testing. The use of Docker containers speeds up continuous integration: there are no more endless build-test-deploy cycles. Docker containers ensure that applications run identically in development, test, and production environments.
One of Docker's greatest features is portability. Docker containers can be run anywhere: on your local machine, a nearby or distant server, and in a private or public cloud. Speaking of the cloud, all major cloud computing providers, such as Amazon Web Services and Google Cloud Platform, have recognized Docker's popularity and now support it. Docker containers can be run inside an Amazon EC2 instance or a Google Compute Engine instance, provided that the host operating system supports Docker. A container running on an Amazon EC2 instance can easily be transferred to some other environment, achieving the same consistency and functionality. Docker also works very well with various other IaaS (Infrastructure as a Service) providers, such as Microsoft Azure, IBM SoftLayer, or OpenStack. This additional level of abstraction from your infrastructure layer is an indispensable feature. You can just develop your software without worrying about the platform it will later run on. It's truly a write once, run anywhere solution.
Maintaining a truly idempotent configuration management code base can be tricky and a time-consuming process. The code grows over time and becomes more and more troublesome. That's why the idea of an immutable infrastructure is becoming more and more popular nowadays. Containerization comes to the rescue. By using containers during the process of development and deployment of your applications, you can simplify the process. Having a lightweight Docker server that needs almost no configuration management, you manage your applications simply by deploying and redeploying containers to the server. And again, because the containers are very lightweight, it takes only seconds of your time.
As a starting point, you can download a prebuilt Docker image from the Docker Hub, which is a repository of ready-to-use images. There are many choices of web servers, runtime platforms, databases, messaging servers, and so on. It's a real gold mine of software you can use for free as a base foundation for your own project. We will cover the Docker Hub and searching for images in Chapter 5, Finding Images.
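For example, finding and downloading a ready-made web server image from the Docker Hub takes just two commands. This is a sketch assuming Docker is installed, the daemon is running, and the machine is online:

```shell
# A sketch, assuming Docker is installed, the daemon is running, and we are online
command -v docker >/dev/null 2>&1 || { echo "Docker not available"; exit 0; }
docker info >/dev/null 2>&1 || { echo "Docker daemon not running"; exit 0; }

# Search the Docker Hub for nginx images and show the most popular hits
docker search nginx

# Download the official nginx image to the local machine
docker pull nginx
```

Once pulled, the image is stored locally and can be started as a container at any time.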
The effect of the immutability of Docker's images is the result of the way they are created. Docker makes use of a special file called a Dockerfile. This file contains all the setup instructions on how to create an image, such as must-have components, libraries, exposed shared directories, network configuration, and so on. An image can be very basic, containing nothing but the operating system foundations, or—something that is more common—containing a whole prebuilt technology stack that is ready to launch. You can create images by hand, but it can be an automated process as well.
Docker creates images in a layered fashion: every feature you add becomes another layer on top of the base image. This is another serious speed boost compared to traditional virtualization techniques.
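You can inspect these layers yourself with the docker history command. A sketch, assuming Docker is installed and the daemon is running; we use the small alpine image here purely as an example:

```shell
# A sketch, assuming Docker is installed and the daemon is running
command -v docker >/dev/null 2>&1 || { echo "Docker not available"; exit 0; }
docker info >/dev/null 2>&1 || { echo "Docker daemon not running"; exit 0; }

# Fetch a small base image, then list every layer it is built from,
# together with the instruction that created it and its size
docker pull alpine
docker history alpine
```

Layers that are shared between images are stored only once on disk, which is what keeps multiple application versions so lightweight.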
We will get into the details of creating images later, in Chapter 6, Creating Images.
Of course, Docker is not just a Dockerfile processor and a runtime engine. It's a complete package with a wide selection of tools and APIs that are helpful in a developer's and a DevOps engineer's daily work. First of all, there's the Docker Toolbox, an installer that quickly and easily sets up a Docker environment on your own machine. Kitematic is a desktop developer environment for using Docker on Windows and Mac OS X. The Docker distribution also contains a whole bunch of command-line tools that we will be using throughout the book. Let's look at them now.
On Windows, depending on the Windows version you use, there are two choices: either Docker for Windows, if you are on Windows 10 or later, or Docker Toolbox for all earlier versions of Windows. The same applies to Mac OS. The newest offering is Docker for Mac, which runs as a native Mac application and uses xhyve to virtualize the Docker Engine environment and the Linux kernel. For earlier versions of Mac OS that don't meet the Docker for Mac requirements (we are going to list them in Chapter 2, Installing Docker), you should pick Docker Toolbox for Mac. The idea behind Docker Toolbox and the native Docker applications is the same: to virtualize the Linux kernel and Docker Engine on top of your operating system. For the purpose of this book, we will be using Docker Toolbox, as it is universal; it will run on all Windows and Mac OS versions. The installation package for Windows and Mac OS is wrapped in an executable called the Docker Toolbox. The package contains all the tools you need to begin working with Docker. Of course, there are tons of additional third-party utilities compatible with Docker, and some of them are very useful. We will present some of them briefly in Chapter 9, Using Docker in Development. But for now, let's focus on the default toolset. Before we start the installation, let's look at the tools that the installer package contains, to better understand what changes will be made to your system.
Docker is a client-server application. It consists of the daemon that does the important job: it builds and downloads images, starts and stops containers, and so on. The daemon exposes a REST API that specifies interfaces for interacting with it and is used for remote management. Docker Engine accepts Docker commands from the command line, such as docker run to run an image, docker ps to list running containers, docker images to list images, and so on.
The Docker client is a command-line program that is used to manage Docker hosts running Linux containers. It communicates with the Docker server using the REST API wrapper. You will interact with Docker by using the client to send commands to the server.
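The client-server split is easy to see for yourself: docker version reports the versions of both halves separately. A sketch, assuming Docker is installed and the daemon is running:

```shell
# A sketch, assuming Docker is installed and the daemon is running
command -v docker >/dev/null 2>&1 || { echo "Docker not available"; exit 0; }
docker info >/dev/null 2>&1 || { echo "Docker daemon not running"; exit 0; }

# Client and Server versions are reported separately:
# the Client is the CLI program, the Server is the daemon it talks to
docker version

# Everyday client commands; each one translates into a REST call to the daemon
docker images    # list locally stored images
docker ps -a     # list containers, both running and stopped
```

Because the client and daemon communicate over the REST API, they do not even have to run on the same machine.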
Docker Engine works only on Linux. If you want to run Docker on Windows or Mac OS, or want to provision multiple Docker hosts on a network or in the cloud, you will need Docker Machine.
Docker Machine is a fairly new command-line tool created by the Docker team to manage the Docker servers you can deploy containers to. It deprecates the old way of installing Docker with the Boot2Docker utility. Docker Machine eliminates the need to create virtual machines manually and install Docker on them before starting Docker containers. It handles the provisioning and installation process for you behind the scenes. In other words, it's a quick way to get a new virtual machine provisioned and ready to run Docker containers. This is an indispensable tool when developing a PaaS (Platform as a Service) architecture. Docker Machine not only creates a new VM with the Docker Engine installed in it, but also sets up the certificate files for authentication and then configures the Docker client to talk to it. For flexibility, Docker Machine introduces the concept of drivers. Using drivers, Docker is able to communicate with various virtualization software packages and cloud providers. In fact, when you install Docker Toolbox on Windows or Mac OS, the default VirtualBox driver will be used. The following command will be executed behind the scenes:
docker-machine create --driver=virtualbox default
Another available driver is amazonec2, for Amazon Web Services. It can be used to install Docker in Amazon's cloud; we will do that later in this chapter. There are a lot of drivers ready to be used, and more are coming all the time. The list of existing official drivers, with their documentation, is always available on the Docker drivers website: https://docs.docker.com/machine/drivers.
The list contains the following drivers at the moment:
Amazon Web Services
Google Compute Engine
VMware vCloud Air
Apart from these, there are also a lot of third-party driver plugins freely available on sites such as GitHub. You can find additional drivers for different cloud providers and virtualization platforms, such as OVH Cloud or Parallels for Mac OS; you are not limited to Amazon's AWS or Oracle's VirtualBox. As you can see, the choice is very broad.
When installing the Docker Toolbox on Windows or Mac OS, Docker Machine will be selected by default. It's mandatory and currently the only way to run Docker on these operating systems. Installing Docker Machine is not obligatory on Linux; there is no need to virtualize the Linux kernel there. However, if you want to deal with cloud providers or just want to have a common runtime environment portable between Mac OS, Windows, and Linux, you can install Docker Machine on Linux as well. We will describe the process later in this chapter. Docker Machine will also be used behind the scenes by the graphical tool Kitematic, which we will present in a while.
After the installation process, Docker Machine will be available as a command-line tool: docker-machine.
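Typical day-to-day usage looks like the following sketch. It assumes Docker Machine is installed and that a machine named default (the name Docker Toolbox uses) already exists:

```shell
# A sketch, assuming Docker Machine is installed and a machine named
# "default" (the name Docker Toolbox creates) already exists
command -v docker-machine >/dev/null 2>&1 || { echo "Docker Machine not available"; exit 0; }

# List all machines Docker Machine knows about, with their state and IP
docker-machine ls

docker-machine status default >/dev/null 2>&1 || { echo "No machine named default"; exit 0; }

# Print the environment variables the Docker client needs to reach "default",
# then point the local client at that machine
docker-machine env default
eval "$(docker-machine env default)"
```

After the eval line, every docker command in that shell session is transparently sent to the default machine.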
Kitematic is the software tool you can use to run containers through a plain, yet robust, graphical user interface (GUI). In 2015, Docker acquired the Kitematic team, hoping to open up its containerization solution to a wider, more general audience of developers.
Kitematic is now included by default when installing Docker Toolbox on Mac OS and Windows. You can use it to comfortably search for and fetch the images you need from the Docker Hub. The tool can also be used to run your own app containers. Using the GUI, you can edit environment variables, map ports, configure volumes, study logs, and have command-line access to the containers. It is worth mentioning that you can seamlessly switch back and forth between the Kitematic GUI and the command-line interface to run and manage application containers; any changes you make on the command line will be directly reflected in Kitematic. Kitematic is very convenient; however, if you want more control when dealing with containers, or just want to use scripting, the command line will be a better solution. The tool is simple to use, as you will see at the end of this chapter, when we test our setup on a Mac or Windows PC. For the rest of the book, we will be using the command-line interface to work with Docker.
Compose is a tool, executed from the command line as docker-compose. It replaces the old fig utility. It's used to define and run multi-container Docker applications. Although it's very easy to imagine a multi-container application (such as a web server in one container and a database in another), it's not mandatory, so if you decide that your application will fit in a single Docker container, there will be no use for docker-compose. In real life, though, it's very likely that your application will span multiple containers. With docker-compose, you use a compose file to configure your application's services so that they can be run together in an isolated environment. Then, using a single command, you create and start all the services from your configuration. When it comes to multi-container applications, docker-compose is great for development and testing, as well as for continuous integration workflows.
We will use docker-compose to create multi-container applications in Chapter 6, Creating Images, later in this book.
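As a taste of what's coming, a minimal compose file for the web server plus database example above might look like this sketch (the image names, port mapping, and password are assumptions for illustration only):

```yaml
# docker-compose.yml -- two cooperating services in one isolated environment
version: '2'
services:
  web:
    image: nginx               # the web server container
    ports:
      - "8080:80"              # publish the server on the host's port 8080
    depends_on:
      - db                     # start the database first
  db:
    image: postgres            # the database container
    environment:
      POSTGRES_PASSWORD: example
```

With this file in place, a single docker-compose up command creates and starts both containers together.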
Oracle VM VirtualBox is a free and open source hypervisor for x86 computers from Oracle. It is installed by default with the Docker Toolbox. It supports the creation and management of virtual machines running Windows, Linux, BSD, OS/2, Solaris, and so on. In our case, Docker Machine, using the VirtualBox driver, will use VirtualBox to create and boot a tiny Linux distribution capable of running docker-engine. It's worth mentioning that you can also run this tiny virtualized Linux straight from VirtualBox itself.
Every Docker machine you have created, for example using Kitematic, will be visible and available to boot in VirtualBox when you run it directly, as shown in the following screenshot:
You can start, stop, reset, change settings, and read logs in the same way as for other virtualized operating systems.
Git is a distributed version control system that is widely used for software development and other version control tasks. It emphasizes speed, data integrity, and support for distributed, non-linear workflows. Docker Machine and the Docker client follow Git's pull/push model for fetching needed dependencies from the network. For example, if you decide to run a Docker image that is not present on your local machine, Docker will fetch this image from the Docker Hub. Docker doesn't internally use Git for any kind of resource versioning. It does, however, rely on hashing to uniquely identify filesystem layers, which is very similar to what Git does. Docker also took initial inspiration from Git's notion of commits, pushes, and pulls. Git is also included in the Docker Toolbox installation package.
From a developer's perspective, there are tools especially useful in a programmer's daily job, be it IntelliJ IDEA Docker Integration Plugin for Java fans or Visual Studio 2015 Tools for Docker for those who prefer C#. They let you download and build Docker images, create and start containers, and carry out other related tasks straight from your favorite IDE. We will cover them in more detail in the next chapters.
Apart from the tools included in the Docker distribution package (Docker Toolbox for older versions of Windows, or Docker for Windows and Docker for Mac), there are hundreds of third-party tools, such as Kubernetes and Helios (for Docker orchestration), Prometheus (for monitoring), or Swarm and Shipyard (for managing clusters). As Docker attracts more and more attention, new Docker-related tools pop up almost every week. We will briefly cover the most interesting ones, along with more resources, in Chapter 9, Using Docker in Development.
But these are not the only tools available to you. Additionally, Docker provides a set of APIs that can be very handy. One of them is the Remote API for the management of images and containers. Using this API, you will be able to distribute your images to the runtime Docker engine. A container can be shifted to a different machine that runs Docker and executed there without compatibility concerns. This may be especially useful when creating PaaS (Platform as a Service) architectures. There's also the Stats API, which exposes live resource usage information (such as CPU, memory, network I/O, and block I/O) for your containers. This API endpoint can be used to create tools that show how your containers behave, for example, on a production system.
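On a Linux host, you can explore the Remote API directly through the daemon's Unix socket. The following sketch assumes Docker is running, that curl (version 7.40 or newer, with --unix-socket support) is installed, and that the current user may read the socket; the container name "web" in the commented Stats API call is purely hypothetical:

```shell
# A sketch, assuming a Linux host where the Docker daemon is running and
# curl (7.40+, with --unix-socket support) is installed
command -v docker >/dev/null 2>&1 || { echo "Docker not available"; exit 0; }
command -v curl   >/dev/null 2>&1 || { echo "curl not available"; exit 0; }
docker info >/dev/null 2>&1 || { echo "Docker daemon not running"; exit 0; }

# The same data as `docker ps`, fetched straight from the Remote API
curl -s --unix-socket /var/run/docker.sock http://localhost/containers/json

# The Stats API: a one-shot resource usage snapshot for a container named
# "web" (a hypothetical name used only for illustration)
# curl -s --unix-socket /var/run/docker.sock \
#      "http://localhost/containers/web/stats?stream=false"
```

Every docker CLI command is ultimately translated into calls like these, which is what makes remote management and custom tooling possible.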
By now, we understand the difference between virtualization and containerization, and also, I hope, we can see the advantages of using the latter. We also know what components are available for us to install and use. Let's begin our journey into the world of containers and go straight to the action by installing the software.