ThisÂ first chapter of this book will introduce you to the world of containers and their orchestration. The book starts from the beginning, assuming no prior knowledge in the area of containers,Â and will give you a very practical introduction into the topic.
In this chapter, we are focusing on the software supply chain and the friction within it. We then present containers as a means to reduce this friction and add enterprise-grade security on top of it. In this chapter, we also look into how containers and the ecosystem around them are assembled. We specifically point out the distinction between the upstream Operations Support System (OSS) components, united under the code nameÂ Moby, that form the building blocks of the downstream products of Docker and other vendors.
The chapter covers the following topics:
- What are containers?
- Why are containers important?
- What's the benefit for me or for my company?
- The Moby project
- Docker products
- The container ecosystem
- Container architectureÂ
After completing this module, you will be able to:
- Explain in a few simple sentences to an interested layman what containers are, using an analogy such as physical containers
- Justify to an interested layman why containers are so important, using an analogy such as physical containers versus traditional shipping, or apartment homes versus single family homes, and so on
- Name at least four upstream open source components that are used by the Docker products, such as Docker for Mac/Windows
- Identify at least three Docker products
A software container is a pretty abstract thing and thus it might help if we start with an analogy that should be pretty familiar to most of the readers. The analogy is a shipping container in the transportation industry. Throughout history, people have been transporting goods from one location to another by various means. Before the invention of the wheel, goods would most probably have been transported in bags, baskets, or chests on the shoulders of the humans themselves, or they might have used animals such as donkeys, camels, or elephants to transport them.
With the invention of the wheel, transportation became a bit more efficient as humans would built roads on which they could move their carts along. Many more goods could be transported at a time. When we then introduced the first steam-driven machines, and later gasoline driven engines, transportation became even more powerful. We now transport huge amounts of goods in trains, ships, and trucks. At the same time, the type of goods became more and more diverse, and sometimes complex to handle.
In all these thousands of years, one thing did not change though, and that was the necessity to unload the goods at the target location and maybe load them onto another means of transportation. Take, for example, a farmer bringing a cart full of apples to a central train station where the apples are then loaded onto a train, together with all the apples from many other farmers. Or think of a winemaker bringing his barrels of wineÂ with a truck to the port where they are unloaded, and then transferred to a ship that will transport the barrels overseas.
This unloading from one means of transportation and loading onto another means of transportation was a really complex and tedious process. Every type of good was packaged in its own way and thus had to be handled in its own way.Â Also, loose goods risked being stolen by unethical workers, or goods could be damaged in the process.
Then, there came the container, and it totally revolutionized the transportation industry. The container is just a metallic box with standardized dimensions. The length, width, and height of each container is the same. This is a very important point. Without the world agreeing on a standard size, the whole container thing would not have been as successful as it is now.
Now, with standardized containers, companies who want to have their goods transported from A to B package those goods into these containers. Then, they call a shipper which comes with a standardized means for transportation. This can be a truck that can load a container or a train whose wagons can each transport one or several containers. Finally, we have ships that are specialized in transporting huge amounts of containers. The shippers never need to unpack and repackage goods. For a shipper, a container is a black box and they are not interested in what is in it nor should they care in most cases. It is just a big iron box with standard dimensions. The packaging of goods into containers is now fully delegated to the parties that want to have their goods shipped, and they should know best on how to handle and package those goods.
Since all containers have the same standardized shape and dimensions, the shippers can use standardized tools to handle containers, that is, cranes that unload containers, say from a train or a truck, and load them onto a ship or vice versa. One type of crane is enough to handle all the containers that come along over time. Also, the means of transportation can be standardized, such as container ships, trucks, and trains.
Because of all this standardization, all the processes in and around shipping goods could also be standardized and thus made much more efficient than they were before the age of containers.
I think by now you should have a good understanding of why shipping containers are so important and why they revolutionized the whole transportation industry.Â I chose this analogy purposefully, since the software containers that we are going to introduce here fulfill the exact same role in the so-called software supply chain as shipping containers do in the supply chain of physical goods.
In the old days, developers would develop a new application. Once that application was completed in the eyes of the developers, they would hand this application over to the operations engineers that were then supposed to install it on the production servers and get it running. If the operations engineers were lucky, they even got a somewhat accurate document with installation instructions from the developers. So far so good, and life was easy.
But things got a bit out of hand when in an enterprise, there were many teams of developers that created quite different types of applications, yet all needed to be installed on the same production servers and kept running there. Usually, each application has some external dependencies such as which framework it was built on or what libraries it uses and so on. Sometimes, two applications would use the same framework but in different versions that might or might not be compatible between each other. Our operations engineer's life became much harder over time. They had to be really creative on how they could load their ship, which is of course theirÂ servers with different applications without breaking something.
Installing a new version of a certain application was now a complex project on its own and often needed months of planning and testing. In other words, there was a lot of friction in the software supply chain. But these days, companies rely more and more on software and the release cycles become shorter and shorter. We cannot afford anymore to just have a new release maybe twice a year. Applications need to be updated in a matter of weeks or days, or sometimes even multiple times per day. Companies that do not comply risk going out of business due to the lack of agility. So, what's the solution?
A first approach was to use virtual machines (VMs). Instead of running multiple applications all on the same server, companies would package and run a single application per VM. With it, the compatibility problems were gone and life seemed good again. Unfortunately, the happiness didn't last for long. VMs are pretty heavy beasts on their own since they all contain a full-blown OS such as Linux or Windows Server and all that for just a single application. This is as if in the transportation industry you would use a gigantic ship just to transport a truck load of bananas. What a waste. That can never be profitable.
The ultimate solution to the problem was to provide something much more lightweight than VMs but also able to perfectly encapsulate the goods it needed to transport. Here, the goods are the actual application written by our developers plus (and this is important) all the external dependencies of the application, such as framework, libraries, configurations, and more. This holy grail of a software packaging mechanism was the Docker container.
Developers use Docker containers to package their applications, frameworks, and libraries into them, and then they ship those containers to the testers or to the operations engineers. For the testers and operations engineers, the container is just a black box. It is a standardized black box, though. All containers, no matter what application runs inside them, can be treated equally. The engineers know that if any container runs on their servers, then any other containers should run too. And this is actually true, apart from some edgeÂ cases which always exist.
Thus, Docker containers are a means to package applications and their dependencies in a standardized way. Docker then coined the phraseâBuild, ship and run anywhere.
These days, the time between new releases of an application become shorter and shorter, yet the software itself doesn't become any simpler. On the contrary, software projects increase in complexity. Thus, we need a way to tame the beast and simplify the software supply chain.
We also hear every day how much more cyber crimes are on the rise. Many well-known companies are affected by security breaches. Highly sensitive customer data gets stolen, such as social security numbers, credit card information, and more. But not only customer data is compromised, sensitive company secrets are also stolen.
Containers can help in many ways. First of all, Gartner has found in a recent report that applications running in a container are more secure than their counterparts not running in a container. Containers use Linux security primitives such as Linux kernel namespaces to sandbox different applications running on the same computers and control groups (cgroups), to avoid the noisy neighbor problem where one bad application is using all available resources of a server and starving all other applications.
Due to the fact that container images are immutable, it is easy to have them scanned for known vulnerabilities and exposures, and in doing so, increase the overall security of our applications.
Another way we can make our software supply chain more secure when using containers is to use content trust. Content trust basically ensures that the author of a container image is who they pretend to be and that the consumer of the container image has a guarantee that the image has not been tampered with in transit. The latter is known as aÂ man-in-the-middle (MITM) attack.
All that I have just said is of course technically also possible without using containers, but since containers introduce a globally accepted standard, it makes it so much easier to implement those best practices and enforce them.
OK, but security is not the only reason why containers are important. There are other reasons:
One of them is the fact that containers make it easy to simulate a production-like environment, even on a developer's laptop. If we can containerize any application, then we can also containerize, say, a database such as Oracle or MS SQL Server. Now, everyone who has ever had to install an Oracle database on a computer knows that this is not the easiest thing to do and it takes a lot of space away on your computer. You wouldn't want to do that to your development laptop just to test whether the application you developed really works end to end. With containers at hand, I can run a full-blown relational database in a container as easily as saying 1, 2, 3. And when I'm done with testing, I can just stop and delete the container and the database is gone without leaving a trace on my computer.
Since containers are very lean compared to VMs, it is not uncommon to have many containers running at the same time on a developer's laptop without overwhelming the laptop.
A third reason why containers are important is that operators can finallyÂ concentrate on what they are really good at, provisioning infrastructure, and running and monitoring applications in production. When the applications they have to run on a production system are all containerized, then operators can start to standardize their infrastructure. Every server becomes just another Docker host. No special libraries of frameworks need to be installed on those servers, just an OS and a container runtime such as Docker.
Also, the operators do not have to have any intimate knowledge about the internals of the applications anymore since those applications run self-contained in containers that ought to look like black boxes to the operations engineers, similar to how the shipping containers look to the personnel in the transportation industry.
Somebody once said that today,Â every company of a certain size has to acknowledge that they need to be a software company. Software runs all businesses, period. As every company becomes a software company, there is a need to establish a software supply chain. For the company to remain competitive, their software supply chain has to be secure and efficient. Efficiency can be achieved through thorough automation and standardization. But in all three areas, security, automation, and standardization, containers have shown to shine. Large and well-known enterprises have reported that when containerizing existing legacy applications (many call them traditional applications) and establishing a fully automated software supply chain based on containers, they can reduce the cost used for maintenance of those mission-critical applications by a factor of 50 to 60% and they can reduce the time between new releases of these traditional applications by up to 90%.
That said, the adoption of container technology saves these companies a lot of money, and at the same time it speeds up the development process and reduces the time to market.
Originally, when the company Docker introduced Docker containers, everything was open source. Docker didn't have any commercial products at this time. The Docker engine which the company developed was a monolithic piece of software. It contained many logical parts, such as the container runtime, a network library, a RESTful API, a command-line interface, and much more.
Other vendors or projects such as Red Hat or Kubernetes were using the Docker engine in their own products, but most of the time they were only using part of its functionality. For example, Kubernetes did not use the Docker network library of the Docker engine but provided its own way of networking. Red Hat in turn did not update the Docker engine frequently and preferred to apply unofficial patches to older versions of the Docker engine, yet they still called it the Docker engine.
Out of all these reasons and many more, the idea emerged that Docker had to do something to clearly separate the Docker open source part from the Docker commercial part. Furthermore, the company wanted to prevent competitors from using and abusing the name Docker for their own gains. This was the main reason why the Moby project was born. It serves as the umbrella for most of the open source components Docker developed and continues to develop. These open source projects do not carry the name Docker in them anymore.
Part of the Moby project are components for image management, secret management, configuration management, and networking and provisioning, to name just a few. Also, part of the Moby project are special Moby tools that are, for example, used to assemble components into runnable artifacts.
Some of the components that technically would belong to the Moby project have been donated by Docker to the Cloud Native Computing Foundation (CNCF) and thus do not appear in the list of components anymore. The most prominent ones are
runc which together form the container runtime.Â Â
Docker currently separates its product lines into two segments. There is the Community Edition (CE) which is closed source yet completely free, and then there is the Enterprise Edition (EE) which is also a closed source and needs to be licensed on a yearly basis. The enterprise products are backed by 24 x 7 support and are supported with bug fixes much longer than their CE counterparts.
Docker for Mac and Docker for Windows are easy-to-install desktop applications that can be used to build, debug, and test Dockerized applications or services on a Mac or on Windows. Docker for Mac and Docker for Windows are complete development environments which deeply integrated with their respective hypervisor framework, networking, and filesystem. These tools are the fastest and most reliable way to run Docker on a Mac or on Windows.
Under the umbrella of the CE, there are also two products that are more geared towards operations engineers. Those products are Docker for Azure and Docker for AWS.
For example,Â with Docker for Azure, which is a native Azure application,Â you can set up Docker in a few clicks, optimized for and integrated to the underlying AzureÂ Infrastructure as aÂ Service (IaaS) services. It helps operations engineers to accelerate time to productivity in building and running Docker applications in Azure.
Docker for AWS works very similar but for Amazon's cloud.
The Docker EE consists of the two productsÂ Universal Control Plane (UCP) and Docker Trusted Registry (DTR) that both run on top of Docker Swarm. Both are Swarm applications. Docker EE builds on top of the upstream components of the Moby project and adds enterprise-grade features such as role-based access control (RBAC), multi tenancy, mixed clusters of Docker Swarm and Kubernetes, web-based UI, and content trust, as well as image scanning on top of it.
There has never been a new technology introduced in IT that penetrated the landscape so quickly and so thoroughly than containers. Any company that doesn't want to be left behind cannot ignore containers. This huge interest in containers from all sectors of the industry has triggered a lot of innovation in this sector. Numerous companies have specialized in containers and either provide products that build on top of this technology or build tools that support it.
Initially, Docker didn't have a solution for container orchestration thus other companies or projects, open source or not, tried to close this gap. The most prominent one is Kubernetes which was initiated by Google and then later donated to the CNCF. Other container orchestration products are Apache Mesos, Rancher, Red Hat's Open Shift, Docker's own Swarm, and more.Â
More recently, the trend goes towards a service mesh. This is the new buzz word. As we containerize more and more applications, and as we refactor those applications into more microservice-oriented applications, we run into problems that simple orchestration software cannot solve anymore in a reliable and scalable way. Topics in this area are service discovery, monitoring, tracing, and log aggregation. Many new projects have emerged in this area, the most popular one at this time being Istio, which is also part of the CNCF.
Many say that the next step in the evolution of software are functions, or more precisely,Â Functions as a Service (FaaS). Some projects exist that provide exactly this kind of service and are built on top of containers. One prominent example is OpenFaaS.
We have only scratched the surface of the container ecosystem. All big IT companies such as Google, Microsoft, Intel, Red Hat, IBM, and more are working feverishly on containers and related technologies. The CNCF that is mainly about containers and related technologies, has so many registered projects, that they do not all fit on a poster anymore. It's an exciting time to work in this area. And in my humble opinion, this is only the beginning.Â
Now, let's discuss on a high level how a system that can run Docker containers is designed. The following diagram illustrates what a computer on which Docker has been installed looks like. By the way, a computer which has Docker installed is often called a Docker host, because it can run or host Docker containers:
High-level architecture diagram of the Docker engine
In the preceding diagram, we see three essential parts:
- On the bottom, we have the Linux operating system
- In the middle dark gray, we have the container runtime
- On the top, we have the Docker engine
Containers are only possible due to the fact that the Linux OS provides some primitives, such as namespaces, control groups, layer capabilities, and more which are leveraged in a very specific way by the container runtime and the Docker engine. Linux kernel namespaces such as process ID (pid) namespacesÂ or network(net)Â namespaces allow Docker to encapsulate or sandbox processes that run inside the container. Control groups make sure that containers cannot suffer from the noisy neighbor syndrome, where a single application running in a container can consume most or all of the available resources of the whole Docker host. Control groups allow Docker to limit the resources, such as CPU time or the amount of RAM that each container gets maximally allocated.
The container runtime on a Docker host consists of
runc is the low-level functionality of the container runtime and
containerd, which is based on
runc, provides the higher-level functionality. Both are open source and have been donated by Docker to the CNCF.
The container runtime is responsible for the whole life cycle of a container. It pulls a container image (which is the template for a container) from a registry if necessary, creates a container from that image, initializes and runs the container, and eventually stops and removes the container from the system when asked.Â
The Docker engine provides additional functionality on top of the container runtime, such as network libraries or support for plugins. It also provides a REST interface over which all container operations can be automated. The Docker command-line interface that we will use frequently in this book is one of the consumers of this REST interface.
In this chapter, we looked at how containers can massively reduce the friction in the software supply chain and on top of that, make the supply chain much more secure.
In the upcoming chapter, we will familiarize ourselves with containers. We will learn how to run, stop, and remove containers and otherwise manipulate them. We will also have a pretty good overview over the anatomy of containers. For the first time, we're really going to get our hands dirty and play with these containers, so stay tuned.
Please solve the following questions to assess your learning progress:
- Which statements are correct (multiple answers are possible)?
- A container is kind of a lightweight VM
- A container only runs on a Linux host
- A container can only run one process
- The main process in a container always has PID 1
- A container is one or more processes encapsulated by Linux namespaces and restricted by cgroups
- Explain to an interested layman in your own words, maybe using analogies, what a container is.
- Why are containers considered to be a game changer in IT? Name three to four reasons.
- What does it mean when we claim: If a container runs on a given platform then it runs anywhere...? Name two to three reasons why this is true.
- True or False: Docker containers are only really useful for modern greenfield applications based on microservices. Please justify your answer.
- How much does a typical enterprise save when containerizing their legacy applications?
- Which two core concepts of Linux are containers based on?
Here is a list of links that lead to more detailed information regarding topics we have discussed in this chapter:
- Docker overview atÂ https://docs.docker.com/engine/docker-overview/
- The Moby project atÂ https://mobyproject.org/
- Docker products atÂ https://www.docker.com/get-docker
- Cloud Native Computing Foundation atÂ https://www.cncf.io/
- containerd â industry standard container runtime atÂ https://containerd.io/