Docker for Developers

Chapter 1: Introduction to Docker

Docker is a technology that allows entire applications and their environments to be encapsulated within individual containers. When multiple versions of these containers are run on a single machine, they are sandboxed from one another as if running on their own dedicated machines.

Docker is open source, which fits well with running Linux in containers, as well as numerous available open source components that help build complex systems. It is the logical progression of technologies used for hosting and backend development over the past decade or longer. This progression has moved from a physical kind of hosting to a logical one and has been driven by several requirements. These requirements include reliability, reachability, scalability, and security.

This book is divided into three sections. The first is an introduction to Docker, focusing on local development. The second describes the methodology for testing, deploying, and scaling applications. The third goes into detail about security when using a container-based design.

In this chapter, we will review the history of hosting and backend solutions with a focus on how Docker came to be a widely used technology.

The following topics will be covered in this chapter:

Origins of hosting services
Types of hosting services – co-location
Types of hosting services – self-hosting
The benefits of data centers
How virtualization works
The power requirements at data centers
How virtualization is a solution for data centers and the invention of the cloud
How containers are a bigger win for data centers and hosting

The drivers for Docker

The range of hosting services was originally limited to self-hosted servers, co-located server hosting, and shared hosting. In 1994 and 1995, Best Internet Communications rose from nothing to hosting 18,000+ websites on a pair of Pentium servers, which were the most powerful servers of the time. Best also offered dedicated server-hosting through co-location, dedicated broadband connectivity, and upscale premium services.

Most of the websites hosted by Best were of the shared-hosting variety. All of these sites shared the same server, the same hard drives, the same filesystem, the same RAM, the same CPUs, the same network connections, and so on.

It was not uncommon for any one of these websites to be slashdotted, or containing a link to the site from a very popular site to the hosted site. This would cause a large spike in traffic to the one out of the approximately 18,000 sites, and a performance hit to the others. As the quality of the sites grew and demanded more resources, their administrators would move to dedicated co-located hosting or self-hosting.

Co-located hosting

With co-located hosting, the customer rents a secure cage within a larger hosting facility (data center):

Figure 1.1 – A typical server rack, commonly seen in colocation

The customer can install and manage the machines of their choice. Some co-location facilities offer, for additional fees, remote hands service, where the customer can call the hosting company and one of their engineers does whatever the customer requires to the hosted servers. The cages are locked so that other customers can't gain access to other customers' equipment.

Self-hosting

With self-hosting, the customer buys a full-time dedicated broadband-style connection in a physical location of their choosing:

Figure 1.2 – Indian Railway 139 server room (self-hosting)

The customer ends up building their own kind of data center and installs and manages servers and other equipment on-premises.

Data centers

The benefits of a professional data center are numerous, and ultimately, the trend became that just a few companies, relative to all the companies with an internet presence, provided data centers, and the remaining companies paid rent for dedicated, shared, or premium hosting. A professional data center provides rich internet connectivity (more than one provider, faster connections), clean power, battery-backed-up power for 24/7/365 uptime, back-up generator-backed-up power for longer brownouts or blackouts, fire-suppression systems, a controlled climate suitable for keeping equipment at the proper operating temperatures, multiple physical locations, a professionally managed Network Operations Center (NOC) and technical support, and security in the form of guards, cameras, and fingerprint, handprint, and/or retina scanners:

Figure 1.3 – A server room at CERN (Switzerland)

The companies that ended up building and running the majority of data centers are Google (Google Cloud Platform), Microsoft (Azure), Amazon (Amazon Web Services (AWS)), Yahoo! (once upon a time), and lesser players, which include boutique hosting companies, regional hosting companies, and companies that require security beyond what a hosting company can provide (for example, banks and financial institutions, governments, and so on).

Amazon had a unique need for data centers. They are one of the largest online retailers in the world, as well as the largest data center developer/owner. The number of servers, the uptime, the security, and the reach that they require drove them to build data centers throughout the country and then the world.

Google has a unique need for data centers as well. They are the largest search engine and advertising company in the world. In order to be reachable, Google needs servers in as many physical places as possible. In order to be fast, Google needs many servers—at least enough servers for distributed search index processing in each of its geo-locations.

Companies such as RackSpace and Level 3 were originally built as data center providers. Their specialties included co-location facilities, dedicated server hosting, remote hands, NOCs, nationwide-dedicated fiber-optic backbones, clean and blackout resistant power, and very rich connectivity to various other networks, including AT&T, Verizon, and Comcast. They found themselves with the infrastructure to follow the trend toward virtualization and began to offer these cloud services.

The highest cost of providing data center services, and this passed on to the customer, was initially bandwidth. The providers paid for bandwidth by the megabit, plus a monthly cost of maintaining the physical connections that carried this bandwidth. As the providers built their own private infrastructure to carry data between their own data centers around the world, the cost became a flat rate, or a fixed cost, for a significant amount of the total bandwidth used. This allowed the price of bandwidth to decline to the point where it became a minimal consideration for hosting.

These companies ended up building a comprehensive infrastructure for dedicated hosting. It turns out that this infrastructure is ideally suited for virtualized product offerings, too.

Using virtualization to economize resource usage

Virtualization is the process of exposing a portion of a physical machine as a logical or virtual machine that acts enough like a real machine that it supports the installation of whole operating systems, their filesystems, and the software that runs on the operating system. For example, a machine with 64 GB of RAM and 4 CPUs could run virtualization software that masquerades as four 16 GB RAM machines with 1 CPU each. This machine could run four instances of Linux.

Virtualization is not a new concept, having been implemented by IBM in the early 1960s. It likely gained in overall popularity during the 1980s when it was used to run MS-DOS, and then Windows by computer systems such as the original Apple Macintosh (Mac) and Unix computers such as the Sun and Silicon Graphics workstations.

Initial virtualization software used what features were available on CPUs of the time, but often simply emulated the instruction set of the x86 on the 68000 family or custom CPUs of the professional Unix workstations. SoftPC was one of the most popular offerings in the 1980s.

SoftPC was quite slow, but the ability to run Windows or MS-DOS applications on a Mac computer allowed the use of these machines in business and education environments. Instead of adding Microsoft Office compatibility to all the programs on the Mac to support exchanging files between Windows/MS-DOS users and Mac users, users could run Microsoft Office.

People saw it in action and saw the value in it. Windows was the dominant operating system for home and business, and to fit in with Windows in the corporate environment, something like SoftPC was needed. The problem with SoftPC is that it was pure software emulation, which was quite slow in actual use. Virtualization is superior to emulation in terms of performance!

Entire companies were founded around providing consumer or business virtualization solutions. VMWare, founded in 1998, was one of the first of these companies.

Innotek developed VirtualBox and released it as open source in 2007, and was then acquired by Sun Microsystems in 2008. Then, Sun was acquired by Oracle in 2010. Parallels, a virtualization solution for Mac, was developed in 2004 and became mainstream in 2006.

The value of virtualization encouraged chip makers to gradually add CPU support for virtualization. With CPU support, an x86-based system could run virtualized machines or software at close enough to native speed to be much more tolerable. This, in turn, led the workstation companies (such as Apple, Sun, and Silicon Graphics) to move to x86 CPUs.

A key component of virtualization software is the hypervisor. The hypervisor presents the virtual machine to the chosen operating system and then manages the resources and execution of the virtual machines over time. The virtual machines themselves are configurable, at least regarding the amount of RAM, the number of logical CPU cores, graphics card memory, the host operating system disk files to act as virtual disk drives in the virtual machine, the mounting and unmounting of CD-ROM in the virtual CD-ROM drive, and so on. The hypervisor assures that these resources are truly available and that no virtual machine starves the other virtual machines for the host machine's resources.

For the enterprise, the requirements were somewhat different. Instead of providing virtual machines via a general-purpose host operating system such as Linux, the entire operating system itself could be optimized just for being the hypervisor. VMWare offered its Elastic Sky X Integrated (ESXi) operating system in 2004. The University of Cambridge computer laboratory developed the Xen hypervisor in the late 1990s, and the first stable version was released in 2003. Xen was originally the hypervisor used by Amazon for its Elastic Compute Cloud offering, before moving to KVM.

KVM is a virtualization solution supported directly by the Linux kernel. The kernel can act as the hypervisor under KVM. KVM can additionally emulate processors other than the host's native CPU, which is typically x86. This allows KVM to be used to emulate targets such as the Raspberry Pi.

Scaling a dedicated hosted website can be problematic. It's possible to simply upgrade to a larger and more powerful server to handle growing traffic and services. At some point, however, there is no server that is large and powerful! To scale up from that point requires distributing services across multiple servers.

Addressing the increasing power requirements

The trend toward virtualization created a demand for a new breed of servers to be housed at the data centers. Where a customer might have rented or installed their own dedicated server with 16 GB of RAM, the virtual server provider could rent a portion of a 128 GB RAM server and share that server with multiple customers. These bigger servers required more CPU cores, so the virtual servers could have reasonable computing capabilities.

Fitting these specialized servers into the same space as the smaller and less capable dedicated servers created a new challenge: power. Instead of using 400 watts of power for the dedicated server, the cloud servers might use 1,600 watts; the power requirements would be four times more. In addition to the power requirements of the machines themselves, it took more power to run the air conditioners to cool the machines.

The power cost requirements changed the equation for dedicated hosting, so bandwidth pricing was virtually free, while the power requirements of the servers were charged at a very high price.

To help mitigate the cost of power, data centers have been built to provide some of their own power. Solar panels, building near a river that can drive turbines, wind turbines, and building in places with cool or cold climates are among some of the techniques used. Data centers do use batteries for back-up power, and diesel-powered generators as well.

Energy efficiency is another way to mitigate power costs. The use of lower-powered CPUs and other computer parts is one means to this end. The CPU manufacturers have had a heavy focus on producing lower-powered CPUs for both data center and laptop use.

The hosting companies would supply a 60 watt power supply for each co-location cage. If you needed more than 60 watts, you could pay extra to have additional 60 watt lines for your cage. You'd pay for the construction and then the monthly power usage.

Hosting at one of these facilities was problematic for most customers. It required purchasing physical machines and other hardware, designing the infrastructure required for the services to be provided, physical access to the cage and hardware from time to time, and potential failures, which meant downtime.

The growth and popularity of services require scalability or more and bigger machines. You could repurpose old machines, but they take up space and power. Customer costs soared when the current cage filled up and more presence was required.

The next step, and the solution to these hassles, is virtualization and running your servers and services within the cloud.

Virtualization and cloud computing

Most customers don't need dedicated servers. What they really need is the security of a filesystem that only their software can read and write to, that the CPU is guaranteed to be dedicated to their purpose, and that the throughput and computing power is identifiable and delivered as expected.

The appeal of virtual servers offered by companies such as AWS drove many administrators to move away from dedicated and self-hosting. AWS grows its offerings to add more value to virtual hosting, so their customers get the benefit of Amazon's developers efforts.

It's relatively cheap to duplicate the customer-designed infrastructure to create a testing environment that is separate from the live/deployed applications. It's easy to scale services that grow with popularity, or when the services are slashdotted. This is a term that describes what happens when a very popular site adds a link to another site, driving a lot more traffic to that site—perhaps more traffic than the site was designed to handle.

The design and deployment of a virtualized infrastructure can be done from the comfort of your office. There is no need to physically visit a data center. If you need to scale horizontally, you only need to spin up additional virtual machine instances. If you need to scale vertically, you only need to spin up a more powerful virtual machine and substitute it for the one that is too slow or too small.

If hardware fails at a cloud-hosting facility, the hosting company's employees install new hardware. This is done in complete transparency with you, the customer. A feature known as Teleport allows the hosting company to move a running virtual machine to a different physical machine, without the interruption of service.

Along with virtual servers, hosting companies can also offer virtual disks, elastic IPs, load balancers, DNS, backup solutions, and so on. Virtual disks are handy because you can back them up by simply copying the file that is the image. You can also boot new instances from an existing virtual disk, saving the time required to install a whole operating system on a virtual machine.

The ability to use elastic IPs and virtual load balancers allows a scalability that is as easy as the click of a mouse.

You can assign an elastic IP to any virtual instance or load balancer. If the instance is stopped, you can reassign that IP to another instance. If this were handled only with DNS, there could be days' worth of delays for the DNS to propagate through the many DNS servers at the ISPs. The load balancer allows you to create virtual server farms and balance incoming requests between the virtual servers in the farm. You can trivially spin up and add additional virtual servers to the load balancer as you need to scale. The hosting companies can even provide software triggers that will automatically spin up and add new servers when traffic increases, and then spin them down and remove them when traffic is reduced:

Figure 1.4 – Hardware virtualization

A popular stack technology at the time that AWS was made available to the public was LAMP, which is short for Linux, Apache, MySQL, and PHP. A typical setup would be to install these four software packages on a dedicated Linux server. AWS offered RDS, or a MySQL equivalent dedicated virtual server, which allowed the offloading and scaling of the LAMP application. AWS offered virtual load balancers, which are logical Ethernet switches that load balance traffic among two or more web servers. They offered domain name-hosting and elastic IPs, so a site's uptime could be almost infinite. AWS continues to develop new software and services to benefit its customers.

AWS and its competitors allow a cost-effective and dynamic way to grow an internet presence as it gains popularity. The price structure is common among most providers. The cost is based on the number of elastic load balancers, the number of virtual server instances, the amount of RAM, the number of virtual CPUs, the size of persistent storage, and the bandwidth. There are also optional additional services that can increase the price.

Virtual servers provide the benefits of a physical one, but it comes at the cost of the dedication of physical RAM on the host machine and the power required to run the machine. A host machine might have 64 GB of RAM; it can run some combination of virtual machines that, combined, use up that RAM—for example, four 16 GB virtual machines, two 32 GB virtual machines, two 16 gigabytes and one 32 GB virtual machine, and so on.

A risk of virtual machines is that when the host machine is rebooted or fails, all the virtual machines hosted on it will go off air.

The features that enable virtualization and the limitations of virtualization when applied at data centers make containerization a viable and preferred alternative.

Using containers to further optimize data center resources

Docker is a clever use of OS-level virtualization support that allows multiple Docker containers to execute on a single machine. A container is a running instance of a container image. The containers are, by default, isolated from the host machine, as well as from one another.

They can be configured to expose resources, such as networking ports, to the host network (for example, the internet) or to one another. The following diagram illustrates the basic structure of containers on a host:

Figure 1.5 – Docker containerization

Containers share their Linux kernel with the host, so you do not need to install complete operating systems within the container as you do with virtual machines. The containers are managed by the Docker daemon, which handles the management of the containers and resources they use, as well as the images, networks, volumes, and so on.

An important distinction between virtual servers and containers is that containers share the resources, directly, of the host, whereas virtual servers require duplicate resources. For example, two identical containers use the host's RAM, rather than a block of RAM configured before booting the virtual machine. If you need to constrain the resources (the CPU, memory, swap, and so on) of a container, you can do so, but the default is to have no resource constraints on any container.

Unlike with virtual servers, you deal with an application image, rather than a virtual disk. You can copy the image to back it up, but there is no virtual disk file to copy. These application images are progressively built on top of other containers. When you build a container, only the bits of the application image that change need to be dealt with.

When designing services that use containers, you will not likely install many components within any one container. For a virtual machine running a LAMP application, you might install Apache, MySQL, and PHP all within one virtual machine. When designing the same LAMP application for containers, you might configure one container just for MySQL and another for Apache and PHP. You can then scale your application by running additional Apache and PHP containers and additional MySQL instances in a cluster configuration.

If we consider the use of containers for the LAMP application discussed earlier, we can implement MySQL in a dedicated container, and Apache and PHP in another; all this running on top of the host's Linux kernel. To scale the LAMP application, a second, third, fourth, and so on instance of the Apache/PHP container can be spun up, and the same is true for the MySQL container. MySQL containers can be configured for master-subordinate operations.

If the host operating system is not Linux kernel-based, there are two options. The first option is to run host OS native containers (for example, Windows containers on a Windows host). The second option is to run a Linux virtual machine on the host and run the containers within that virtual machine.

Containerization is a boon for hosting companies and their customers. No longer is it required to dedicate a fixed amount of RAM per container as is required with virtual machines. A physical machine is limited only by its resources when it comes to the number of containers it can run concurrently. The pricing model for containers can save customers on monthly costs. Thus, containerization is a big win.

In the next chapter, we'll look at how to use virtual machines and Docker to develop applications locally. Later in this book, we'll look at how to deploy our locally developed software to publicly accessible internet/cloud infrastructure.

Filter reviews by

All

Packt verified reviews

Amazon verified reviews

brando Dec 10, 2020

I read this book over the course of a week and deployed docker containers into our repositories over a two week period following that. Then, the week after that, I gave a lunch n learn to the company, using what I had l learned from the book and experience from practice. Now we have containers for clean e2e testing, gitlab docker runners to run and scale to more runners without new VMs, they build containers for us, and it’s quickly become normal in our company. I’ll be reviewing the kuberneties sections over the new week to keep down the path. Thanks Richard. This book cuts to the chase and lives up to the title!

Amazon Verified review

jmortegac Feb 20, 2021

Very completed with practical examples, tools and use cases.

Hattie Cobb Oct 04, 2020

When reading articles, tutorials and even books, that is very common that at the end of the reading you struggle about how to translate that to a real production situation. Believe me, this book is different. You get to the end with a sense that you are very likely to know what are the next steps to apply what you learned to your existent or new projects. And this means a lot. The book has some great balance from history, concepts, example and practice.Although there is a lot of cont to grasp and the book is not exhaustive about the subjects, it gives you the most important information and real-world scenarios. You will be confident about the path you need to go to improve your local environment, to prepare a CI/CD pipeline and ship your product to production with a high-quality workflow. The authors are careful enough to list different solutions and approaches because there is not a single tool that solves all the problems and this book give you enough to understand which stack could fit better to your software needs.If you want to know more about Docker on your local environment, how containers relate to production, container orchestration with Docker swarm or Kubernetes, container image publishing, etc, you need this book.I highly recommend it.

robert pitts Apr 09, 2023

I recently had the opportunity to try out the "Docker for Developers" platform and I have to say, I'm thoroughly impressed. As someone who has been in the development industry for a few years, I'm always on the lookout for tools that can make my work more efficient and productive. Docker for Developers does just that.One of the standout features of Docker for Developers is the ease with which it allows you to create and manage containers. I was able to quickly spin up multiple containers, each with its own application stack, and then seamlessly switch between them. This made it much easier to test different configurations and ensure that my code was working as intended.Another great aspect of Docker for Developers is its ability to simplify the deployment process. With just a few simple commands, I was able to package my code into a container and deploy it to the cloud or on-premises environment. This saved me a ton of time and headaches compared to traditional deployment methods.One feature that I particularly appreciated was the ability to integrate with my existing development workflow. Docker for Developers works seamlessly with tools like Jenkins, Git, and Visual Studio Code, allowing me to incorporate it into my existing workflows without disrupting them.Overall, I highly recommend Docker for Developers to any developer looking to streamline their development process. With its intuitive interface, powerful container management, and seamless deployment capabilities, it's a must-have tool for any modern development team.

Kunde aus Bremerhaven Feb 05, 2023

I ch bin ganz begeistert.

Docker for Developers: Develop and run your application with Docker containers using DevOps tools for continuous delivery

What do you get with Print?

Docker for Developers

Chapter 1: Introduction to Docker

The drivers for Docker

Co-located hosting

Self-hosting

Data centers

Using virtualization to economize resource usage

Addressing the increasing power requirements

Using containers to further optimize data center resources

Summary

Further reading

Page 1 of 7

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Product Details

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the 3 authors

FAQs

Docker for Developers: Develop and run your application with Docker containers using DevOps tools for continuous delivery

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Key benefits

Description

Who is this book for?

What you will learn

Product Details

What do you get with Print?

Contact Details

Shipping Address

Billing Address

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

Filter reviews by

People who bought this also bought

About the 3 authors

FAQs

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access