In this article by Jeeva Chelladhurai, Pethuru Raj Chelliah, and Vinod Singh, the authors of the book Learning Docker, Second Edition, we will learn how Docker images are built by using a Dockerfile, which is the standard way of producing highly usable Docker images. A Dockerfile makes the build automated and repeatable, which makes it the most effective way of building powerful images for the software development community.
Docker's integrated image building system
The Docker images are the fundamental building blocks of containers. These images could be very basic operating environments, such as busybox or Ubuntu. Or the images could craft advanced application stacks for the enterprise and cloud IT environments. We could craft an image manually by launching a container from a base image, installing all the required applications, making the necessary configuration file changes, and then committing the container as an image.
As a better alternative, we could resort to the automated approach of crafting the images by using a Dockerfile. A Dockerfile is a text-based build script that contains special instructions in a sequence for building the right and the relevant images from the base images. The sequential instructions inside the Dockerfile can include selecting the base image, installing the required applications, adding the configuration and the data files, and automatically running the services as well as exposing those services to the external world. Thus, the Dockerfile-based automated build system has remarkably simplified the image-building process. It also offers a great deal of flexibility in the way the build instructions are organized and makes the complete build process visible in a single file.
The Docker Engine tightly integrates this build process with the help of the docker build subcommand. In the client-server paradigm of Docker, the Docker server (or daemon) is responsible for the complete build process and the Docker command line interface is responsible for transferring the build context, including transferring Dockerfile to the daemon.
In order to have a sneak peek into the Dockerfile integrated build system, in this section we introduce you to a basic Dockerfile. Then, we explain the steps for converting that Dockerfile into an image, and then launching a container from that image. Our Dockerfile is made up of two instructions, as shown here:
$ cat Dockerfile
FROM busybox:latest
CMD echo Hello World!!
The two instructions do the following:
The first instruction selects the base image. In this example, we select the busybox:latest image
The second instruction, CMD, instructs the container to echo Hello World!!
Now, let's generate a Docker image from the preceding Dockerfile by calling docker build along with the path of the Dockerfile. In our example, we invoke the docker build subcommand from the directory where we have stored the Dockerfile, so the path is specified as a dot (.), as shown in the following command:
$ sudo docker build .
After issuing the preceding command, the build process will begin by sending build context to the daemon and then display the text shown here:
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM busybox:latest
The build process will continue and, on completion, it will display the following:
Successfully built 0a2abe57c325
In the preceding example, the image was built with the IMAGE ID 0a2abe57c325. Let's use this image to launch a container by using the docker run subcommand as follows:
$ sudo docker run 0a2abe57c325
Hello World!!
Cool, isn't it? With very little effort, we have been able to craft an image with busybox as the base image, and we have been able to extend that image to produce Hello World!!. This is a simple application, but the enterprise-scale images can also be realized by using the same technology.
Now, let's look at the image details by using the docker images subcommand, as shown here:
$ sudo docker images
REPOSITORY    TAG       IMAGE ID        CREATED        VIRTUAL SIZE
<none>        <none>    0a2abe57c325    2 hours ago    2.433 MB
Here, you may be surprised to see that the IMAGE (REPOSITORY) and TAG name have been listed as <none>. This is because we did not specify any image or any TAG name when we built this image. You could specify an IMAGE name and optionally a TAG name by using the docker tag subcommand, as shown here:
$ sudo docker tag 0a2abe57c325 busyboxplus
The alternative approach is to build the image with an image name during the build time by using the -t option for the docker build subcommand, as shown here:
$ sudo docker build -t busyboxplus .
Since there is no change in the instructions in Dockerfile, the Docker Engine will efficiently reuse the old image that has the ID 0a2abe57c325 and update the image name to busyboxplus. By default, the build system applies latest as the TAG name. This behavior can be modified by specifying the TAG name after the IMAGE name, with a : separator placed between them. That is, <image name>:<tag name> is the correct syntax, wherein <image name> is the name of the image and <tag name> is the name of the tag.
Once again, let's look at the image details by using the docker images subcommand, and you will notice that the image (Repository) name is busyboxplus and the tag name is latest:
$ sudo docker images
REPOSITORY     TAG       IMAGE ID        CREATED        VIRTUAL SIZE
busyboxplus    latest    0a2abe57c325    2 hours ago    2.433 MB
Building images with an image name is always recommended as the best practice.
A quick overview of the Dockerfile's syntax
In this section, we explain the syntax or the format of Dockerfile. A Dockerfile is made up of instructions, comments, parser directives, and empty lines, as shown here:
# Comment
INSTRUCTION arguments
An instruction line of a Dockerfile is made up of two components: the instruction itself, followed by the arguments for the instruction. The instruction is case-insensitive, so it could be written in any case. However, the convention is to use uppercase in order to differentiate it from the arguments. Let's take another look at the content of the Dockerfile in our previous example:
FROM busybox:latest
CMD echo Hello World!!
Here, FROM is an instruction which has taken busybox:latest as an argument, and CMD is an instruction which has taken echo Hello World!! as an argument.
The comment line
The comment line in Dockerfile must begin with the # symbol. The # symbol after an instruction is considered as an argument. If the # symbol is preceded by whitespace, then the docker build system described here would consider that as an unknown instruction and skip the line (newer Docker releases instead ignore the leading whitespace and treat the line as a comment, although this style is discouraged). Now, let's look at the preceding cases with the help of examples to get a better understanding of the comment line:
A valid Dockerfile comment line always begins with a # symbol as the first character of the line:
# This is my first Dockerfile comment
The # symbol can be a part of an argument
CMD echo ### Welcome to Docker ###
If the # symbol is preceded by whitespace, then it is considered as an unknown instruction by the build system:
 # this is an invalid comment line
The docker build system ignores any empty line in the Dockerfile and hence, the author of Dockerfile is encouraged to add comments and empty lines to substantially improve the readability of Dockerfile.
The parser directives
As the name implies, the parser directives instruct the Dockerfile parser to handle the content of the Dockerfile as specified in the directives. The parser directives are optional and they must be at the very top of a Dockerfile. At the time of writing, escape is the only supported directive.
We use the escape character to escape characters in a line or to extend a single line to multiple lines. On UNIX-like platforms, \ is the escape character, whereas on Windows, \ is a directory path separator and ` is the escape character. By default, the Dockerfile parser considers \ as the escape character, and you could override this on Windows by using the escape parser directive, as shown here:
# escape=`
The Dockerfile build instructions
So far, we have looked at the integrated build system, the Dockerfile syntax, and a sample lifecycle, in which a sample Dockerfile was used to generate an image and a container was spun off from that image. In this section, we will introduce the Dockerfile instructions, their syntax, and a few befitting examples.
The FROM instruction
The FROM instruction is the most important one and it is the first valid instruction of a Dockerfile. It sets the base image for the build process. The subsequent instructions would use this base image and build on top of it. The Docker build system lets you flexibly use the images built by anyone. You can also extend them by adding more precise and practical features to them. By default, the Docker build system looks in the Docker host for the images. However, if the image is not found in the Docker host, then the Docker build system will pull the image from the publicly available Docker Hub Registry. The Docker build system will return an error if it could not find the specified image in the Docker host and the Docker Hub Registry.
The FROM instruction has the following syntax:
FROM <image>[:<tag>|@<digest>]
In the preceding code statement, note the following:
<image>: This is the name of the image which will be used as the base image
<tag> or <digest>: Both tag and digest are optional attributes, and you could qualify a particular Docker image version by using either a tag or a digest. The tag latest is assumed by default if neither a tag nor a digest is present.
Here is an example of the FROM instruction with the image name centos:
FROM centos
The MAINTAINER instruction
The MAINTAINER instruction is an informational instruction of a Dockerfile. This instruction enables the authors to record their details in an image. Docker does not place any restrictions on where the MAINTAINER instruction is placed in Dockerfile. However, it is strongly recommended that you place it after the FROM instruction.
The following is the syntax of the MAINTAINER instruction, where <author's detail> can be any text. However, it is strongly recommended that you use the image author's name and e-mail address, as shown in this code syntax:
MAINTAINER <author's detail>
Here is an example of the MAINTAINER instruction with the author name and the e-mail address:
MAINTAINER Dr. Peter <peterindia@gmail.com>
The COPY instruction
The COPY instruction enables you to copy the files from the Docker host to the file system of the new image. The following is the syntax of the COPY instruction:
COPY <src> ... <dst>
The preceding code terms bear the explanations shown here:
<src>: This is the source file or directory. It must be inside the build context, which in our examples is the directory from where the docker build subcommand was invoked.
...: This indicates that multiple source files can either be specified directly or be specified by wildcards.
<dst>: This is the destination path in the new image into which the source file or directory will get copied. If multiple files have been specified, then the destination path must be a directory and it must end with a slash /.
Using an absolute path for the destination directory or file is recommended. In the absence of an absolute path, the COPY instruction treats the destination as relative to the working directory of the image, which is the root / unless a WORKDIR instruction has changed it. The COPY instruction is powerful enough to create new directories and to overwrite files in the newly created image.
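The COPY forms described above can be sketched as follows. This is only an illustration: the source names (html, httpd.conf, magic, and the .txt files) and the destination paths are hypothetical, and the build succeeds only if matching files exist in the build context.

```dockerfile
FROM busybox:latest

# Copy one directory from the build context to an absolute path in the image
COPY html /var/www/html

# Multiple sources: the destination must be a directory and end with a slash
COPY httpd.conf magic /etc/httpd/conf/

# Wildcards are matched against the files in the build context
COPY *.txt /data/
```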
The ADD instruction
The ADD instruction is similar to the COPY instruction. However, in addition to the functionality supported by the COPY instruction, the ADD instruction can handle the TAR files and the remote URLs. We can annotate the ADD instruction as COPY on steroids.
The following is the syntax of the ADD instruction:
ADD <src> ... <dst>
The arguments of the ADD instruction are very similar to those of the COPY instruction, as shown here:
<src>: This is the source file or directory in the build context, which in our examples is the directory from where the docker build subcommand is invoked. However, the noteworthy difference is that the source can also be a tar file stored in the build context or a remote URL.
...: This indicates that the multiple source files can either be specified directly or be specified by using wildcards.
<dst>: This is the destination path for the new image into which the source file or directory will be copied.
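As a hedged illustration of the two extra capabilities of ADD, consider the following sketch. The archive name web-page-config.tar and the URL are hypothetical placeholders, so the build works only if the archive exists in the build context and the URL points to a real file.

```dockerfile
FROM busybox:latest

# A local tar archive from the build context is unpacked into the destination
ADD web-page-config.tar /

# A remote URL is downloaded into the destination directory as-is
# (hypothetical URL; remote archives are not unpacked)
ADD http://www.example.com/app.conf /etc/app/
```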
The ENV instruction
The ENV instruction sets an environment variable in the new image. An environment variable is a key-value pair, which can be accessed by any script or application. Linux applications rely heavily on environment variables for their startup configuration.
The following line forms the syntax of the ENV instruction:
ENV <key> <value>
Here, the code terms indicate the following:
<key>: This is the environment variable
<value>: This is the value that is to be set for the environment variable
The following lines give two examples for the ENV instruction, where, in the first line, DEBUG_LVL has been set to 3 and in the second line, APACHE_LOG_DIR has been set to /var/log/apache:
ENV DEBUG_LVL 3
ENV APACHE_LOG_DIR /var/log/apache
The ARG instruction
The ARG instruction lets you define variables that can be passed during the Docker image build time. The docker build subcommand supports the --build-arg flag to pass values to the variables defined using the ARG instruction. If you specify a build argument that was not defined in your Dockerfile, older Docker releases fail the build, whereas newer releases only print a warning. Either way, the build argument variables must be defined in the Dockerfile for the values passed during the Docker image build time to take effect.
The syntax of the ARG instruction is as follows:
ARG <variable>[=<default value>]
Wherein, the code terms mean the following:
<variable>: This is the build argument variable
<default value>: This is the default value you could optionally specify to the build argument variable
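A minimal sketch of ARG with a default value follows. The variable name BUILD_VERSION and the label key are taken from the environment variable example in this article; the default value 1.0 is an assumption made for illustration.

```dockerfile
FROM busybox:latest

# Build argument with a default, used when --build-arg is not supplied
ARG BUILD_VERSION=1.0

# The build argument is expanded while the image is being built
LABEL com.example.app.build_version=${BUILD_VERSION}
```

With such a Dockerfile, running sudo docker build --build-arg BUILD_VERSION=2.0 . would be expected to record 2.0 in the label instead of the default.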
The environment variables
The environment variables declared using the ENV or ARG instruction can be used in the ADD, COPY, ENV, EXPOSE, LABEL, USER, WORKDIR, VOLUME, STOPSIGNAL, and ONBUILD instructions.
Here is an example of environment variable usage:
ARG BUILD_VERSION
LABEL com.example.app.build_version=${BUILD_VERSION}
The USER instruction
The USER instruction sets the startup user ID or username in the new image. By default, the containers will be launched with root as the user ID or UID. Essentially, the USER instruction will modify the default user ID from root to the one specified in this instruction.
The syntax of the USER instruction is as follows:
USER <UID>|<UName>
The USER instruction accepts either <UID> or <UName> as its argument.
<UID>: This is a numerical user ID
<UName>: This is a valid user Name
Following is an example for setting the default user ID at the time of startup to 73. Here 73 is the numerical ID of the user:
USER 73
Though it is recommended that you use a valid user ID that matches an entry in the /etc/passwd file, the user ID can contain any random numerical value. However, the username must match a valid username in the /etc/passwd file; otherwise, the docker run subcommand will fail and it will display the following error message:
finalize namespace setup user get supplementary groups Unable to find user
The WORKDIR instruction
The WORKDIR instruction changes the current working directory from / to the path specified by this instruction. The ensuing instructions, such as RUN, CMD, and ENTRYPOINT will also work on the directory set by the WORKDIR instruction.
The following line gives the appropriate syntax for the WORKDIR instruction:
WORKDIR <dirpath>
Here, <dirpath> is the path of the working directory to be set. The path can be either absolute or relative. In case of a relative path, it will be relative to the previous path set by the WORKDIR instruction. If the specified directory is not found in the target image file system, then the directory will be created.
The following line is a clear example of the WORKDIR instruction in a Dockerfile:
WORKDIR /var/log
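The relative-path behavior described above can be sketched as follows. This is a small illustration that assumes busybox as the base image.

```dockerfile
FROM busybox:latest

# Absolute path: the working directory becomes /var
WORKDIR /var

# Relative path: resolved against the previous WORKDIR, giving /var/log
WORKDIR log

# Runs in /var/log, so pwd reports /var/log during the build
RUN pwd
```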
The VOLUME instruction
The VOLUME instruction creates a directory in the image file system, which can later be used for mounting volumes from the Docker host or the other containers.
The VOLUME instruction has two types of syntax, as shown here:
The first type is either exec or JSON array (all values must be within double-quotes (")).
VOLUME ["<mountpoint>"]
The second type is shell, as shown here:
VOLUME <mountpoint>
In the preceding lines, <mountpoint> is the mount point that has to be created in the new image.
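As a small sketch, both syntax types could be used as follows. The mount point paths /MountPointDemo and /var/log/demo are hypothetical names chosen for illustration.

```dockerfile
FROM busybox:latest

# JSON array form: the value must be within double quotes
VOLUME ["/MountPointDemo"]

# Shell form, equivalent in effect to the JSON array form
VOLUME /var/log/demo
```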
The EXPOSE instruction
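The EXPOSE instruction declares the container network port on which the containerized application listens, for communicating between the container and the external world. On its own, it does not publish the port on the Docker host; the port is published when the container is launched with the -p or -P option of the docker run subcommand.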
The syntax of the EXPOSE instruction is as follows:
EXPOSE <port>[/<proto>] [<port>[/<proto>]...]
Here, the code terms mean the following:
<port>: This is the network port that has to be exposed to the outside world.
<proto>: This is an optional field provided for a specific transport protocol, such as TCP and UDP. If no transport protocol has been specified, then TCP is assumed to be the transport protocol.
The EXPOSE instruction allows you to specify multiple ports in a single line.
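Here is a brief sketch of the syntax, with port numbers chosen arbitrarily for illustration:

```dockerfile
FROM busybox:latest

# No protocol given, so TCP is assumed for port 80
EXPOSE 80

# Several ports on one line, each with an explicit protocol
EXPOSE 7373/udp 8080/tcp
```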
The LABEL instruction
The LABEL instruction enables you to add key-value pairs as metadata to your Docker images. These metadata can be further leveraged to provide meaningful Docker image management and orchestration.
The syntax of the LABEL instruction is as follows:
LABEL <key-1>=<val-1> <key-2>=<val-2> ... <key-n>=<val-n>
The LABEL instruction can have one or more key-value pairs. Though a Dockerfile can have more than one LABEL instruction, it is recommended to use a single LABEL instruction with multiple key-value pairs.
Here is an example for the LABEL instruction:
LABEL version="2.0" \
      release-date="2016-08-05"
The preceding label keys are very simple and this could result in naming conflicts. Hence Docker recommends using namespaces to label keys using reverse domain notation. There is a community project called Label Schema that provides shared namespace. The shared namespace acts as a glue between the image creators and tool builders to provide standardized Docker image management and orchestration. Here is an example of LABEL instruction using Label Schema:
LABEL org.label-schema.schema-version="1.0" \
      org.label-schema.version="2.0" \
      org.label-schema.description="Learning Docker Example"
The RUN instruction
The RUN instruction is the real workhorse during the build time, and it can run any command. The general recommendation is to execute multiple commands by using one RUN instruction. This reduces the layers in the resulting Docker image because the Docker system inherently creates a layer each time an instruction is called in Dockerfile.
The RUN instruction has two types of syntax.
The first is the shell type, as shown here:
RUN <command>
Here, the <command> is the shell command that has to be executed during the build time. If this type of syntax is to be used, then the command is always executed by using /bin/sh -c.
And, the second syntax type is either exec or the JSON array, as shown here:
RUN ["<exec>", "<arg-1>", ..., "<arg-n>"]
Wherein, the code terms mean the following:
<exec>: This is the executable to run during the build time.
<arg-1>, ..., <arg-n>: These are the variable number (zero or more) of arguments for the executable.
Unlike the first type of syntax, this type does not invoke /bin/sh -c. Hence, the types of shell processing, such as the variable substitution ($USER) and the wildcard substitution (*, ?), do not happen in this type. If shell processing is critical for you, then you are encouraged to use the shell type. However, if you still prefer the exec (JSON array) type, then use your preferred shell as the executable and supply the whole command as a single argument.
For example, RUN ["bash", "-c", "rm -rf /tmp/abc"].
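The two syntax types, and the recommendation to combine commands in one RUN instruction, can be sketched as follows. This is an illustration only: the paths are hypothetical, and sh is used instead of bash because the assumed busybox base image does not ship bash.

```dockerfile
FROM busybox:latest

# Shell form: the commands are chained with && so they share a single layer
RUN mkdir -p /tmp/abc && \
    echo "scratch data" > /tmp/abc/file.txt && \
    rm -rf /tmp/abc

# Exec form: no shell is invoked, so no variable or wildcard substitution
RUN ["mkdir", "-p", "/opt/demo"]

# Exec form with an explicit shell: the whole command is one argument,
# and the shell expands $HOME
RUN ["sh", "-c", "echo $HOME > /opt/demo/home.txt"]
```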
The CMD instruction
The CMD instruction can run any command (or application), which is similar to the RUN instruction. However, the major difference between those two is the time of execution. The command supplied through the RUN instruction is executed during the build time, whereas the command specified through the CMD instruction is executed when the container is launched from the newly created image. Thus, the CMD instruction provides a default execution for this container. However, it can be overridden by the docker run subcommand arguments. When the application terminates, the container will also terminate along with the application and vice versa.
The CMD instruction has three types of syntax, as shown here:
The first syntax type is the shell type, as shown here:
CMD <command>
Wherein, the <command> is the shell command, which has to be executed during the launch of the container. If this type of syntax is used, then the command is always executed by using /bin/sh -c.
The second type of syntax is exec or the JSON array, as shown here:
CMD ["<exec>", "<arg-1>", ..., "<arg-n>"]
Wherein, the code terms mean the following:
<exec>: This is the executable, which is to be run during the container launch time
<arg-1>, ..., <arg-n>: These are the variable number (zero or more) of arguments for the executable
The third type of syntax is also exec or the JSON array, which is similar to the previous type. However, this type is used for setting the default parameters to the ENTRYPOINT instruction, as shown here:
CMD ["<arg-1>", ..., "<arg-n>"]
Wherein, the code terms mean the following:
<arg-1>, ..., <arg-n>: These are the variable number (zero or more) of arguments for the ENTRYPOINT instruction
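The Dockerfile used for the cmd-demo build that follows is not reproduced in this text. A Dockerfile consistent with the output shown there might look like the following sketch; the busybox base image and the exec form of CMD are assumptions.

```dockerfile
FROM busybox:latest

# Default command, run when a container is launched without its own command
CMD ["echo", "Dockerfile CMD demo"]
```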
Now, let's build a Docker image by using the docker build subcommand and cmd-demo as the image name. The docker build system will read the instructions from the Dockerfile that is stored in the current directory (.), and craft the image accordingly, as shown here:
$ sudo docker build -t cmd-demo .
Having built the image, we can launch the container by using the docker run subcommand, as shown here:
$ sudo docker run cmd-demo
Dockerfile CMD demo
Cool, isn't it? We have given a default execution for our container and our container has faithfully echoed Dockerfile CMD demo. However, this default execution can be easily overridden by passing another command as an argument to the docker run subcommand, as shown in the following example:
$ sudo docker run cmd-demo echo Override CMD demo
Override CMD demo
The ENTRYPOINT instruction
The ENTRYPOINT instruction will help in crafting an image for running an application (entry point) during the complete lifecycle of the container, which would have been spun out of the image. When the entry point application is terminated, the container would also be terminated along with the application and vice versa. Thus, the ENTRYPOINT instruction would make the container function like an executable. Functionally, ENTRYPOINT is akin to the CMD instruction, but the major difference between the two is that the entry point application launched by using the ENTRYPOINT instruction cannot be overridden by using the docker run subcommand arguments. Instead, these docker run subcommand arguments will be passed as additional arguments to the entry point application. Having said this, Docker provides a mechanism for overriding the entry point application through the --entrypoint option in the docker run subcommand. The --entrypoint option can accept only one word as its argument and hence, it has limited functionality.
Syntactically, the ENTRYPOINT instruction is very similar to the RUN and CMD instructions, and it has two types of syntax, as shown here:
The first type of syntax is the shell type, as shown here:
ENTRYPOINT <command>
Here, <command> is the shell command, which is executed during the launch of the container. If this type of syntax is used, then the command is always executed by using /bin/sh -c.
The second type of syntax is exec or the JSON array, as shown here:
ENTRYPOINT ["<exec>", "<arg-1>", ..., "<arg-n>"]
Wherein, the code terms mean the following:
<exec>: This is the executable, which has to be run during the container launch time
<arg-1>, ..., <arg-n>: These are the variable number (zero or more) of arguments for the executable
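The Dockerfile used for the entrypoint-demo build that follows is not reproduced in this text. A Dockerfile consistent with the output shown there might look like the following sketch; the busybox base image is an assumption.

```dockerfile
FROM busybox:latest

# Exec form, so the docker run arguments are appended to this command
ENTRYPOINT ["echo", "Dockerfile ENTRYPOINT demo"]
```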
Now, let's build a Docker image by using the docker build subcommand and entrypoint-demo as the image name. The docker build system would read the instructions from the Dockerfile stored in the current directory (.) and craft the image, as shown here:
$ sudo docker build -t entrypoint-demo .
Having built the image, we can launch the container by using the docker run subcommand:
$ sudo docker run entrypoint-demo
Dockerfile ENTRYPOINT demo
Here, the container will run like an executable by echoing the Dockerfile ENTRYPOINT demo string and then it will exit immediately. If we pass any additional arguments to the docker run subcommand, then the additional argument would be passed to the entry point command. Following is the demonstration of launching the same image with the additional arguments given to the docker run subcommand:
$ sudo docker run entrypoint-demo with additional arguments
Dockerfile ENTRYPOINT demo with additional arguments
Now, let's see an example where we override the build time entry point application with the --entrypoint option and then launch a shell (/bin/sh) in the docker run subcommand, as shown here:
$ sudo docker run --entrypoint="/bin/sh" entrypoint-demo
/ #
The HEALTHCHECK instruction
As a best practice, any Docker container is designed to run just one process, application, or service, which also makes it a natural fit for the fast-evolving Microservices Architecture (MSA). The container dies if the process running inside the container dies. However, the application running inside the container might be in an unhealthy state while still running, and such a state must be externalized for effective container management. Here, the HEALTHCHECK instruction comes in handy to monitor the health of the containerized application by running a health monitoring command (or tool) at a prescribed interval.
The syntax of the HEALTHCHECK instruction is as follows:
HEALTHCHECK [<options>] CMD <command>
Wherein, the code terms mean the following:
<command>: This is the health check command, which is to be executed at a prescribed interval. If the command exit status is 0, the container is considered to be in a healthy state. If the command exit status is 1, the container is considered to be in an unhealthy state.
<options>: By default, the health check command is invoked every 30 seconds, the command times out after 30 seconds, and the command is retried 3 times before the container is declared unhealthy. Optionally, you can modify the default interval, timeout, and retries values by using the following options:
--interval=<DURATION> [default: 30s]
--timeout=<DURATION> [default: 30s]
--retries=<N> [default: 3]
Here is an example for the HEALTHCHECK instruction:
HEALTHCHECK --interval=5m --timeout=3s \
    CMD curl -f http://localhost/ || exit 1
The ONBUILD instruction
The ONBUILD instruction registers a build instruction to an image and this gets triggered when another image is built by using this image as its base image. Any build instruction can be registered as a trigger and those instructions will be triggered immediately after the FROM instruction in the downstream Dockerfile. Thus, the ONBUILD instruction can be used for deferring the execution of the build instruction from the base image to the target image.
The syntax of the ONBUILD instruction is as follows:
ONBUILD <INSTRUCTION>
Wherein, <INSTRUCTION> is another Dockerfile build instruction, which will be triggered later. The ONBUILD instruction does not allow the chaining of another ONBUILD instruction. In addition, it does not allow the FROM and MAINTAINER instructions as ONBUILD triggers.
Here is an example for the ONBUILD instruction:
ONBUILD ADD config /etc/appconfig
The STOPSIGNAL instruction
The STOPSIGNAL instruction enables you to configure an exit signal for your container.
The STOPSIGNAL instruction has the following syntax:
STOPSIGNAL <signal>
Wherein, <signal> is either a valid signal name like SIGKILL or a valid unsigned signal number
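A short sketch of the syntax follows. The choice of SIGQUIT is arbitrary and made only for illustration; the right signal depends on how the containerized application handles shutdown.

```dockerfile
FROM busybox:latest

# Exit signal for the container, given by signal name
STOPSIGNAL SIGQUIT
```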
The SHELL instruction
The SHELL instruction allows us to override the default shell, that is, sh on Linux and cmd on Windows.
The syntax of the SHELL instruction is as follows:
SHELL ["<shell>", "<arg-1>", ..., "<arg-n>"]
Wherein, the code terms mean the following:
<shell>: This is the shell to be used for the shell type of instructions, such as RUN, CMD, and ENTRYPOINT
<arg-1>, ..., <arg-n>: These are the variable number (zero or more) of arguments for the shell
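As a minimal sketch, the following Dockerfile switches the shell to bash. It assumes an Ubuntu base image, because bash must already be present in the image for this to work.

```dockerfile
FROM ubuntu:latest

# Shell-type instructions from here on run through bash instead of /bin/sh
SHELL ["/bin/bash", "-c"]

# Brace expansion is a bash feature that plain sh may not support
RUN echo {a,b,c}
```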
Summary
Building the Docker images is a critical aspect of the Docker technology for streamlining the grueling journey of containerization. As indicated before, the Docker initiative has turned out to be disruptive and transformative for the containerization paradigm, which has been present for a while now. Dockerfile is the most prominent way of producing competent Docker images, which can be reused across environments. We have illustrated the instructions, their syntax, and their usage techniques in order to simplify the image-building process for you, and we have supplied a number of examples to show what each instruction does.