ICT is often described as a fast-growing industry. I think the best quality of the ICT industry is not related to its ability to grow at a super high speed, but to its ability to revolutionize itself and the rest of the world at an astonishing speed.
Every 10 to 15 years there are major shifts in how this industry works, and every shift solves problems that were very hard to manage up to that point while creating new challenges. Also, at every major shift, many best practices of the previous iteration are classified as anti-patterns and new best practices are created. Although it might appear that those changes are impossible to predict, this is not always true. Obviously, it is not possible to know exactly what changes will occur and when they will take place, but looking at companies with a large number of servers and many lines of code usually reveals what the next steps will be.
The current shift has already happened in big companies like Amazon Web Services, Facebook, and Google. It is the implementation of IT automation systems to create and manage servers.
In this chapter we will cover:
IT automation
What is Ansible?
The secure shell
Installing Ansible
Creating a test environment with QEMU and KVM
Version control system
Using Ansible with Git
IT automation is, in its broadest sense, the set of processes and software that help with the management of the IT infrastructure (servers, networking, and storage). In the current shift, we are witnessing a widespread adoption of such processes and software.
At the beginning of IT history, there were very few servers and a lot of people were needed to make them work properly, usually more than one person for each machine. Over the years, servers became more reliable and easier to manage, so it became possible to have multiple servers managed by a single system administrator. In that period, administrators installed and upgraded the software and changed the configuration files by hand. This was obviously a very labor-intensive and error-prone process, so many administrators started to implement scripts and other means to make their life easier. Those scripts were (usually) pretty complex and they did not scale very well.
In the early years of this century, data centers started to grow rapidly because of companies' needs. Virtualization helped keep prices low, and the fact that many of these services were web services meant that many servers were very similar to each other. At this point, new tools were needed to replace the scripts that had been used before: the configuration management tools.
CFEngine was one of the first tools to demonstrate configuration management capabilities, way back in the 1990s; more recently, Puppet, Chef, and Salt have appeared alongside Ansible.
People often wonder if IT automation really brings enough advantages considering that implementing it has some direct and indirect costs. The main advantages of IT automation are:
Ability to provision machines quickly
Ability to recreate a machine from scratch in minutes
Ability to track any change performed on the infrastructure
For these reasons, it's possible to reduce the cost of managing the IT infrastructure by reducing the repetitive operations often performed by system administrators.
As with any other technology, IT automation does come with some disadvantages. From my point of view these are the biggest disadvantages:
Automating all of the small tasks that were once used to train new system administrators
If an error is made, it will be propagated everywhere
The consequence of the first is that new ways to train junior system administrators will need to be implemented.
The second one is trickier. There are a lot of ways to limit this kind of damage, but none of those will prevent it completely. The following mitigation options are available:
Always have backups: Backups will not prevent you from nuking your machine; they will only make the restore process possible.
Always test your infrastructure code (playbooks/roles) in a non-production environment: Companies have developed different pipelines to deploy code and those usually include environments such as dev, test, staging, and production. Use the same pipeline to test your infrastructure code. If a buggy application reaches the production environment it could be a problem. If a buggy playbook reaches the production environment, it could be catastrophic.
Always peer-review your infrastructure code: Some companies have already introduced peer-reviews for the application code, but very few have introduced it for the infrastructure code. As I was saying in the previous point, I think infrastructure code is way more critical than application code, so you should always peer-review your infrastructure code, whether you do it for your application code or not.
Enable SELinux: SELinux is a security kernel module that is available on all Linux distributions (it is installed by default on Fedora, Red Hat Enterprise Linux, CentOS, Scientific Linux, and Unbreakable Linux). It allows you to limit user and process privileges in a very granular way. I suggest using SELinux instead of other similar modules (such as AppArmor) because it is able to handle more situations and permissions. SELinux will prevent a huge amount of damage because, if correctly configured, it will prevent many dangerous commands from being executed.
Run the playbooks from a limited account: Even though user and privilege escalation schemes have been in UNIX code for more than 40 years, it seems as if not many companies use them. Using a limited user for all your playbooks, and escalating privileges only for commands that need higher privileges, will help prevent you from nuking a machine while trying to clean an application temporary folder.
Use horizontal privilege escalation: sudo is a well-known command, but it is often used in its more dangerous form. The sudo command supports the -u parameter, which allows you to specify the user that you want to impersonate. If you have to change a file that is owned by another user, please do not escalate to root to do so; just escalate to that user. In Ansible, you can use the become_user parameter to achieve this.
When possible, don't run a playbook on all your machines at the same time: Staged deployments can help you detect a problem before it's too late. There are many problems that are not detectable in dev, test, staging, and qa environments. The majority of them are related to load that is hard to emulate properly in those non-production environments. A new configuration you have just added to your Apache HTTPd or MySQL servers could be perfectly OK from a syntax point of view, but disastrous for your specific application under your production load. A staged deployment will allow you to test your new configuration on your actual load without risking downtime if something is wrong.
Avoid guessing commands and modifiers: A lot of system administrators will try to remember the right parameter and try to guess if they don't remember it exactly. I've done it too, a lot of times, but this is very risky. Checking the man page or the online documentation will usually take you less than two minutes and often, by reading the manual, you'll find interesting notes you did not know. Guessing modifiers is dangerous because you could be fooled by a non-standard modifier (for example, -v is not the verbose mode for grep and -h is not the help option for the MySQL CLI).
Avoid error-prone commands: Not all commands have been created equal. Some commands are (way) more dangerous than others. While you can assume that a cat command is safe, you have to assume that a dd command is dangerous, since dd performs copies and conversions of files and volumes. I've seen people using dd in scripts to transform DOS files to UNIX (instead of dos2unix) and many other, very dangerous, examples. Please avoid such commands, because they could result in a huge disaster if something goes wrong.
Avoid unnecessary modifiers: If you need to delete a simple file, use rm ${file}, not rm -rf ${file}. The latter is often used by people who have learned "to be sure, always use rm -rf" because, at some time in their past, they had to delete a folder. Using the plain form will prevent you from deleting an entire folder if the ${file} variable is set wrongly.
Always check what could happen if a variable is not set: If you want to delete the contents of a folder and you use the rm -rf ${folder}/* command, you are looking for trouble. If the ${folder} variable is not set for some reason, the shell will read a rm -rf /* command, which is deadly (consider that the rm -rf / command will not work on the majority of current OSes because it requires a --no-preserve-root option, while rm -rf /* will work as expected). I'm using this specific command as an example because I have seen such situations: the variable was pulled from a database which, due to some maintenance work, was down, and an empty string was assigned to that variable. What happened next is probably easy to guess. In case you cannot avoid using variables in dangerous places, at least check that they are not empty before using them. This will not save you from every problem but may catch some of the most common ones.
Double check your redirections: Redirections (along with pipes) are the most powerful elements of Linux shells. They can also be very dangerous: a cat /dev/urandom > /dev/sda command can destroy a disk, even though a cat command is usually overlooked because it's not usually dangerous. Always double-check all commands that include a redirection.
Use specific modules wherever possible: In this list I've used shell commands because many people will try to use Ansible as if it's just a way to distribute them: it's not. Ansible provides a lot of modules and we'll see them in this book. They will help you create more readable, portable, and safe playbooks.
There are a lot of ways to classify IT automation systems, but by far the most important is related to how the configurations are propagated. Based on this, we can distinguish between agent-based systems and agent-less systems.
Agent-based systems have two different components: a server and a client, called the agent.
There is only one server and it contains all of the configuration for your whole environment, while there are as many agents as there are machines in the environment.
Note
In some cases, more than one server could be present to ensure high availability, but treat it as if it's a single server, since they will all be configured in the same way.
Periodically, the client will contact the server to see if a new configuration for its machine is present. If a new configuration is present, the client will download it and apply it.
In agent-less systems, no specific agent is present. Agent-less systems do not always respect the server/client paradigm, since it's possible to have multiple servers and even the same number of servers and clients. Communications are initiated by the server, which will contact the client(s) using standard protocols (usually SSH or PowerShell).
Aside from the differences outlined above, there are other contrasting factors which arise because of those differences.
From a security standpoint, an agent-based system can be less secure. Since all machines have to be able to initiate a connection to the server machine, this machine could be attacked more easily than in an agent-less case where the machine is usually behind a firewall that will not accept any incoming connections.
From a performance point of view, agent-based systems run the risk of having the server saturated and therefore the roll-out could be slower. It also needs to be considered that, in a pure agent-based system, it is not possible to force-push an update immediately to a set of machines. It will have to wait until those machines check in. For this reason, several agent-based systems have implemented out-of-band ways to provide such a feature. Tools such as Chef and Puppet are agent-based but can also run without a centralized server to scale to a large number of machines; these setups are commonly called Serverless Chef and Masterless Puppet, respectively.
An agent-less system is easier to integrate in an infrastructure that is already present, since it will be seen by the clients as a normal SSH connection and therefore no additional configuration is needed.
Ansible is an agent-less IT automation tool developed in 2012 by Michael DeHaan, a former Red Hat associate. The Ansible design goals are for it to be: minimal, consistent, secure, highly reliable, and easy to learn. The Ansible company has recently been bought out by Red Hat and now operates as part of Red Hat, Inc.
Ansible primarily runs in push mode using SSH, but you can also run Ansible using ansible-pull, where you install Ansible on each machine, download the playbooks locally, and run them on the individual machines. If you have a large number of machines (large is a relative term; in our view, more than 500) and you plan to deploy updates to them in parallel, this might be the right way to go about it.
Secure Shell (also known as SSH) is a network service that allows you to log in and access a shell remotely over a fully encrypted connection. SSH is today the standard for UNIX system administration, having replaced the unencrypted telnet. The most frequently used implementation of the SSH protocol is OpenSSH.
At the time of writing, Microsoft has recently shown an implementation of OpenSSH for Windows.
Since Ansible performs SSH connections and commands in the same way any other SSH client would do, no specific configuration needs to be applied to the OpenSSH server.
To speed up default SSH connections, you can always enable ControlPersist and the pipeline mode, which makes Ansible faster and secure.
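As a rough sketch of how these two options can be switched on, the following shell function writes a project-local ansible.cfg. The ssh_args and pipelining keys of the [ssh_connection] section are Ansible settings (pipelining is the name Ansible uses for the pipeline mode); the function name write_ansible_cfg, the ansible-demo directory, and the 60 second timeout are assumptions made for the example.

```shell
#!/bin/sh
# write_ansible_cfg TARGET_DIR
# Create TARGET_DIR/ansible.cfg with SSH connection reuse and pipelining
# enabled. Hypothetical helper; the values are illustrative, not tuned.
write_ansible_cfg() {
    target_dir="$1"
    mkdir -p "$target_dir" || return 1
    cat > "$target_dir/ansible.cfg" <<'EOF'
[ssh_connection]
# Reuse one SSH connection per host and keep it open for 60 seconds.
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
# Run modules over the already open SSH session instead of copying them first.
pipelining = True
EOF
}

# Usage: create the file in a demo project directory and show it.
write_ansible_cfg ./ansible-demo
cat ./ansible-demo/ansible.cfg
```

Ansible picks up an ansible.cfg found in the directory from which it is run, so keeping this file next to the playbooks keeps the setting under version control with them.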
We will try and compare Ansible with Puppet and Chef during the course of this book since many people have good experience with those tools. We will also point out specifically how Ansible would solve a problem compared to Chef or Puppet.
Ansible, like Puppet and Chef, is declarative in nature and is expected to move a machine to the desired state specified in the configuration. For example, in each of these tools, in order to start a service at a point in time and have it start automatically on restart, you would write a declarative block or module; every time the tool runs on the machine, it will try to reach the state defined in your playbook (Ansible), cookbook (Chef), or manifest (Puppet).
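The idea of converging on a desired state, rather than blindly repeating an action, can be illustrated with a few lines of shell. This is only an analogy and not how Ansible, Chef, or Puppet are implemented; ensure_line is a hypothetical helper and the configuration line is made up for the example.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Desired state: FILE contains LINE as a whole line.
# The file is modified only when that state is not already met.
ensure_line() {
    file="$1"
    line="$2"
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        echo "ok"
    else
        printf '%s\n' "$line" >> "$file"
        echo "changed"
    fi
}

# Usage: the first run changes the file, the second finds nothing to do.
conf="$(mktemp)"
ensure_line "$conf" "MaxClients 150"
ensure_line "$conf" "MaxClients 150"
rm -f "$conf"
```

Running the function any number of times leaves exactly one copy of the line in the file, which is the property that makes repeated runs of a declarative tool safe.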
The difference in the toolset is minimal at a simple level but as more situations arise and the complexity increases, you will start finding differences between the different toolsets. In Puppet, you need to take care of the order, and the Puppet server will create the sequence of instructions to execute every time you run it on a different box. To exploit the power of Chef, you will need a good Ruby team. Your team needs to be good at the Ruby language to customize both Puppet and Chef, and there will be a bigger learning curve with both of the tools.
With Ansible, the case is different. It uses the simplicity of Chef when it comes to the order of execution, the top-to-bottom approach, and allows you to define the end state in YAML format, which makes the code extremely readable and easy for everyone, from development teams to operations teams, to pick up and make changes. In many cases, even without Ansible, operations teams are given playbook manuals to execute instructions from, whenever they face issues. Ansible mimics that behavior. Do not be surprised if you end up having your project manager change the code in Ansible and check it into Git because of its simplicity!
Installing Ansible is rather quick and simple. You can use the source code directly, by cloning it from the GitHub project (https://github.com/ansible/ansible), install it using your system's package manager, or use Python's package management tool (pip). You can use Ansible on any Windows, Mac, or UNIX-like system. Ansible doesn't require any databases and doesn't need any daemons running. This makes it easier to maintain Ansible versions and upgrade without any breaks.
We'd like to call the machine where we will install Ansible our Ansible workstation. Some people also refer to it as the command center.
It is possible to install Ansible using the system's package manager and in my opinion this is the preferred option if your system's package manager ships at least Ansible 2.0. We will look into installing Ansible via Yum, Apt, Homebrew, and pip.
If you are running a Fedora system you can install Ansible directly, since from Fedora 22, Ansible 2.0+ is available in the official repositories. You can install it as follows:
$ sudo dnf install ansible
For RHEL and RHEL-based (CentOS, Scientific Linux, Unbreakable Linux) systems, versions 6 and 7 have Ansible 2.0+ available in the EPEL repository, so you should ensure that you have the EPEL repository enabled before installing Ansible as follows:
$ sudo yum install ansible
Note
On CentOS 6 or RHEL 6, you first have to enable EPEL by installing its release package with the command rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm.
Ansible is available for Ubuntu and Debian. To install Ansible on those operating systems, use the following command:
$ sudo apt-get install ansible
You can install Ansible on Mac OS X using Homebrew, as follows:
$ brew update
$ brew install ansible
You can install Ansible via pip, which can be used to install Ansible on Windows too. If you don't have pip installed on your system, install it using the following command line:
$ sudo easy_install pip
You can now install Ansible using pip, as follows:
$ sudo pip install ansible
Once you're done installing Ansible, run ansible --version to verify that it has been installed:
$ ansible --version
You will get the following output from the preceding command line:
ansible 2.0.2
In case the previous methods do not fit your use case, you can install Ansible directly from the source. Installing from source does not require any root permissions. Let's clone a repository and activate virtualenv, which is an isolated environment in Python where you can install packages without interfering with the system's Python packages. The command and the resulting output for the repository is as follows:
$ git clone git://github.com/ansible/ansible.git
Cloning into 'ansible'...
remote: Counting objects: 116403, done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 116403 (delta 3), reused 0 (delta 0), pack-reused 116384
Receiving objects: 100% (116403/116403), 40.80 MiB | 844.00 KiB/s, done.
Resolving deltas: 100% (69450/69450), done.
Checking connectivity... done.
$ cd ansible/
$ source ./hacking/env-setup
Setting up Ansible to run out of checkout...
PATH=/home/vagrant/ansible/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/vagrant/bin
PYTHONPATH=/home/vagrant/ansible/lib:
MANPATH=/home/vagrant/ansible/docs/man:
Remember, you may wish to specify your host file with -i
Done!
Ansible needs a couple of Python packages, which you can install using pip. If you don't have pip installed on your system, install it using the following command. If you don't have easy_install installed, you can install it using Python's setuptools package on Red Hat systems, or by using Brew on the Mac:
$ sudo easy_install pip
<A long output follows>
Once you have installed pip, install the paramiko, PyYAML, jinja2, and httplib2 packages using the following command lines:
$ sudo pip install paramiko PyYAML jinja2 httplib2
Requirement already satisfied (use --upgrade to upgrade): paramiko in /usr/lib/python2.6/site-packages
Requirement already satisfied (use --upgrade to upgrade): PyYAML in /usr/lib64/python2.6/site-packages
Requirement already satisfied (use --upgrade to upgrade): jinja2 in /usr/lib/python2.6/site-packages
Requirement already satisfied (use --upgrade to upgrade): httplib2 in /usr/lib/python2.6/site-packages
Downloading/unpacking markupsafe (from jinja2)
  Downloading MarkupSafe-0.23.tar.gz
  Running setup.py (path:/tmp/pip_build_root/markupsafe/setup.py) egg_info for package markupsafe
Installing collected packages: markupsafe
  Running setup.py install for markupsafe
    building 'markupsafe._speedups' extension
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.6 -c markupsafe/_speedups.c -o build/temp.linux-x86_64-2.6/markupsafe/_speedups.o
    gcc -pthread -shared build/temp.linux-x86_64-2.6/markupsafe/_speedups.o -L/usr/lib64 -lpython2.6 -o build/lib.linux-x86_64-2.6/markupsafe/_speedups.so
Successfully installed markupsafe
Cleaning up...
Note
By default, Ansible will be running against the development branch. You might want to check out the latest stable branch. Check what the latest stable version is using the following command line:
$ git branch -a
Pick the version you want to use. Version 2.0.2 was the latest version available at the time of writing. Check out that version and verify it using the following command lines:
[node ansible]$ git checkout v2.0.2
Note: checking out 'v2.0.2'.
[node ansible]$ ansible --version
ansible 2.0.2 (v2.0.2 268e72318f) last updated 2014/09/28 21:27:25 (GMT +000)
You now have a working setup of Ansible ready. One of the benefits of running Ansible from source is that you can enjoy the new features immediately, without waiting for your package manager to make them available for you.
To be able to learn Ansible, we will need to make quite a few playbooks and run them.
Tip
Doing it directly on your computer will be very risky. For this reason, I would suggest using virtual machines.
It's possible to create a test environment with cloud providers in a few seconds, but often it is more useful to have those machines locally. To do so, we will use Kernel-based Virtual Machine (KVM) with Quick Emulator (QEMU).
The first thing will be installing qemu-kvm and virt-install. On Fedora it will be enough to run:
$ sudo dnf install -y @virtualization
On Red Hat/CentOS/Scientific Linux/Unbreakable Linux it will be enough to run:
$ sudo yum install -y qemu-kvm virt-install virt-manager
If you use Ubuntu, you can install it using:
$ sudo apt install virt-manager
On Debian, you'll need to execute:
$ sudo apt install qemu-kvm libvirt-bin
For our examples, I'll be using CentOS 7. This is for multiple reasons; the main ones are:
CentOS is free and 100% compatible with Red Hat, Scientific Linux, and Unbreakable Linux
Many companies use Red Hat/CentOS/Scientific Linux/Unbreakable Linux for their servers
Those distributions are the only ones with SELinux support built in, and as we have seen earlier, SELinux can help you make your environment much more secure
At the time of writing this book, the most recent CentOS cloud image is http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1603.qcow2, so let's download this image with the help of the following command:
$ wget http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1603.qcow2
Since we will probably need to create many machines, it's better if we create a copy of it so the original one will not be modified:
$ cp CentOS-7-x86_64-GenericCloud-1603.qcow2 centos_1.qcow2
Since the qcow2 images will run cloud-init to set up the networking, users, and so on, we will need to provide a couple of files. Let's start by creating a meta-data file for networking:
instance-id: centos_1
local-hostname: centos_1.local
network-interfaces: |
  iface eth0 inet static
  address (An IP in your virtual bridge class)
  network (The first IP of the virtual bridge class)
  netmask (Your virtual bridge class netmask)
  broadcast (Your virtual bridge class broadcast)
  gateway (Your virtual bridge class gateway)
To find your virtual bridge data, you have to look for a device that has the name virbrX or something similar. In my case it is virbr0, so I can find all of its information using the following command:
$ ip addr show virbr0
The previous command will give this as an output:
5: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:38:1a:e6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
So, for me the meta-data file looks like the following:
instance-id: centos_1
local-hostname: centos_1.local
network-interfaces: |
  iface eth0 inet static
  address 192.168.124.10
  network 192.168.124.1
  netmask 255.255.255.0
  broadcast 192.168.124.255
  gateway 192.168.124.1
This file will set up the eth0 interface of the virtual machine at boot time. We also need another file (user-data) to set up the users properly:
#cloud-config
users:
  - name: (yourname)
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - (insert ssh public key here)
The first line, #cloud-config, is required for cloud-init to recognize the file as a cloud-config document. For me, the file looks like the following:
#cloud-config
users:
  - name: fale
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDRoZzfNif+wXFqzsmvHg4jJt8+ZO/dQxm5k7pXYAwdWVbiFrZYGhMQl5FPfzC7rkDaC31fod3Y85QkQVgNKCVYUy5QR5LfxUjSQDv+y2Nfao4be/BKla0ffc7JVSzFFAELGGDLn1lMN0e0D9syqQbKgSRdOdvweq/0Et3KNIF9e7XgEdSuAHls17NDtMkWUfyi5yvEtdtMcp9gO4OlG6Vh0iCXOdx+f0QA2hh1JnvePvzJ4a8CeckN5JwL7Q027nlsHPBYq9K1jvv+diUs48FflPJI4fgMq3Zo7zyCpf8qE7Dlx+u7OvR5kxNdrpnOsDgHeAGNkrzfcmxU7kbU29NX4VFgWd0sdlzu1nOWFEH7Cnd547tx5VFxBzJwEAUCh7QSiU2Ne/hCnjFkZuDZ5pN4pNw+yu+Feoz79gV/utoLHuCodYyAvSQlQ7VSfC+djLD/9wHC2yGksvc9ICnSUv3JyQEEEG4K26z6szF9+a3vU0qIq7YYa8QHgWIHtzSxztYRIWJOzTZlwyuNmhbRNYDaMC5BMzvQ8JREv0obMLmrlvolJPWT4gn1N9sDNNXIC6RDRE5yGsIEf0CliYW1X/8XG40U+g9LG+lrYOGWD4OymZ2P/VDIzZbVT6NG/rdSSGnf4D1AwlOGR7eNTv30AK9o0LVjqGaJWKWYUF9zY6I3+Q==
To provide those files at boot time, we will need to create an ISO file containing them:
$ genisoimage -output centos_1.iso -volid cidata -joliet -rock user-data meta-data
After the ISO file is ready, we can instruct virt-install to actually create the virtual machine:
virt-install --name CentOS_1 \
    --ram 2048 \
    --disk centos_1.qcow2 \
    --vcpus 2 \
    --os-variant fedora21 \
    --connect qemu:///system \
    --network bridge:br0,model=virtio \
    --cdrom centos_1.iso \
    --boot hd
Since our network configuration is in the ISO file, we will need it at every boot. Sadly, by default this does not happen, so we will need to do a few more steps. Firstly, run virsh:
$ virsh
At this point, a virsh shell should appear with an output like the following:
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh #
This means that we switched from bash (or your shell, if you are not using bash) to the virtualization shell. Issue the following command:
virsh # edit CentOS_1
By doing this we will be able to tweak the configuration of the CentOS_1 machine. In the disk section, you'll need to find the cdrom device that should look like this:
<disk type='block' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <target dev='hda' bus='ide'/>
  <readonly/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
You'll need to change it to the following (the disk type becomes file and a source element pointing to the ISO file is added):
<disk type='file' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <source file='(Put here your ISO path)/centos_1.iso'/>
  <target dev='hda' bus='ide'/>
  <readonly/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
At this point, our virtual machine will always start with the ISO file mounted as a cdrom and therefore cloud-init will be able to correctly initialize the networking.
In this chapter, we have already encountered the expression infrastructure code to describe the Ansible code that will create and maintain your infrastructure. We use the expression infrastructure code to distinguish it from the application code, which is the code that composes your applications, websites, and so on. This distinction is needed for clarity, but in the end, both types are a bunch of text files that the software will be able to read and interpret.
For this reason, a version control system will help you a lot. Its main advantages are:
Ability to have multiple people working simultaneously on the same project.
Ability to perform code reviews in a simple way.
Ability to have multiple branches for multiple environments (that is, dev, test, qa, staging, and production).
Ability to track a change so we know when it was introduced, and who introduced it. This makes it easier to understand why that piece of code is there, years (or months) later.
Those advantages are provided to you by the majority of version control systems out there.
Version control systems can be divided into three major groups based on the three different models that they can implement:
Local data model
Client-server model
Distributed model
The first category, the local data model, is the oldest (circa 1972) approach and is used for very specific use cases. This model requires all users to share the same filesystem. Famous examples of it are the Revision Control System (RCS) and Source Code Control System (SCCS).
The second category, the client-server model, arrived later (circa 1990) and tried to solve the limitations of the local data model, creating a server that respected the local data model and a set of clients that dealt with the server instead of with the repository itself. This additional layer allowed multiple developers to use local files and synchronize them with a centralized server. Famous examples of this approach are Apache Subversion (SVN), and Concurrent Versions System (CVS).
The third category, the distributed model, arrived at the beginning of the twenty-first century and tried to solve the limitations of the client-server model. In fact, in the client-server model, you could work on the code offline, but you needed to be online to commit the changes. The distributed model allows you to handle everything on your local repository (like the local data model), and to merge different repositories on different machines in an easy way. In this new model, it's possible to perform all actions as in the client-server model, with the added benefits of being able to work completely offline as well as the ability to merge changes between peers without passing through the centralized server. Examples of this model are BitKeeper (proprietary software), Git, GNU Bazaar, and Mercurial.
There are some additional advantages that will be provided by only the distributed model, such as:
Possibility of making commits, browsing history, and performing any other action even if the server is not available
Easier management of multiple branches for different environments
When it comes to infrastructure code, we have to consider that, frequently, the infrastructure that retains and manages your infrastructure code is kept in the infrastructure code itself. This is a recursive situation that can create problems. In fact, until you have your code server in place you cannot deploy your Ansible, and until you have your Ansible in place, you cannot deploy your code server. A distributed version control system will prevent this problem.
As for the simplicity of managing multiple branches, even if this is not a hard rule, often distributed version control systems have much better merge handling than the other kinds of version control systems.
For the reasons that we have just seen and because of its huge popularity, I suggest always using Git for your Ansible repositories.
There are a few suggestions that I always give to the people I talk to, so that they get the best out of Git with Ansible:
Create environment branches: Creating environment branches such as dev, prod, test, and stg will allow you to easily keep track of the different environments and their respective update statuses. I often suggest keeping the master branch for the development environment, since I find many people are used to pushing new changes directly to master. If you use master for the production environment, people can inadvertently push changes to production when they meant to push them to development.
Always keep environment branches stable: One of the big advantages of having environment branches is the possibility of destroying and recreating any environment from scratch at any given moment. This is only possible if your environment branches are in a stable (not broken) state.
Use feature branches: Using different branches for specific long-running features (such as a refactor or some other big change) will allow you to keep your day-to-day operations running while your new feature is in the Git repository (so you'll not lose track of who did what and when they did it).
Push often: I always suggest that people push commits as often as possible. This will make Git work as both a version control system and a backup system. Far too often, I have seen laptops broken, lost, or stolen with days or weeks of unpushed work on them. Don't waste your time: push often. Also, by pushing often, you'll detect merge conflicts sooner, and conflicts are always easier to handle when they are detected early than after multiple changes have piled up.
Always deploy after you have made a change: I have seen a developer make a change in the infrastructure code, test it in the dev and test environments, push it to the production branch, and then go to lunch before deploying the changes to production. His lunch did not end well. One of his colleagues inadvertently deployed that code to production (he was trying to deploy a small change he had made in the meantime) and was not prepared to handle the other developer's changes. The production infrastructure broke, and they lost a lot of time figuring out how such a small change (the only one the person who made the deployment was aware of) could have created such a big mess.
Choose multiple small changes rather than a few huge changes: Making small changes, whenever possible, will make debugging easier. Debugging an infrastructure is not easy. There is no compiler that will allow you to see obvious problems (even though Ansible performs a syntax check of your code, no other test is performed), and the tools for finding something that is broken are not always as good as you would imagine. The infrastructure as code paradigm is relatively new, and its tools are not yet as good as the ones for application code.
Avoid binary files as much as possible: I always suggest keeping your binaries outside your Git repository, whether it is an application code repository or an infrastructure code repository. For application code, I think it is important to keep your repository light (Git, like the majority of version control systems, does not perform very well with binary blobs). For infrastructure code, it is vital, because you'll be tempted to put a huge number of binary blobs in it: very often it is easier to put a binary blob in the repository than to find a cleaner (and better) solution.
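The branch layout suggested in the list above can be sketched with a handful of Git commands. This is only an illustration under stated assumptions: the repository is a throwaway one created in a temporary directory, the README file and the feature/refactor-roles branch name are hypothetical, and master plays the role of the development branch as suggested earlier.

```shell
#!/bin/sh
# Sketch: environment branches plus one feature branch in a scratch repository.
# File and branch names other than master/test/stg/prod are placeholders.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# infrastructure code" > README
git add README
git commit -q -m "Initial commit"
git branch -M master         # master tracks the development environment

# One branch per remaining environment, all starting from the same commit.
for env in test stg prod; do
    git branch "$env"
done

# Long-running work happens on its own feature branch...
git checkout -q -b feature/refactor-roles
echo "roles refactor notes" >> README
git commit -q -am "Start roles refactor"

# ...and is merged back into master once it is ready.
git checkout -q master
git merge -q --no-ff -m "Merge feature/refactor-roles" feature/refactor-roles

# Promote to the next environment only when master is in a stable state.
git checkout -q test
git merge -q --ff-only master

git branch --list
```

Promoting with a fast-forward-only merge keeps each environment branch an exact ancestor of the one before it, so it is always clear which changes have reached which environment. Here stg and prod deliberately stay on the initial commit until someone promotes the change further.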
In this chapter, we have seen what IT automation is, its advantages and disadvantages, what kinds of tools you can find, and how Ansible fits into this big picture. We have also seen how to install Ansible and how to create a KVM-based virtual machine. Finally, we analyzed the version control systems and spoke about the advantages Git brings to Ansible if used properly.
In the next chapter, we will start looking at the infrastructure code that we mentioned in this chapter without explaining exactly what it is or how to write it. We'll also see how to automate simple operations that you probably perform every single day, such as managing users, files, and file content.