Learning Ansible 2 - Second Edition

By Fabio Alessandro Locati

About this book

Ansible is an open source automation platform that assists organizations with tasks such as configuration management, application deployment, orchestration, and task automation. With Ansible, even complex tasks can be handled more easily than before.

In this book, you will learn about the fundamentals and practical aspects of Ansible 2 by diving deeply into topics such as installation (Linux, BSD, and Windows support), playbooks, modules, various testing strategies, provisioning, deployment, and orchestration. You will become familiar with the new features of Ansible 2, such as the cleaner architecture, task blocks, playbook parsing, new execution strategy plugins, and new modules. You will also learn how to integrate Ansible with cloud platforms such as AWS. The book ends with Ansible Tower, the enterprise version of Ansible, and Ansible Galaxy, where you will learn how to make Ansible interact with different OSes to speed up your work to previously unseen levels.

By the end of the book, you’ll be able to leverage the Ansible parameters to create expeditious tasks for your organization by implementing the Ansible 2 techniques and paradigms.

Publication date:
November 2016
Publisher
Packt
Pages
266
ISBN
9781786464231

 

Chapter 1. Getting Started with Ansible

ICT is often described as a fast-growing industry. I think the best quality of the ICT industry is not related to its ability to grow at a super high speed, but to its ability to revolutionize itself and the rest of the world at an astonishing speed.

Every 10 to 15 years there are major shifts in how this industry works and every shift solves problems that were very hard to manage up to that point, creating new challenges. Also, at every major shift, many best practices of the previous iteration are classified as anti-patterns and new best practices are created. Although it might appear that those changes are impossible to predict, this is not always true. Obviously, it is not possible to know exactly what changes will occur and when they will take place, but looking at companies with a large number of servers and many lines of code usually reveals what the next steps will be.

The current shift has already happened in big companies like Amazon Web Services, Facebook, and Google. It is the implementation of IT automation systems to create and manage servers.

In this chapter we will cover:

  • IT automation

  • What is Ansible?

  • The secure shell

  • Installing Ansible

  • Creating a test environment with QEMU and KVM

  • Version control system

  • Using Ansible with Git

 

IT automation


IT automation is, in its broadest sense, the set of processes and software that help with the management of the IT infrastructure (servers, networking, and storage). In the current shift, we are witnessing a massive adoption of such processes and software.

The history of IT automation

At the beginning of IT history, there were very few servers and a lot of people were needed to make them work properly, usually more than one person for each machine. Over the years, servers became more reliable and easier to manage, so it became possible to have multiple servers managed by a single system administrator. In that period, administrators installed and upgraded the software and changed the configuration files manually. This was obviously a very labor-intensive and error-prone process, so many administrators started to implement scripts and other means to make their lives easier. Those scripts were (usually) pretty complex and they did not scale very well.

In the early years of this century, data centers started to grow a lot due to companies' needs. Virtualization helped in keeping prices low, and the fact that many of these services were web services meant that many servers were very similar to each other. At this point, new tools were needed to replace the scripts that were used before: the configuration management tools.

CFEngine was one of the first tools to demonstrate configuration management capabilities, way back in the 1990s; more recently, there have been Puppet, Chef, and Salt, besides Ansible.

Advantages of IT automation

People often wonder if IT automation really brings enough advantages considering that implementing it has some direct and indirect costs. The main advantages of IT automation are:

  • Ability to provision machines quickly

  • Ability to recreate a machine from scratch in minutes

  • Ability to track any change performed on the infrastructure

For these reasons, it's possible to reduce the cost of managing the IT infrastructure by reducing the repetitive operations often performed by system administrators.

Disadvantages of IT automation

As with any other technology, IT automation does come with some disadvantages. From my point of view these are the biggest disadvantages:

  • Automating all of the small tasks that were once used to train new system administrators

  • If an error is made, it will be propagated everywhere

The consequence of the first is that new ways to train junior system administrators will need to be implemented.

Limiting the possible damages of an error propagation

The second one is trickier. There are a lot of ways to limit this kind of damage, but none of those will prevent it completely. The following mitigation options are available:

  • Always have backups: Backups will not prevent you from nuking your machine; they will only make the restore process possible.

  • Always test your infrastructure code (playbooks/roles) in a non-production environment: Companies have developed different pipelines to deploy code and those usually include environments such as dev, test, staging, and production. Use the same pipeline to test your infrastructure code. If a buggy application reaches the production environment it could be a problem. If a buggy playbook reaches the production environment, it could be catastrophic.

  • Always peer-review your infrastructure code: Some companies have already introduced peer reviews for the application code, but very few have introduced them for the infrastructure code. As I was saying in the previous point, I think infrastructure code is way more critical than application code, so you should always peer-review your infrastructure code, whether you do it for your application code or not.

  • Enable SELinux: SELinux is a kernel security module that is available on all Linux distributions (it is installed by default on Fedora, Red Hat Enterprise Linux, CentOS, Scientific Linux, and Unbreakable Linux). It allows you to limit the privileges of users and processes in a very granular way. I suggest using SELinux instead of other similar modules (such as AppArmor) because it is able to handle more situations and permissions. If correctly configured, SELinux will prevent a huge amount of damage by stopping many dangerous commands from being executed.

  • Run the playbooks from a limited account: Even though user and privilege escalation schemes have been in UNIX code for more than 40 years, it seems as if not many companies use them. Using a limited user for all your playbooks, and escalating privileges only for commands that need higher privileges will help prevent you nuking a machine while trying to clean an application temporary folder.

  • Use horizontal privilege escalation: The sudo command is well known, but it is often used in its most dangerous form. The sudo command supports the '-u' parameter, which allows you to specify the user that you want to impersonate. If you have to change a file that is owned by another user, please do not escalate to root to do so, just escalate to that user. In Ansible, you can use the become_user parameter to achieve this.

  • When possible, don't run a playbook on all your machines at the same time: Staged deployments can help you detect a problem before it's too late. There are many problems that are not detectable in dev, test, staging, and qa environments. The majority of them are related to load that is hard to emulate properly in those non-production environments. A new configuration you have just added to your Apache HTTPd or MySQL servers could be perfectly OK from a syntax point of view, but disastrous for your specific application under your production load. A staged deployment will allow you to test your new configuration on your actual load without risking downtime if something is wrong.

  • Avoid guessing commands and modifiers: A lot of system administrators will try to remember the right parameter and will guess if they don't remember it exactly. I've done it too, a lot of times, but this is very risky. Checking the man page or the online documentation will usually take you less than two minutes and often, by reading the manual, you'll find interesting notes you did not know. Guessing modifiers is dangerous because you could be fooled by a non-standard modifier (for example, -v is not the verbose mode for grep and -h is not the help option for the MySQL CLI).

  • Avoid error-prone commands: Not all commands are created equal. Some commands are (way) more dangerous than others. While you can assume that a cat command is safe, you have to assume that a dd command is dangerous, since dd performs copies and conversions of files and volumes. I've seen people using dd in scripts to convert DOS files to UNIX (instead of dos2unix), and many other very dangerous examples. Please avoid such commands, because they could result in a huge disaster if something goes wrong.

  • Avoid unnecessary modifiers: If you need to delete a simple file, use rm ${file}, not rm -rf ${file}. The latter is often used by people who have learned to "always use rm -rf, to be sure" because at some time in the past they had to delete a folder. Using plain rm will prevent you from deleting an entire folder if the ${file} variable is set wrongly.

  • Always check what could happen if a variable is not set: If you want to delete the contents of a folder and you use the rm -rf ${folder}/* command, you are looking for trouble. If the ${folder} variable is not set for some reason, the shell will read a rm -rf /* command, which is deadly (considering the fact that the rm -rf / command will not work on the majority of current OSes because it requires a --no-preserve-root option, while rm -rf /* will work as expected). I'm using this specific command as an example because I have seen such situations: the variable was pulled from a database which, due to some maintenance work, was down and an empty string was assigned to that variable. What happened next is probably easy to guess. In case you cannot prevent using variables in dangerous places, at least check them to see if they are not empty before using them. This will not save you from every problem but may catch some of the most common ones.

  • Double-check your redirections: Redirections (along with pipes) are among the most powerful elements of Linux shells. They can also be very dangerous: a cat /dev/urandom > /dev/sda command can destroy a disk, even though cat is usually overlooked because it is not normally dangerous. Always double-check all commands that include a redirection.

  • Use specific modules wherever possible: In this list I've used shell commands because many people will try to use Ansible as if it's just a way to distribute them: it's not. Ansible provides a lot of modules and we'll see them in this book. They will help you create more readable, portable, and safe playbooks.
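Some of the shell precautions listed above (checking that a variable is set, avoiding a bare rm -rf ${folder}/*) can be combined into a small helper. The following is only a sketch under stated assumptions: the clean_folder function name and the temporary demo directory are invented for this illustration, and in a playbook a dedicated Ansible module remains preferable to a shell command.

```shell
# Hypothetical helper: delete the contents of a folder, but refuse to run
# when the path is empty or is not an existing directory.
clean_folder() {
  folder=$1
  if [ -z "$folder" ]; then
    echo "clean_folder: refusing to run with an empty path" >&2
    return 1
  fi
  if [ ! -d "$folder" ]; then
    echo "clean_folder: '$folder' is not a directory" >&2
    return 1
  fi
  # ${folder:?} is a second safety net: the shell raises an error instead of
  # expanding to "/*" if the variable were ever empty at this point.
  rm -rf -- "${folder:?}"/*
}

# Demonstration on a throwaway directory
demo_dir=$(mktemp -d)
touch "$demo_dir/old.log" "$demo_dir/cache.tmp"
clean_folder "$demo_dir" && echo "cleaned $demo_dir"
clean_folder "" 2>/dev/null || echo "empty path rejected"
```

Like the original rm -rf ${folder}/* command, this sketch leaves hidden files in place; it only adds the checks that stop an unset variable from turning into rm -rf /*.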

Types of IT automation

There are a lot of ways to classify IT automation systems, but by far the most important is related to how the configurations are propagated. Based on this, we can distinguish between agent-based systems and agent-less systems.

Agent-based systems

Agent-based systems have two different components: a server and a client, called the agent.

There is only one server and it contains all of the configuration for your whole environment, while there are as many agents as there are machines in the environment.

Note

In some cases, more than one server could be present to ensure high availability, but treat it as if it's a single server, since they will all be configured in the same way.

Periodically, the client will contact the server to see if a new configuration for its machine is present. If a new configuration is present, the client will download it and apply it.

Agent-less systems

In agent-less systems, no specific agent is present. Agent-less systems do not always respect the server/client paradigm, since it's possible to have multiple servers and even the same number of servers and clients. Communications are initiated by the server, which will contact the client(s) using standard protocols (usually SSH or PowerShell).

Agent-based versus Agent-less systems

Aside from the differences outlined above, there are other contrasting factors which arise because of those differences.

From a security standpoint, an agent-based system can be less secure. Since all machines have to be able to initiate a connection to the server machine, this machine could be attacked more easily than in an agent-less case where the machine is usually behind a firewall that will not accept any incoming connections.

From a performance point of view, agent-based systems run the risk of having the server saturated and therefore the roll-out could be slower. It also needs to be considered that, in a pure agent-based system, it is not possible to force-push an update immediately to a set of machines. It will have to wait until those machines check in. For this reason, multiple agent-based systems have implemented out-of-band ways to provide such a feature. Tools such as Chef and Puppet are agent-based but can also run without a centralized server to scale to a large number of machines, commonly called Serverless Chef and Masterless Puppet, respectively.

An agent-less system is easier to integrate in an infrastructure that is already present, since it will be seen by the clients as a normal SSH connection and therefore no additional configuration is needed.

 

What is Ansible?


Ansible is an agent-less IT automation tool developed in 2012 by Michael DeHaan, a former Red Hat associate. The Ansible design goals are for it to be: minimal, consistent, secure, highly reliable, and easy to learn. The Ansible company has recently been bought out by Red Hat and now operates as part of Red Hat, Inc.

Ansible primarily runs in push mode using SSH, but you can also run Ansible using ansible-pull, where you install Ansible on each machine, download the playbooks locally, and run them on the individual machines. If there is a large number of machines (large is a relative term; in our view, greater than 500) and you plan to deploy updates to them in parallel, this might be the right way to go about it.
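As a rough sketch of the pull mode, each machine could schedule ansible-pull through cron so that it periodically fetches a playbook repository and applies it locally. The pull_cron_line helper, the 15-minute interval, and the repository URL below are assumptions made for this illustration; only the ansible-pull command and its -U (repository URL) option come from Ansible itself.

```shell
# Hypothetical helper: print a crontab line that runs ansible-pull every 15 minutes.
# $1 is the URL of the playbook repository, $2 the playbook to run
# (local.yml is used here as the fallback name).
pull_cron_line() {
  repo_url=$1
  playbook=${2:-local.yml}
  echo "*/15 * * * * ansible-pull -U $repo_url $playbook"
}

# Example with a made-up repository URL
pull_cron_line "https://git.example.com/infra/playbooks.git" site.yml
```

The printed line would still have to be added to the crontab of each machine, which is itself a task that a push-mode playbook can take care of.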

 

Secure Shell (SSH)


Secure Shell (also known as SSH) is a network service that allows you to log in and access a shell remotely over a fully encrypted connection. SSH is today the standard for UNIX system administration, having replaced the unencrypted telnet. The most frequently used implementation of the SSH protocol is OpenSSH.

At the time of writing, Microsoft has recently shown an implementation of OpenSSH for Windows.

Since Ansible performs SSH connections and commands in the same way any other SSH client would, no specific configuration needs to be applied to the OpenSSH server.

To speed up the default SSH connections, you can enable ControlPersist and pipelining, which make Ansible faster while keeping it secure.
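A minimal sketch of how both options could be switched on is to write them into the [ssh_connection] section of an ansible.cfg file. The write_ssh_tuning helper name, the 60-second persistence value, and the temporary target file are choices made for this illustration only. Pipelining may also require requiretty to be disabled in the sudoers configuration of the managed machines.

```shell
# Hypothetical helper: write SSH tuning options to the file given as $1.
write_ssh_tuning() {
  cfg=$1
  cat > "$cfg" <<'EOF'
[ssh_connection]
# Keep one SSH connection per host open and reuse it for 60 seconds
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
# Run modules over the open connection instead of copying them to the target first
pipelining = True
EOF
}

# Demonstration: write to a temporary file rather than overwriting a real ansible.cfg
demo_cfg=$(mktemp)
write_ssh_tuning "$demo_cfg"
cat "$demo_cfg"
```

To use the result, the generated section would be placed in the ansible.cfg that Ansible reads, for example the one in the directory from which you run your playbooks.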

 

Why Ansible?


We will try and compare Ansible with Puppet and Chef during the course of this book since many people have good experience with those tools. We will also point out specifically how Ansible would solve a problem compared to Chef or Puppet.

Ansible, like Puppet and Chef, is declarative in nature and is expected to move a machine to the desired state specified in the configuration. For example, in each of these tools, in order to start a service at a point in time and start it automatically on restart, you would need to write a declarative block or module; every time the tool runs on the machine, it will try to reach the state defined in your playbook (Ansible), cookbook (Chef), or manifest (Puppet).

The differences between the tools are minimal at a simple level, but as more situations arise and the complexity increases, you will start finding differences between them. In Puppet, you need to take care of the order, and the Puppet server will create the sequence of instructions to execute every time you run it on a different box. To exploit the power of Chef, you will need a good Ruby team. Your team needs to be good at the Ruby language to customize both Puppet and Chef, and there will be a steeper learning curve with both tools.

With Ansible, the case is different. It uses the simplicity of Chef when it comes to the order of execution, the top-to-bottom approach, and allows you to define the end state in YAML format, which makes the code extremely readable and easy for everyone, from development teams to operations teams, to pick up and make changes. In many cases, even without Ansible, operations teams are given playbook manuals to execute instructions from, whenever they face issues. Ansible mimics that behavior. Do not be surprised if you end up having your project manager change the code in Ansible and check it into Git because of its simplicity!

 

Installing Ansible


Installing Ansible is rather quick and simple. You can use the source code directly, by cloning it from the GitHub project (https://github.com/ansible/ansible), install it using your system's package manager, or use Python's package management tool (pip). You can use Ansible on any Windows, Mac, or UNIX-like system. Ansible doesn't require any databases and doesn't need any daemons running. This makes it easier to maintain Ansible versions and upgrade without any breaks.

We'd like to call the machine where we will install Ansible our Ansible workstation. Some people also refer to it as the command center.

Installing Ansible using the system's package manager

It is possible to install Ansible using the system's package manager and in my opinion this is the preferred option if your system's package manager ships at least Ansible 2.0. We will look into installing Ansible via Yum, Apt, Homebrew, and pip.

Installing via Yum

If you are running a Fedora system you can install Ansible directly, since from Fedora 22, Ansible 2.0+ is available in the official repositories. You can install it as follows:

$ sudo dnf install ansible

For RHEL and RHEL-based (CentOS, Scientific Linux, Unbreakable Linux) systems, versions 6 and 7 have Ansible 2.0+ available in the EPEL repository, so you should ensure that you have the EPEL repository enabled before installing Ansible as follows:

$ sudo yum install ansible

Note

On CentOS 6 or RHEL 6, you first have to install the EPEL release package, for example by running rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm.

Installing via Apt

Ansible is available for Ubuntu and Debian. To install Ansible on those operating systems, use the following command:

$ sudo apt-get install ansible

Installing via Homebrew

You can install Ansible on Mac OS X using Homebrew, as follows:

$ brew update
$ brew install ansible

Installing via pip

You can install Ansible via pip, which also works on Windows. If you don't have pip installed on your system, install it first using the following command line:

$ sudo easy_install pip

You can now install Ansible using pip, as follows:

$ sudo pip install ansible

Once you're done installing Ansible, run ansible --version to verify that it has been installed:

$ ansible --version

You will get the following output from the preceding command line:

ansible 2.0.2
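Since this book relies on Ansible 2.0 or later, it can be handy to check the reported version in a script. The following sketch assumes that the first line of the output has the form "ansible X.Y.Z", as shown above; the function names and the sample string are made up for this illustration.

```shell
# Extract the version number (second field of the first line) from the
# text printed by `ansible --version`, passed in as $1.
ansible_version_of() {
  printf '%s\n' "$1" | awk 'NR == 1 { print $2 }'
}

# Succeed when dotted version $1 is greater than or equal to dotted version $2.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, "."); m = split(want, w, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      a = h[i] + 0; b = w[i] + 0   # missing components count as 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}

# Demonstration with a sample string; on a real workstation you could use
# sample=$(ansible --version) instead.
sample="ansible 2.0.2"
version=$(ansible_version_of "$sample")
if version_at_least "$version" 2.0; then
  echo "Ansible $version is recent enough"
else
  echo "Ansible $version is older than 2.0"
fi
```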

Installing Ansible from source

In case the previous methods do not fit your use case, you can install Ansible directly from the source. Installing from source does not require any root permissions. Let's clone the repository and set up the environment so that Ansible runs directly out of the checkout, without interfering with the system's Python packages. The commands and their resulting output are as follows:

$ git clone https://github.com/ansible/ansible.git
Cloning into 'ansible'...
remote: Counting objects: 116403, done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 116403 (delta 3), reused 0 (delta 0), pack-reused 116384
Receiving objects: 100% (116403/116403), 40.80 MiB | 844.00 KiB/s, done.
Resolving deltas: 100% (69450/69450), done.
Checking connectivity... done.
$ cd ansible/
$ source ./hacking/env-setup
Setting up Ansible to run out of checkout...
PATH=/home/vagrant/ansible/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/vagrant/bin
PYTHONPATH=/home/vagrant/ansible/lib:
MANPATH=/home/vagrant/ansible/docs/man:
Remember, you may wish to specify your host file with -i
Done!

Ansible needs a couple of Python packages, which you can install using pip. If you don't have pip installed on your system, install it using the following command. If you don't have easy_install installed, you can install it using Python's setuptools package on Red Hat systems, or by using Brew on the Mac:

$ sudo easy_install pip
<A long output follows>

Once you have installed pip, install the paramiko, PyYAML, jinja2, and httplib2 packages using the following command lines:

$ sudo pip install paramiko PyYAML jinja2 httplib2
Requirement already satisfied (use --upgrade to upgrade): paramiko in /usr/lib/python2.6/site-packages
Requirement already satisfied (use --upgrade to upgrade): PyYAML in /usr/lib64/python2.6/site-packages
Requirement already satisfied (use --upgrade to upgrade): jinja2 in /usr/lib/python2.6/site-packages
Requirement already satisfied (use --upgrade to upgrade): httplib2 in /usr/lib/python2.6/site-packages
Downloading/unpacking markupsafe (from jinja2)
  Downloading MarkupSafe-0.23.tar.gz
  Running setup.py (path:/tmp/pip_build_root/markupsafe/setup.py) egg_info for package markupsafe
Installing collected packages: markupsafe
  Running setup.py install for markupsafe
    building 'markupsafe._speedups' extension
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.6 -c markupsafe/_speedups.c -o build/temp.linux-x86_64-2.6/markupsafe/_speedups.o
    gcc -pthread -shared build/temp.linux-x86_64-2.6/markupsafe/_speedups.o -L/usr/lib64 -lpython2.6 -o build/lib.linux-x86_64-2.6/markupsafe/_speedups.so
Successfully installed markupsafe
Cleaning up...

Note

By default, Ansible will be running against the development branch. You might want to check out the latest stable branch. Check what the latest stable version is using the following command line:

$ git branch -a

Copy the name of the version you want to use. Version 2.0.2 was the latest version available at the time of writing. Check out that version and verify it using the following command lines:

[node ansible]$ git checkout v2.0.2
Note: checking out 'v2.0.2'.
[node ansible]$ ansible --version
ansible 2.0.2 (v2.0.2 268e72318f) last updated 2014/09/28 21:27:25 (GMT +000)

You now have a working setup of Ansible ready. One of the benefits of running Ansible from source is that you can enjoy the new features immediately, without waiting for your package manager to make them available for you.

 

Creating a test environment with QEMU and KVM


To be able to learn Ansible, we will need to make quite a few playbooks and run them.

Tip

Doing it directly on your computer will be very risky. For this reason, I would suggest using virtual machines.

It's possible to create a test environment with cloud providers in a few seconds, but often it is more useful to have those machines locally. To do so, we will use Kernel-based Virtual Machine (KVM) with Quick Emulator (QEMU).

The first thing will be installing qemu-kvm and virt-install. On Fedora it will be enough to run:

$ sudo dnf install -y @virtualization

On Red Hat/CentOS/Scientific Linux/Unbreakable Linux it will be enough to run:

$ sudo yum install -y qemu-kvm virt-install virt-manager

If you use Ubuntu, you can install it using:

$ sudo apt install virt-manager

On Debian, you'll need to execute:

$ sudo apt install qemu-kvm libvirt-bin

For our examples, I'll be using CentOS 7. This is for multiple reasons; the main ones are:

  • CentOS is free and 100% compatible with Red Hat, Scientific Linux, and Unbreakable Linux

  • Many companies use Red Hat/CentOS/Scientific Linux/Unbreakable Linux for their servers

  • Those distributions ship with SELinux support built in and enabled by default, and as we have seen earlier, SELinux can help you make your environment much more secure

At the time of writing this book, the most recent CentOS cloud image is http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1603.qcow2. Let's download this image with the help of the following command:

$ wget http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1603.qcow2

Since we will probably need to create many machines, it's better if we create a copy of it so the original one will not be modified:

$ cp CentOS-7-x86_64-GenericCloud-1603.qcow2 centos_1.qcow2

Since the qcow2 images will run cloud-init to set up the networking, users, and so on, we will need to provide a couple of files. Let's start by creating a metadata file for networking:

instance-id: centos_1 
local-hostname: centos_1.local 
network-interfaces: | 
  iface eth0 inet static 
  address (An IP in your virtual bridge class) 
  network (The first IP of the virtual bridge class) 
  netmask (Your virtual bridge class netmask) 
  broadcast (Your virtual bridge class broadcast) 
  gateway (Your virtual bridge class gateway) 

To find your virtual bridge data, you have to look for a device that has the name virbrX or something similar. In my case it is virbr0, so I can find all of its information using the following command:

$ ip addr show virbr0

The previous command will give this as an output:

5: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:38:1a:e6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
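The network, netmask, and broadcast values needed in the meta-data file can be derived from the address and prefix length on the inet line. The following is only a sketch of that arithmetic; the function names are invented for this illustration, and the network address is taken to be the lowest address of the range.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert an integer back to a dotted-quad IPv4 address.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the network, netmask, and broadcast lines for a CIDR such as 192.168.124.1/24.
bridge_values() {
  addr=${1%/*}
  prefix=${1#*/}
  ip_int=$(ip_to_int "$addr")
  mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
  net=$(( ip_int & mask ))
  bcast=$(( net | (~mask & 4294967295) ))
  echo "network $(int_to_ip "$net")"
  echo "netmask $(int_to_ip "$mask")"
  echo "broadcast $(int_to_ip "$bcast")"
}

# The CIDR can be copied from the inet line; on many systems
# `ip -o -4 addr show dev virbr0 | awk '{ print $4 }'` prints it directly.
bridge_values 192.168.124.1/24
```

For the bridge shown above, this prints network 192.168.124.0, netmask 255.255.255.0, and broadcast 192.168.124.255. The address of the virtual machine and the gateway still have to be chosen by hand.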

So, for me the meta-data file looks like the following:

instance-id: centos_1 
local-hostname: centos_1.local 
network-interfaces: | 
  iface eth0 inet static 
  address 192.168.124.10 
  network 192.168.124.0
  netmask 255.255.255.0 
  broadcast 192.168.124.255 
  gateway 192.168.124.1 

This file will set up the eth0 interface of the virtual machine at boot time. We also need another file (user-data) to set up the users properly:

#cloud-config
users:
- name: (yourname) 
  shell: /bin/bash 
  sudo: ['ALL=(ALL) NOPASSWD:ALL'] 
  ssh-authorized-keys: 
  - (insert ssh public key here) 

For me, the file looks like the following:

#cloud-config
users:
- name: fale 
  shell: /bin/bash 
  sudo: ['ALL=(ALL) NOPASSWD:ALL'] 
  ssh-authorized-keys: 
  - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDRoZzfNif+wXFqzsmvHg4jJt8+ZO/dQxm5k7pXYAwdWVbiFrZYGhMQl5FPfzC7rkDaC31fod3Y85QkQVgNKCVYUy5QR5LfxUjSQDv+y2Nfao4be/BKla0ffc7JVSzFFAELGGDLn1lMN0e0D9syqQbKgSRdOdvweq/0Et3KNIF9e7XgEdSuAHls17NDtMkWUfyi5yvEtdtMcp9gO4OlG6Vh0iCXOdx+f0QA2hh1JnvePvzJ4a8CeckN5JwL7Q027nlsHPBYq9K1jvv+diUs48FflPJI4fgMq3Zo7zyCpf8qE7Dlx+u7OvR5kxNdrpnOsDgHeAGNkrzfcmxU7kbU29NX4VFgWd0sdlzu1nOWFEH7Cnd547tx5VFxBzJwEAUCh7QSiU2Ne/hCnjFkZuDZ5pN4pNw+yu+Feoz79gV/utoLHuCodYyAvSQlQ7VSfC+djLD/9wHC2yGksvc9ICnSUv3JyQEEEG4K26z6szF9+a3vU0qIq7YYa8QHgWIHtzSxztYRIWJOzTZlwyuNmhbRNYDaMC5BMzvQ8JREv0obMLmrlvolJPWT4gn1N9sDNNXIC6RDRE5yGsIEf0CliYW1X/8XG40U+g9LG+lrYOGWD4OymZ2P/VDIzZbVT6NG/rdSSGnf4D1AwlOGR7eNTv30AK9o0LVjqGaJWKWYUF9zY6I3+Q== 

To provide those files at boot time, we will need to create an ISO file containing them:

$ genisoimage -output centos_1.iso -volid cidata -joliet -rock user-data meta-data

After the ISO file is ready, we can instruct virt-install to actually create the virtual machine:

$ virt-install --name CentOS_1 \
--ram 2048 \
--disk centos_1.qcow2 \
--vcpus 2 \
--os-variant fedora21 \
--connect qemu:///system \
--network bridge:br0,model=virtio \
--cdrom centos_1.iso \
--boot hd

Since our network configuration is in the ISO file, we will need it at every boot. Sadly, by default this does not happen, so we will need to do a few more steps. Firstly, run virsh:

$ virsh

At this point, a virsh shell should appear with an output like the following:

Welcome to virsh, the virtualization interactive terminal.
Type:  'help' for help with commands
       'quit' to quit
virsh #

This means that we switched from bash (or your shell, if you are not using bash) to the virtualization shell. Issue the following command:

virsh # edit CentOS_1

By doing this we will be able to tweak the configuration of the CentOS_1 machine. In the disk section, you'll need to find the cdrom device that should look like this:

    <disk type='block' device='cdrom'> 
      <driver name='qemu' type='raw'/> 
      <target dev='hda' bus='ide'/> 
      <readonly/> 
      <address type='drive' controller='0' bus='0' target='0'
      unit='0'/> 
    </disk> 

You'll need to change it to the following (note the new type attribute on the disk element and the added source element):

    <disk type='file' device='cdrom'> 
      <driver name='qemu' type='raw'/> 
        <source file='(Put here your ISO path)/centos_1.iso'/> 
      <target dev='hda' bus='ide'/> 
      <readonly/> 
      <address type='drive' controller='0' bus='0' target='0'
      unit='0'/> 
    </disk> 

At this point, our virtual machine will always start with the ISO file mounted as a cdrom and therefore cloud-init will be able to correctly initiate the networking.

 

Version control system


In this chapter, we have already encountered the expression infrastructure code to describe the Ansible code that will create and maintain your infrastructure. We use the expression infrastructure code to distinguish it from the application code, which is the code that composes your applications, websites, and so on. This distinction is needed for clarity, but in the end, both types are a bunch of text files that the software will be able to read and interpret.

For this reason, a version control system will help you a lot. Its main advantages are:

  • Ability to have multiple people working simultaneously on the same project.

  • Ability to perform code reviews in a simple way.

  • Ability to have multiple branches for multiple environments (for example, dev, test, qa, staging, and production).

  • Ability to track a change so we know when it was introduced, and who introduced it. This makes it easier to understand why that piece of code is there, years (or months) later.

Those advantages are provided to you by the majority of version control systems out there.

Version control systems can be divided into three major groups based on the three different models that they can implement:

  • Local data model

  • Client-server model

  • Distributed model

The first category, the local data model, is the oldest (circa 1972) approach and is used for very specific use cases. This model requires all users to share the same filesystem. Famous examples of it are the Revision Control System (RCS) and Source Code Control System (SCCS).

The second category, the client-server model, arrived later (circa 1990) and tried to solve the limitations of the local data model by introducing a server that manages the repository according to the local data model, and a set of clients that talk to the server instead of to the repository itself. This additional layer allowed multiple developers to use local files and synchronize them with a centralized server. Famous examples of this approach are Apache Subversion (SVN) and Concurrent Versions System (CVS).

The third category, the distributed model, arrived at the beginning of the twenty-first century and tried to solve the limitations of the client-server model. In the client-server model, you could work on the code offline, but you needed to be online to commit the changes. The distributed model allows you to handle everything in your local repository (as in the local data model), and to merge different repositories on different machines in an easy way. In this model, it's possible to perform all the actions available in the client-server model, with the added benefits of being able to work completely offline and to merge changes between peers without passing through a centralized server. Examples of this model are BitKeeper (proprietary software), Git, GNU Bazaar, and Mercurial.

There are some additional advantages that are provided only by the distributed model, such as:

  • Possibility of making commits, browsing history, and performing any other action even if the server is not available

  • Easier management of multiple branches for different environments

When it comes to infrastructure code, we have to consider that the infrastructure that stores and manages your infrastructure code is frequently described in the infrastructure code itself. This is a recursive situation that can create problems: until you have your code server in place, you cannot run Ansible from it, and until you can run Ansible, you cannot deploy your code server. A distributed version control system avoids this problem, because every clone contains the full repository, so Ansible can be run from any clone while the code server does not exist yet.
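The bootstrap case can be sketched as follows. This is an illustration only: two local directories stand in for two machines (in practice the peer would be reached over SSH or a similar transport), and all names are hypothetical. The point is that one peer can commit with no server reachable, and another peer can fetch that work directly.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# "alice" holds the infrastructure code; no code server exists yet
git init -q "$work/alice"
cd "$work/alice"
git config user.name "Alice"; git config user.email "alice@example.com"
echo "- hosts: all" > site.yml
git add site.yml
git commit -q -m "Initial playbook"

# "bob" clones straight from alice's repository (peer to peer)
git clone -q "$work/alice" "$work/bob"
cd "$work/bob"
git config user.name "Bob"; git config user.email "bob@example.com"
echo "# code server setup goes here" >> site.yml
git commit -q -am "Describe the code server"   # no server needed to commit

# alice pulls bob's commit directly from bob's repository
cd "$work/alice"
branch=$(git symbolic-ref --short HEAD)
git pull -q --ff-only "$work/bob" "$branch"
git log --oneline
```

Once the code server has been deployed, either repository can be pushed to it and it becomes just another peer that everyone agrees to treat as the central one.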

As for the simplicity of managing multiple branches, this is not a hard rule, but distributed version control systems often have much better merge handling than the other kinds of version control systems.

 

Using Ansible with Git


For the reasons that we have just seen and because of its huge popularity, I suggest always using Git for your Ansible repositories.

There are a few suggestions that I always provide to the people I talk to, so Ansible gets the best out of Git:

  • Create environment branches: Creating environment branches such as dev, prod, test, and stg will allow you to easily keep track of the different environments and their respective update statuses. I often suggest keeping the master branch for the development environment, since I find many people are used to pushing new changes directly to master. If you use master for the production environment, people can inadvertently push changes to production when they meant to push them to development.

  • Always keep environment branches stable: One of the big advantages of having environment branches is the possibility of destroying and recreating any environment from scratch at any given moment. This is only possible if your environment branches are in a stable (not broken) state.

  • Use feature branches: Using separate branches for specific features with a long development time (such as a refactor or some other big change) will allow you to carry on with your day-to-day operations while the new feature is still tracked in the Git repository (so you'll not lose track of who did what and when they did it).

  • Push often: I always suggest that people push commits as often as possible. This will make Git work as both a version control system and a backup system. Far too often, I have seen laptops broken, lost, or stolen with days or weeks of unpushed work on them. Don't waste your time: push often. Also, by pushing often, you'll detect merge conflicts sooner, and conflicts are always easier to handle when they are detected early rather than after multiple changes have piled up.

  • Always deploy after you have made a change: I have seen a developer create a change in the infrastructure code, test it in the dev and test environments, push it to the production branch, and then go to lunch before deploying the change in production. His lunch did not end well. One of his colleagues inadvertently deployed that code to production (he was trying to deploy a small change he had made in the meantime) and was not prepared to handle the other developer's change. The production infrastructure broke, and they lost a lot of time figuring out how such a small change (the only one the person who made the deployment was aware of) could have created such a big mess. Deploying right after pushing to the production branch would have avoided this.

  • Choose multiple small changes rather than a few huge changes: Making small changes, whenever possible, will make debugging easier. Debugging an infrastructure is not easy. There is no compiler that will point out obvious problems (Ansible performs a syntax check of your code, but no other test is performed), and the tools for finding something that is broken are not always as good as you would imagine. The infrastructure as code paradigm is new, and its tools are not yet as good as the ones for application code.

  • Avoid binary files as much as possible: I always suggest keeping your binaries outside your Git repository, whether it is an application code repository or an infrastructure code repository. In the application code case, I think it is important to keep your repository light (Git, like the majority of version control systems, does not perform very well with binary blobs). In the infrastructure code case, it is vital, because you'll be tempted to put a huge number of binary blobs in the repository, since very often that is easier than finding a cleaner (and better) solution.
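Several of these suggestions can be sketched in a few commands. This is one possible layout under stated assumptions, not a prescribed workflow: the branch names stg and prod come from the list above, while the feature branch name, the file names, and the ignore patterns are hypothetical. The development branch is whatever Git names the initial branch (master in the text above).

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Admin"; git config user.email "admin@example.com"

# Keep binary blobs out of the repository (patterns are examples only)
printf '%s\n' '*.iso' '*.tar.gz' '*.rpm' > .gitignore
echo "- hosts: all" > site.yml
git add .gitignore site.yml
git commit -q -m "Initial playbook"

# The initial branch serves as the development environment
dev=$(git symbolic-ref --short HEAD)

# Environment branches start from the current stable state
git branch stg
git branch prod

# A long-running refactor lives on its own feature branch
git checkout -q -b feature/refactor-roles
echo "# refactored" >> site.yml
git commit -q -am "Refactor roles"

# Day-to-day work continues on dev; the feature is merged when ready
git checkout -q "$dev"
git merge -q --no-ff -m "Merge feature/refactor-roles" feature/refactor-roles

# Before promoting, a syntax check can be run where Ansible is installed:
#   ansible-playbook --syntax-check site.yml

# Promote a tested change one environment at a time
git checkout -q stg
git merge -q --ff-only "$dev"
git checkout -q "$dev"
```

Here prod stays behind until someone promotes the change explicitly, which keeps each environment branch in a known state. Using fast-forward-only merges for promotion is a design choice in this sketch: it refuses to promote if an environment branch has received commits that development has never seen.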

 

Summary


In this chapter, we have seen what IT automation is, its advantages and disadvantages, what kinds of tools you can find, and how Ansible fits into this big picture. We have also seen how to install Ansible and how to create a KVM-based virtual machine. Finally, we analyzed version control systems and discussed the advantages Git brings to Ansible if used properly.

In the next chapter, we will start looking at the infrastructure code that we mentioned in this chapter without explaining exactly what it is and how to write it. Also in the next chapter, we'll see how to automate simple operations that you probably perform every single day, such as managing users, managing files, and file content.

About the Author

  • Fabio Alessandro Locati

    Fabio Alessandro Locati, commonly known as Fale, is a director at Otelia, a public speaker, an author, and an open source contributor. His main areas of expertise are Linux, automation, security, and cloud technologies. Fale has more than 12 years of working experience in IT, with many of them spent consulting for many companies, including dozens of Fortune 500 companies. This has allowed him to consider technologies from different points of view, and to think critically about them.
