In this article by Madhurranjan Mohan and Ramesh Raithatha, the authors of the book, Learning Ansible, have given an overview of the basic features of Ansible, from installation through deployment.
What is Ansible?
Ansible is an IT orchestration engine that can be used for several use cases, such as configuration management, orchestration, provisioning, and deployment. Compared to other automation tools, Ansible gives you an easy way to set up your orchestration engine without the overhead of a client or central server. That's right, no central server! It comes preloaded with a wide range of modules that make your life simpler.
Ansible is an open source tool (with enterprise editions available) developed using Python and runs on Windows, Mac, and Unix-like systems. You can use Ansible for configuration management, orchestration, provisioning, and deployments, which covers many of the problems that are solved under the broad umbrella of DevOps. We won't be talking about culture here as that's a book by itself!
You could refer to the book, Continuous Delivery and DevOps – A Quickstart Guide by Packt Publishing for more information at https://www.packtpub.com/virtualization-and-cloud/continuous-delivery-and-devops-quickstart-guide.
Let's try to answer some questions that you may have right away.
Can I use Ansible if I am starting afresh, have no automation in my system, and would like to introduce that (and as a result, increase my bonus for the next year)?
The short answer is that Ansible is probably perfect for you. The learning curve with Ansible is far shorter than with most other tools currently present in the market.
I have other tools in my environment. Can I still use Ansible?
Yes, again! If you already have other tools in your environment, you can still augment those with Ansible as it solves many problems in an elegant way. A case in point is a puppet shop that uses Ansible for orchestration and provisioning of new systems but continues to use Puppet for configuration management.
I don't have Python in my environment and introducing Ansible would bring in Python. What do I do?
You need to remember that most Linux systems ship with a version of Python, so you don't have to install Python explicitly. You should still go ahead with Ansible if it solves particular problems for you. Always question what problems you are trying to solve, and then check whether a tool such as Ansible would solve that use case.
I have no configuration management at present. Can I start today?
The answer is yes!
In many of the conferences we presented, the preceding four questions popped up most frequently. Now that these questions are answered, let's dig deeper.
The architecture of Ansible is agentless. Yes, you heard that right; you don't have to install any client-side software. It works purely over SSH connections, so you could consider SSH to be your agent, in which case our previous statement that Ansible is agentless is not entirely right. However, since SSH is assumed to be running on virtually every server, it is not considered a separate agent; hence, Ansible is called agentless. So, if you have a well-oiled SSH setup, then you're ready to roll Ansible into your environment in no time. This also means that you can install it on just one system (either a Linux or Mac machine) and control your entire infrastructure from that machine.
Yes, we understand that you must be thinking about what happens if this machine goes down. You would probably have multiple such machines in production, but this was just an example to elucidate the simplicity of Ansible.
As Ansible works over SSH connections, it can be slow; to speed up the default SSH connections, you can always enable ControlPersist and pipelining, which make Ansible faster while remaining secure. Ansible works like any other Unix command and doesn't require any daemon process to be running all the time.
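For example, both of these can be enabled in ansible.cfg; the following is a sketch (the ControlPersist timeout value of 60 seconds is an arbitrary choice):

```ini
# ansible.cfg - speed up SSH-based runs
[ssh_connection]
# Reuse SSH connections instead of opening a new one for every task
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
# Reduce the number of SSH operations required per task
pipelining = True
```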
So you would either run it via a cron, on demand from a single node, or at startup. Ansible can be push or pull based and you can utilize whatever suits you.
When you start with something new, the first aspect you need to pay attention to is the nomenclature. The faster you're able to pick up the terms associated with the tool, the faster you're comfortable with that tool. So, to deploy, let's say, a package on one or more machines in Ansible, you would need to write a playbook that has a single task, which in turn uses the package module that would then go ahead and install the package based on an inventory file that contains a list of these machines. If you feel overwhelmed by the nomenclature, don't worry, you'll soon get used to it. Similar to the package module, Ansible comes loaded with more than 200 modules, purely written in Python. We will talk about modules in detail later.
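To make the nomenclature concrete, the following is a minimal playbook sketch; the group name, the ntp package, and the use of the yum module to do the installing are illustrative assumptions:

```yaml
# playbook.yml: one play containing a single task
- hosts: servers            # a group defined in the inventory file
  tasks:
    - name: install the ntp package
      yum: name=ntp state=present
```

You would run this against an inventory file with ansible-playbook -i inventory playbook.yml.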
It is now time to install Ansible to start trying out various fun examples.
Installing Ansible
Installing Ansible is rather quick and simple. You can directly use the source code by cloning it from the GitHub project (https://github.com/ansible/ansible), install it using your system's package manager, or use Python's package management tool (pip). You can use Ansible on any Windows, Mac, or Unix-like system. Ansible doesn't require any database and doesn't need any daemons running. This makes it easier to maintain the Ansible versions and upgrade without any breaks.
We'd like to call the machine where we will install Ansible our command center. Many people also refer to it as the Ansible workstation.
Note that, as Ansible is developed using Python, you will need Python version 2.4 or higher installed. Python is preinstalled, as specified earlier, on the majority of operating systems. If this is not the case for you, refer to https://wiki.python.org/moin/BeginnersGuide/Download to download/upgrade Python.
Installing Ansible from source
Installing from source is as easy as cloning a repository. You don't require any root permissions while installing from source. Let's clone a repository and activate virtualenv, which is an isolated environment in Python where you can install packages without interfering with the system's Python packages. The command is as follows:
$ git clone git://github.com/ansible/ansible.git
$ cd ansible/
$ source ./hacking/env-setup
Ansible needs a couple of Python packages, which you can install using pip. If you don't have pip installed on your system, install it using the following command:
$ sudo easy_install pip
Once you have installed pip, install the paramiko, PyYAML, jinja2, and httplib2 packages using the following command lines:
$ sudo pip install paramiko PyYAML jinja2 httplib2
By default, Ansible runs against the development branch, so you might want to check out the latest stable branch instead. Find the latest stable version using the following command line:
$ git branch -a
Note the latest stable version; version 1.7.1 was the latest available at the time of writing. Check out the version you would like to use, as follows:
$ git checkout release1.7.1
You now have a working setup of Ansible ready. One of the benefits of running Ansible from source is that you get new features immediately, without waiting for your package manager to make them available to you.
Installing Ansible using the system's package manager
Ansible also provides a way to install itself using the system's package manager. We will look into installing Ansible via Yum, Apt, Homebrew, and pip.
Installing via Yum
If you are running a Fedora system, you can install Ansible directly. For CentOS- or RHEL-based systems, you should add the EPEL repository first. Then, run the following:
$ sudo yum install ansible
On CentOS 6 or RHEL 6, you can install EPEL by running rpm -Uvh against http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm.
You can also install Ansible from an RPM file. You need to use the make rpm command against the git checkout of Ansible, as follows:
$ git clone git://github.com/ansible/ansible.git
$ cd ./ansible
$ make rpm
$ sudo rpm -Uvh ~/rpmbuild/ansible-*.noarch.rpm
You should have rpm-build, make, and python2-devel installed on your system to build an rpm.
Installing via Apt
Ansible is available for Ubuntu in a Personal Package Archive (PPA). To configure the PPA and install Ansible on your Ubuntu system, use the following command lines:
$ sudo apt-get install software-properties-common
$ sudo apt-add-repository ppa:rquillo/ansible
$ sudo apt-get update
$ sudo apt-get install ansible
You can also compile a deb file for Debian and Ubuntu systems, using the following command line:
$ make deb
Installing via Homebrew
You can install Ansible on Mac OS X using Homebrew, as follows:
$ brew update
$ brew install ansible
Installing via pip
You can install Ansible via Python's package manager, pip; this works on Windows too. If you don't have pip installed on your system, install it using the following command line:
$ sudo easy_install pip
You can now install Ansible using pip, as follows:
$ sudo pip install ansible
Once you're done installing Ansible, run ansible --version to verify that it has been installed:
$ ansible --version
You will get the following as the output of the preceding command line:
ansible 1.7.1
Hello Ansible
Let's start by checking if two remote machines are reachable; in other words, let's start by pinging two machines following which we'll echo hello ansible on the two remote machines. The following are the steps that need to be performed:
Create an Ansible inventory file. This can contain one or more groups. Each group is defined within square brackets. This example has one group called servers:
$ cat inventory
[servers]
machine1
machine2
Now, we have to ping the two machines. In order to do that, first run ansible --help to view the available options, as shown below (only copying the subset that we need for this example):
ansible --help
Usage: ansible <host-pattern> [options]

Options:
  -a MODULE_ARGS, --args=MODULE_ARGS
                        module arguments
  -i INVENTORY, --inventory-file=INVENTORY
                        specify inventory host file
                        (default=/etc/ansible/hosts)
  -m MODULE_NAME, --module-name=MODULE_NAME
                        module name to execute
                        (default=command)
We'll now ping the two servers using the ping module via the Ansible command line:
$ ansible servers -i inventory -m ping
Now that we can ping these two servers, let's echo hello ansible!, using the following command:
$ ansible servers -i inventory -a '/bin/echo hello ansible!'
The preceding command line is the same as the following one:
$ ansible servers -i inventory -m command -a '/bin/echo hello ansible!'
If you move the inventory file to /etc/ansible/hosts, the Ansible command will become even simpler, as follows:
$ ansible servers -a '/bin/echo hello ansible!'
There you go. The 'Hello Ansible' program works! Time to tweet!
You can also specify the inventory file by exporting it in a variable named ANSIBLE_HOSTS. The preceding command, without the -i option, will work even in that situation.
Developing a playbook
In Ansible, except for ad hoc tasks that are run using the ansible command, we need to make sure we have playbooks for every other repeatable task. In order to do that, it is important to have a local development environment, especially when a larger team is involved, where people can develop their playbooks and test them before checking them into Git.
A very popular tool that currently fits this bill is Vagrant. Vagrant's aim is to help users create and configure lightweight, reproducible, and portable development environments. By default, Vagrant works on VirtualBox, which can run on a local laptop or desktop. To elaborate further, it can be used for the following use cases:
Vagrant can be used when creating development environments to constantly check new builds of software, especially when you have several other dependent components.
For example, if I am developing service A, which depends on two other services, B and C, as well as a database, then the fastest way to test the service locally is to make sure that the dependent services and the database are set up (especially if you're testing multiple versions) and, every time you compile the service locally, to deploy the module against these services and test it out.
Testing teams can use Vagrant to deploy the versions of code they want to test and work with them, and each person in the testing team can have local environments on their laptop or desktop rather than common environments that are shared between teams.
If your software is developed for cloud-based platforms and needs to be deployed on AWS or Rackspace (for example), apart from being tested locally on VMware Fusion or VirtualBox, then Vagrant is perfect for the purpose. In Vagrant's terms, you can deploy your software on multiple providers with a configuration file that differs only per provider.
For example, a single Vagrantfile can carry both a VirtualBox provider configuration and an AWS provider configuration for the same simple machine.
As you can see, the provider configuration changes, but the rest of the configuration remains more or less the same. (The private IP is VirtualBox-specific, but it is ignored when run using the AWS provider.)
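A sketch of such a Vagrantfile follows; the box name, AMI ID, credentials, and playbook path are all assumptions for illustration:

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"
  # VirtualBox-specific; ignored by the AWS provider
  config.vm.network "private_network", ip: "192.168.33.10"

  config.vm.provider "virtualbox" do |vb|
    vb.memory = 1024
  end

  config.vm.provider "aws" do |aws, override|
    aws.access_key_id     = ENV["AWS_ACCESS_KEY"]
    aws.secret_access_key = ENV["AWS_SECRET_KEY"]
    aws.ami               = "ami-xxxxxxxx"
    override.ssh.username = "ubuntu"
  end

  # Provision the machine with an Ansible playbook (see the next paragraph)
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "playbook.yml"
  end
end
```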
Vagrant also provides provisioners, which give users multiple options to configure new machines as they come up. Provisioners support shell scripts, tools such as Chef, Puppet, Salt, and Docker, as well as Ansible.
By using Ansible with Vagrant, you can develop your Ansible scripts locally, deploy and redeploy them as many times as needed to get them right, and then check them in. The advantage, from an infrastructure point of view, is that the same code can also be used by other developers and testers when they spawn off their Vagrant environments for testing (the software would be configured to work in the expected manner by the Ansible playbooks). The checked-in Ansible code will then flow, like the rest of your software, through testing and staging environments before it is finally deployed to production.
Roles
When you start thinking about your infrastructure, you will soon look at the purpose each node in your infrastructure serves, and you will be able to categorize the nodes. You will also start to abstract out information regarding nodes and start thinking at a higher level. For example, if you're running a web application, you'll be able to categorize the nodes broadly as db_servers, app_servers, web_servers, and load_balancers.
If you then talk to your provisioning team, they will tell you which base packages need to be installed on each machine after choosing the OS distribution, whether for compliance, for remote management, or for security purposes. Simple examples are packages such as bind, ntp, collectd, psacct, and so on. Soon, you will add all these packages under a category named common or base. As you dig deeper, you might find further dependencies. For example, if your application is written in Java, having some version of the JDK is a dependency. So, from what we've discussed so far, we have the following categories:
db_servers
app_servers
web_servers
load_balancers
common
jdk
We've taken a top-down approach to come up with the categories listed. Now, depending on the size of your infrastructure, you will slowly start identifying reusable components, and these can be as simple as ntp or collectd. These categories, in Ansible's terms, are called Roles. If you're familiar with Chef, the concept is very similar.
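Once these categories become roles, a top-level playbook simply maps inventory groups to lists of roles. The following is a sketch using the category names above (the group-to-role mapping is an assumption):

```yaml
# site.yml - apply roles per server category
- hosts: app_servers
  roles:
    - common
    - jdk
    - app_servers

- hosts: web_servers
  roles:
    - common
    - web_servers
```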
Callback plugins
One of the features that Ansible provides is a callback mechanism. You can configure as many callback plugins as required. These plugins can intercept events and trigger certain actions. Let's look at a simple example, where we print the run results at the end of the playbook run as part of a callback, and then take a brief look at how to configure a callback. We first grep for the location of callback_plugins in ansible.cfg as follows:
$ grep callback ansible.cfg
callback_plugins = /usr/share/ansible_plugins/callback_plugins
We then create our callback plugin in this location.
$ ls -l /usr/share/ansible_plugins/callback_plugins
callback_sample.py
Let's now look at the contents of the callback_sample file:
$ cat /usr/share/ansible_plugins/callback_plugins/callback_sample.py
class CallbackModule(object):
    def on_any(self, *args, **kwargs):
        pass

    def runner_on_failed(self, host, res, ignore_errors=False):
        pass

    def runner_on_ok(self, host, res):
        pass

    def runner_on_error(self, host, msg):
        pass

    def runner_on_skipped(self, host, item=None):
        pass

    def runner_on_unreachable(self, host, res):
        pass

    def runner_on_no_hosts(self):
        pass

    def runner_on_async_poll(self, host, res, jid, clock):
        pass

    def runner_on_async_ok(self, host, res, jid):
        pass

    def runner_on_async_failed(self, host, res, jid):
        pass

    def playbook_on_start(self):
        pass

    def playbook_on_notify(self, host, handler):
        pass

    def playbook_on_no_hosts_matched(self):
        pass

    def playbook_on_no_hosts_remaining(self):
        pass

    def playbook_on_task_start(self, name, is_conditional):
        pass

    def playbook_on_vars_prompt(self, varname, private=True, prompt=None,
                                encrypt=None, confirm=False, salt_size=None,
                                salt=None, default=None):
        pass

    def playbook_on_setup(self):
        pass

    def playbook_on_import_for_host(self, host, imported_file):
        pass

    def playbook_on_not_import_for_host(self, host, missing_file):
        pass

    def playbook_on_play_start(self, pattern):
        pass

    def playbook_on_stats(self, stats):
        results = dict([(h, stats.summarize(h)) for h in stats.processed])
        print "Run Results: %s" % results
As you can see, the callback class, CallbackModule, contains several methods. Ansible calls the methods of this class and passes the run parameters to them. Playbook activities can be intercepted using these methods, and relevant actions can be taken based on that. Relevant methods are called based on the action; for example, we've used the playbook_on_stats method to display statistics regarding the playbook run. Let's run a basic playbook with the callback plugin and view the output.
In the output, you can now see the Run Results line right at the end, consisting of a dictionary or hash of the actual results. This is due to our custom code, and it is just an example of how you can intercept methods and use them to your advantage.
You can utilize this information in any number of ways. Isn't this powerful? Are you getting any ideas about how you can utilize a feature such as this? Do write them down before reading further. If you're able to write out even two use cases that we've not covered here and that are relevant to your infrastructure, give yourself a pat on the back!
Modules
Ansible allows you to extend functionality using custom modules. Arguments, as you have seen, can be passed to modules. The arguments that you pass, provided they are in a key-value format, will be forwarded in a separate file along with the module. Ansible expects at least two variables in your module's output: the result of the module run, that is, whether it passed or failed, and a message for the user; both have to be in JSON format. If you adhere to this simple rule, you can customize as much as you want, and the module can be written in any language.
Using Bash modules
Bash modules in Ansible are no different from any other bash scripts, except for the way they print data on stdout. Bash modules can be as simple as checking whether a process is running on the remote host, or they can run more complex commands.
We recommend that you use bash over other languages, such as Python and Ruby, only when you're performing simple tasks. In other cases, you should use languages that provide better error handling.
Let's see an example of a bash module.
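The module itself is not reproduced in this extract, so the following is a hedged sketch of what such a bash module might look like, written as a function plus a small demonstration driver (the pgrep/kill usage and the exact messages are assumptions; only the service_name argument comes from the text):

```shell
#!/bin/bash
# Sketch of a bash module: kill all Java processes belonging to a service.
# Ansible passes the path of the arguments file as $1; sourcing it sets
# the service_name variable from the module's key=value arguments.
kill_java_for_service() {
    . "$1"    # same effect as 'source "$1"'

    if [ -z "$service_name" ]; then
        echo "failed=True msg='service_name is a required argument'"
        return 1
    fi

    # Collect the PIDs of Java processes matching the service name
    pids=$(pgrep -f "java.*${service_name}" || true)

    if [ -n "$pids" ]; then
        for pid in $pids; do
            kill -9 "$pid" 2>/dev/null
        done
        echo "failed=False msg='killed ${service_name} processes: ${pids}'"
    else
        # Nothing to kill; still report success so the Ansible run continues
        echo "failed=False msg='no ${service_name} java processes found'"
    fi
    return 0
}

# Demonstration: simulate the arguments file that Ansible would pass
printf 'service_name=no_such_service_xyz\n' > /tmp/module_args
result=$(kill_java_for_service /tmp/module_args)
echo "$result"
```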
The preceding bash module takes the service_name argument and forcefully kills all of the Java processes that belong to that service. As you know, Ansible passes the arguments file to the module. We then source the arguments file using source $1, which sets a variable named service_name that we can then access as $service_name.
We then check to see whether we obtained any PIDs for the service and, if so, loop over them to forcefully kill all of the Java processes that match service_name. Once they're killed, we exit the module with failed=False, a message, and an exit code of 0, because terminating the Ansible run might not make sense.
Provisioning a machine in the cloud
With that, let's jump to the first topic. Teams managing infrastructure have many choices today to run their builds, tests, and deployments. Providers such as Amazon, Rackspace, and DigitalOcean primarily provide Infrastructure as a Service (IaaS). They expose an API via SDKs, which you can invoke to create new machines, or you can use their GUI to set them up. We're more interested in their SDKs, as these will play an important part in our automation effort. Setting up new servers and provisioning them is interesting at first, but at some stage it can become boring, as it's quite repetitive in nature: each provisioning run involves several similar steps to get the machines up and running.
Imagine one fine morning you receive an e-mail asking for three new customer setups, where each customer setup has three to four instances and a bunch of services and dependencies. This might be an easy task for you, but would require running the same set of repetitive commands multiple times, followed by monitoring the servers once they come up to confirm that everything just went fine. In addition, anything you do manually has a chance of introducing bugs.
What if two of the customer setups come up correctly but, due to fatigue, you miss out a step for the third customer and hence introduce a bug? To deal with such situations, there exists automation. Cloud provisioning automation makes it easy for an engineer to bring up a new server as quickly as possible, allowing him/her to concentrate on other priorities. Using Ansible, you can easily perform these actions and automate cloud provisioning with minimal effort. Ansible gives you the power to automate various cloud platforms, such as Amazon, DigitalOcean, Google Cloud, Rackspace, and so on, with modules for different services available in the Ansible core.
Docker provisioning
Docker is perhaps the most popular open source tool that has been released in the last year. The following quote can be seen on the Docker website:
Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications, whether on laptops, data center VMs, or the cloud.
Increasingly, more and more individuals and companies are adopting Docker. The tagline for Docker is Containerization is the new virtualization. At a high level, all Docker allows you to do is prepare lightweight containers using instructions from a Dockerfile and run the container. The same container can be shared or shipped across environments, thereby making sure you run the exact same image and reducing the chance of errors. The Docker image that you build is cached by default; thus, the next time you have similar instructions, the time taken to bring up a similar container is reduced to almost nothing.
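As a concrete illustration, a Dockerfile is simply a list of such instructions from which the image is built; the following is a minimal sketch (the base image, package, and application file names are all assumptions):

```dockerfile
# Build with: docker build -t myapp .    Run with: docker run -d myapp
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y openjdk-7-jre-headless
COPY app.jar /opt/app/app.jar
CMD ["java", "-jar", "/opt/app/app.jar"]
```

Because each instruction produces a cached layer, rebuilding after an unchanged instruction is nearly instantaneous, as described above.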
Let's now look at how Ansible can be used with Docker to make this a powerful working combination. You can use Ansible with Docker to perform the following:
Installing Docker on hosts
Deploying new Docker images
Building or provisioning new Docker images
Deployment strategies with RPM
In most cases, we already have a certain version of the application that has been deployed and now, either to introduce a new feature or fix bugs, a new version has to be deployed. We'd like to discuss this scenario in greater detail.
At a high level, whenever we deploy an application, there are three kinds of changes to take care of:
Code changes
Config changes
Database changes
The first two types are ideally handled by the RPM, unless you have very specific values that need to be set up at runtime. Files with passwords can be checked in, but they should be encrypted with Ansible Vault or dropped in as templates at runtime, just as we did with database.yml.
With templates, if the configs are just name-value pairs that can be handled in a Jinja template, you should be good; however, if you have other lines in the configuration that do not change, then it's better that those configuration files are checked in and shipped as part of the RPM. Many teams we've worked with check in environment-specific folders that hold all the configuration parameters; while starting the application, they specify the environment in which the application should be started.
Another way is to deploy the RPM with default values for all configuration properties while writing a different folder with name-value pairs that override the parameters in the default configuration that is part of the RPM.
The database changes should be handled carefully. Ideally, you want them to be idempotent for a particular version of the RPM so that, even if someone tries to push database changes multiple times, the database is not really affected.
For example, in the preceding case, we run rake db:migrate, which is idempotent in nature; even if you run the same command from multiple machines, you shouldn't really face issues. The Rails framework achieves this by storing the database migration version in a separate table.
Having looked at the three types of changes, we can now examine how to perform rpm deployments for each release. When you're pushing new changes, the current version or service is already running. It's recommended that you take the server out of service before performing upgrades. For example, if it's part of a load balancer, make sure it's out of the load balancer and not serving any traffic before performing the upgrades. Primarily, there are the following two ways:
Deploying newer versions of rpm in the same directory
Deploying the rpm into a version-specific directory
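The second approach is commonly paired with a current symlink that is flipped once the new version is in place, so the switch (and any rollback) is a single atomic step. The following is a sketch; the paths and version number are assumptions:

```shell
#!/bin/bash
# Deploy an RPM's contents into a version-specific directory, then flip
# a "current" symlink so that switching (and rolling back) is atomic.
VERSION="1.7.1"
APP_ROOT="/tmp/demo_app"    # would be something like /opt/myapp in practice

mkdir -p "$APP_ROOT/releases/$VERSION"
# ... the rpm would be installed into this version-specific directory here ...

# -sfn replaces any existing symlink in one step
ln -sfn "$APP_ROOT/releases/$VERSION" "$APP_ROOT/current"
readlink "$APP_ROOT/current"
```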
Canary deployment
The name Canary is used with reference to the canaries that were often sent in to coal mines to alert miners about toxic gases reaching dangerous levels. Canary deployment is a technique that is used to reduce the risk of releasing a new version of software by first releasing it to a small subset of users (demographically, location-wise, and so on), gauging their reaction, and then slowly releasing it to the rest of the users.
Whenever possible, keep the first set of users as internal users, since it reduces the impact on the actual customers. This is especially useful when you introduce a new feature that you want to test with a small subset of users before releasing it to the rest of your customer base. If you're running, let's say, an application across multiple data centers and you're sure that certain users would only contact specific data centers when they access your site, you could definitely run a Canary deployment.
Deploying Ansible pull
The last topic we'd like to cover in this section is Ansible pull. If you have a large number of hosts on which you'd like to release software simultaneously, you will be limited by the number of parallel SSH connections that can be run. At scale, the pull model is preferred to the push model. Ansible supports what is called Ansible pull. Ansible pull works individually on each node. The prerequisite is that each node points to a repository from which it can invoke a special file called localhost.yml or <hostname>.yml. Typically, the ansible-pull option is run either as a cron job or is triggered remotely by some other means.
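For the cron case, a crontab entry on each node might look like the following sketch (the repository URL, the 30-minute interval, and the log path are assumptions):

```
# m  h  dom mon dow  command
*/30 *  *   *   *    ansible-pull -o -C master -U git://github.com/example/ansible-repo.git -i localhost >> /var/log/ansible-pull.log 2>&1
```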
We're going to use our tomcat example again, with the only difference being that the structure of the repository has been changed slightly. Let's look at the structure of the git repository that will work for Ansible pull as follows:
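The layout of this repository is along these lines:

```
.
├── localhost.yml
└── roles
    └── tomcat
        └── tasks
            └── main.yml
```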
As you can see, localhost.yml is present at the top level and the roles folder consists of the tomcat folder, under which is the tasks folder with the main.yml task file. Let's now run the playbook using ansible-pull as follows:
Let's look at the preceding run in detail as follows:
The ansible-pull command: We invoke ansible-pull with the following options:
-o: This option means that the Ansible run takes place only if the remote repository has changes.
-C master: This option indicates which branch to refer to in the git repository.
-U < >: This option indicates the repository that needs to be checked out.
-i localhost: This option indicates the inventory that needs to be considered. In this case, since we're only concerned about one tomcat group, we use -i localhost. However, when there are many more inventory groups, make sure you use an inventory file with the -i option.
The localhost | success JSON: This output shows whether the repository has changed and lists the latest commit.
The actual Ansible playbook run: The Ansible playbook run is just like before. At the end of the run, we will have the WAR file up and running.
Summary
In this article, we got an overview of Ansible: we introduced the tool, looked at various ways to install it, wrote our very first playbook, learned ways to use the Ansible command line, learned how to debug playbooks, and saw how to develop a playbook on our own.
We also looked into various aspects of Ansible, such as roles, callback plugins, modules and the bash module, provisioning a machine in the cloud, Docker provisioning, deployment strategies with RPM, Canary deployment, and deployment using Ansible pull.