Imagine you're sitting at home one day after a long day of work. Suddenly, you get a phone call that a new security vulnerability was found and all 300 of your servers will need to be patched. How would you handle it?
With Puppet, finding which of your servers were vulnerable would be an easier task than doing so by hand. Furthermore, with a little additional work, you could ensure that every one of your servers is running a newer, nonvulnerable version of the affected package.
In this chapter, we will touch on the following concepts:
What is Puppet?
Declarative versus imperative systems
The Puppet client-server model
Other components of the Puppet ecosystem used for security
Installing Puppet
How Puppet fits into a security role
Once this is complete, we will build the environment we'll use to run examples in this book and then run our first example.
Much of the information in this chapter is presented as a guide to what we will accomplish later on in this book.
The Puppet Labs website describes open source Puppet as follows:
Open source Puppet is a configuration management system that allows you to define the state of your IT infrastructure, then automatically enforces the correct state.
What does this mean, though?
Puppet is a configuration management tool. A configuration management tool is a tool that helps the user specify how to put a computer system in a desired state. Other popular configuration management tools are Chef and CFEngine. There are also a variety of other options that are gaining a user base, such as Bcfg2 and Salt.
Chef is another configuration management tool. It is similar to Puppet, but uses a pure Ruby domain-specific language (DSL) rather than a dedicated one. We'll cover what a domain-specific language is shortly. This difference allows you to write the desired state of your systems in Ruby. Doing so allows you to use the features of the Ruby language, such as iteration, to solve some problems that can be more difficult to solve in the stricter domain-specific language of Puppet. However, it also requires you to be familiar with Ruby programming. More information on Chef can be found at http://www.getchef.com.
CFEngine is the oldest of the three main tools mentioned here. It has grown into a very mature platform as it has expanded. Puppet was created out of some frustrations with CFEngine. One example of this is that the CFEngine community was formerly quite closed, that is, they didn't accept user input on design decisions. Additionally, there was a focus in CFEngine on the methods used to configure systems. Puppet aimed to be a more open system that was community-focused. It also aimed to make the resource the primary actor, and relied on the engine to make necessary changes instead of relying on scripts in most cases.
Note
Many of these issues were addressed in CFEngine 3, and it retains a very large user base. More information on CFEngine can be found at http://www.cfengine.com.
Bcfg2 and Salt are both tools that are gaining a user base. Both written in Python, they provide another option for a user who may be more familiar with Python than other languages. Information on these tools, as well as a list of others that are available, can be found at https://en.wikipedia.org/wiki/Comparison_of_open-source_configuration_management_software.
Configuration management tools were brought about by a desire to make system administration work repeatable, as well as automate it.
In the early days of system administration, it was very common for an administrator to install the operating system needed as well as install any necessary software packages. When systems were simple and few in number, this was a low effort way of managing them.
As systems grew more complex and greater numbers of them were installed, this became much more difficult. Troubleshooting an application as it began to run on multiple systems also became difficult. The difference in software versions on installed nodes and other configuration differences created inconsistencies in the behavior of multiple systems that were running the same application. Installation manuals, run books, and other forms of documentation were often deployed to try to remedy this, but it was clear that we needed a better way.
As time moved on, a variety of methods were born to address this, but many of them were homegrown. They often used SSH to manage remote hosts. I also built several such systems at various places before coming across Puppet.
Puppet sought to ease the pain and shortcomings of the early days. It was a big change from anything that was present at the time. A large part of this was because of its declarative nature.
At the core of Puppet is software that allows you to specify the state of the system and let Puppet get the system there. It differs from many of the other products in the configuration management space due to its declarative nature.
In a declarative system, we model the desired state of the resources (things being managed).
Declarative systems have the following properties:
Desired state is expressed, not steps used to get there
Usually no flow control, such as loops; it may contain conditional statements
Actions are normally idempotent
Dependency is usually explicitly declared
Tip
The concept of actions being idempotent is a very important one in Puppet. It means that an action can be repeated without causing unnecessary side effects. For example, removing a user is idempotent, because removing a user that doesn't exist changes nothing. Running a script that increments to the next user ID and creates a user may not be idempotent, because each run can assign a different user ID.
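The contrast in this tip can be sketched in a few lines of Python. This is a toy model, not Puppet code: the dictionary standing in for the user database and both function names are assumptions made for illustration.

```python
# Toy model of a user database: user name -> numeric user ID.
def remove_user(users, name):
    """Idempotent: ensure `name` is absent; a no-op if it already is."""
    users.pop(name, None)
    return users

def create_with_next_uid(users, name):
    """Not idempotent: every call allocates a fresh ID for `name`."""
    users[name] = max(users.values(), default=999) + 1
    return users

users = {"alice": 1000, "bob": 1001}

# Removing bob twice leaves the same state as removing him once.
once = remove_user(dict(users), "bob")
twice = remove_user(remove_user(dict(users), "bob"), "bob")

# Creating carol twice gives her a different ID than creating her once.
first = create_with_next_uid(dict(users), "carol")
second = create_with_next_uid(dict(first), "carol")

print(once == twice)
print(first["carol"], second["carol"])
```

The removal describes an end state (the user is absent), so repeating it is harmless. The creation script describes a step (take the next ID), so repeating it changes the outcome.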
Imperative systems, on the other hand, use algorithms and steps to express their desired state. Most traditional programming languages, such as C and Java, are considered imperative. Imperative systems have the following properties:
They use algorithms to describe the steps to the solution
They use flow control to add conditionals and loops
Actions may not be idempotent
Dependency is normally executed by ordering
In Puppet, which is declarative, users describe how they want the system to look in the end, and leave the implementation details of how to get there up to the types and providers within Puppet. Puppet uses types, which represent resources, such as files or packages. Each type can optionally be implemented by one or more providers.
Types provide the core functionality available in Puppet. The type system is extensible, and additional types can be added using pure Ruby code. Later on in this chapter, we'll use the file and package types in our example.
Providers include the code for the type that actually does the low level implementation of a resource. Many types have several providers that implement their functionality in different ways. An example of this is the package type. It has providers for RPM, Yum, dpkg, Windows using MSI, and several others. While it is not a requirement that all types have multiple providers, it is not uncommon to see them, especially for resources that have different implementation details across operating systems.
This system of types and providers isolates the user from having to have specific knowledge of how a given task is done. This allows them to focus on how the system should be configured, and leave specific implementation details, such as how to put it in that state, to Puppet.
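As a rough illustration of this split, the following Python sketch models a package resource whose desired state is "installed" and two interchangeable providers. It is not how Puppet implements types and providers (those are written in Ruby); the function name, the provider table, and the command strings are assumptions for illustration, and nothing is executed.

```python
# Hypothetical providers: each knows one way to install a package.
# The command lists are illustrative and are never run here.
PROVIDERS = {
    "RedHat": lambda name: ["yum", "-y", "install", name],
    "Debian": lambda name: ["apt-get", "-y", "install", name],
}

def plan_package(name, installed, osfamily):
    """Return the command needed to reach 'installed', or None if already there."""
    if name in installed:
        return None  # desired state already holds, nothing to do
    return PROVIDERS[osfamily](name)

print(plan_package("ntp", installed=set(), osfamily="RedHat"))
print(plan_package("ntp", installed=set(), osfamily="Debian"))
print(plan_package("ntp", installed={"ntp"}, osfamily="RedHat"))
```

The caller states only what it wants (the package present). Which command achieves that is chosen per platform, and no command is planned at all when the system is already in the desired state.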
A few tools, such as Chef, actually use more of a hybrid approach. They can be used declaratively, but also allow the use of loops and other flow control structures that are imperative. Puppet is slowly starting to gain some support for this in its new future parser; however, these are experimental and advanced features at this point.
While the declarative approach may have a larger learning curve, especially around dependency management, many sysadmins find it a much better fit with their way of thinking once they learn how it works.
Puppet uses a client-server model in the most common configurations. In this mode, one or more systems, called Puppet Masters, contain files called manifests. Manifests are code written in the Puppet DSL. A DSL is a language designed to be used for a specific application. In this case, the language is used to describe the desired state of a system. This differs from more general purpose languages, such as C and Ruby, in that it contains specialized constructs for the problem being solved. In this case, the resources in the language are specific to the configuration management domain.
Manifests contain the classes and resources which Puppet uses to describe the state of the system. They also contain declarations of the dependencies between these resources.
Classes are often bundled up into modules which package up classes into reusable chunks that can be managed separately. As your system becomes more complicated, using modules helps you manage each subsystem independently of the others.
The client systems contain the Puppet agent, which is the component that communicates with the master. At specified run intervals (30 minutes by default), the agent will run and the following actions will take place:
Custom plugins, such as facts, types, and providers, are sent to the client, if configured.
The client collects facts and sends them to the master.
The master compiles a catalog and sends it to the client.
The client processes the catalog sent by the master.
The client sends the reporting data to the master, if configured.
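The run sequence above can be sketched as a toy simulation in Python. All names and data here are made up for illustration, the plugin sync step is omitted, and real agents and masters communicate over the network rather than through function calls.

```python
def collect_facts():
    # The client gathers facts about itself (values are made up).
    return {"hostname": "web01", "osfamily": "RedHat"}

def compile_catalog(facts):
    # The master turns manifests plus facts into a catalog for this node.
    motd = "Welcome to %s\n" % facts["hostname"]
    return [{"type": "file", "title": "/etc/motd", "content": motd}]

def apply_catalog(catalog, system):
    # The client changes only the resources that differ from the catalog
    # and returns the list of changes, which would feed the report.
    changed = []
    for res in catalog:
        if system.get(res["title"]) != res["content"]:
            system[res["title"]] = res["content"]
            changed.append(res["title"])
    return changed

system = {}  # stand-in for the client's filesystem
catalog = compile_catalog(collect_facts())
first_report = apply_catalog(catalog, system)
second_report = apply_catalog(catalog, system)
print(first_report, second_report)
```

The second run reports no changes because the system already matches the catalog, which is the behavior you would expect from repeated agent runs on an unchanged node.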
The catalog, sent to the client by the master, contains a compiled state of the system resources of the client. The client then applies this information using types and providers to bring the system into the desired state. The following illustration shows how data flows between the components:

It is also possible to run Puppet in a masterless mode. In this mode, the Puppet manifests and other needed components, such as custom facts, types, and providers, are distributed to each system using an out-of-band method, such as scp or rsync. Puppet is then applied on the local node using cron or some other tool.
Running Puppet this way has the advantage of not requiring the server setup with open ports that the master-based setup has. In some organizations, this makes it easier to get past information security teams. However, many of the reporting and other benefits we will explore in this book are less effective when run in this fashion. The book Puppet 3: Beginner's Guide, John Arundel, Packt Publishing, has a good amount of information about such a masterless setup.
Puppet has a number of other components that form part of the Puppet ecosystem, which are worth exploring due to their use as security tools. The specific components we are going to explore here include PuppetDB and Hiera.
PuppetDB is an application used to store information on the Puppet infrastructure. Released in 2012, PuppetDB solved performance issues present in the older storeconfigs method that stored information about Puppet runs.
PuppetDB allows you to store facts, catalogs, reports, and resource information (via exported resources). Mining this data, using one of the reporting APIs, is an easy and powerful way to get a view of your infrastructure. More information on PuppetDB will be presented in Chapter 3, Puppet for Compliance, as well as Chapter 4, Security Reporting with Puppet.
Hiera was a new feature introduced in Puppet 3. It is a hierarchical data store, which helps to keep information about your environment. This allows you to separate data about the environment from code that acts on the environment. By doing so, you can apply separate security policies to the code that drives the environment and data about the systems.
Before Hiera, it was not uncommon to see large sections of Puppet code dedicated to maintaining site-specific or installation-specific information about the systems under management. This area was often difficult to maintain if the ability to override parameters using many different factors was needed.
By adding a hierarchy that can depend on any facts, it becomes much easier to store the data needed for the systems under management. A model of most specific to least specific can then be applied, which makes it much easier to override the default data at a site, environment, or system level.
For example, let's say you had a set of development environments where a certain group of development accounts needed to get created, and SSH access to those accounts was granted. However, these accounts and the access granted should only exist in the development machines, and not in production. Without Hiera, there would likely be site-specific information in the modules to manage the SSH configuration, and perhaps in the user creation module to manage the users. Using Hiera, we can add a fact for the type of system (production or development) and store which users get created there, or have access. This moves the list of users with access to the system out of the code itself, and into a data file.
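The most-specific-to-least-specific lookup described here can be modeled with a short Python sketch. This is not Hiera's API or its data file format; the level names, the system_type fact, and the user names are hypothetical.

```python
# Hypothetical data, keyed by hierarchy level.
DATA = {
    "node/dev01": {},
    "system_type/development": {"ssh_users": ["devuser1", "devuser2"]},
    "system_type/production": {},
    "common": {"ssh_users": []},
}

def lookup(key, facts):
    # Levels are ordered from most specific to least specific.
    hierarchy = [
        "node/%s" % facts["hostname"],
        "system_type/%s" % facts["system_type"],
        "common",
    ]
    for level in hierarchy:
        values = DATA.get(level, {})
        if key in values:
            return values[key]  # the first (most specific) match wins
    raise KeyError(key)

dev = lookup("ssh_users", {"hostname": "dev01", "system_type": "development"})
prod = lookup("ssh_users", {"hostname": "prod01", "system_type": "production"})
print(dev, prod)
```

The development node picks up the development accounts, while the production node falls through to the empty default. The list of users lives entirely in data, so the code that manages accounts never has to mention a specific site.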
As our examples get more complicated later in this book, we will explore using Hiera to store some system data.
Puppet can be installed in a variety of ways. Since this book is focused on the security-related aspects of Puppet and is not a beginner's guide, we will cover the most common way it is installed on our target system. There are many good reference books available for more in-depth information on installing Puppet, including Puppet 3: Beginner's Guide, John Arundel, Packt Publishing.
In our examples, we'll be using CentOS 6 as our operating system. If you are using a different operating system and following along on your own, please see the installation instructions for your operating system at http://www.puppetlabs.com, or follow along using Vagrant as outlined later.
Since we will be using Vagrant for our examples, the base box we are using already has the Puppet repository installed on it as well as the Puppet agent. We'll provide instructions for the installation of these elements for those who wish to use CentOS without using Vagrant.
The currently recommended way to install Puppet on CentOS machines is to use the Puppet Labs Yum repository. This repository, which can be found at https://yum.puppetlabs.com, contains all the Puppet Labs software as well as the dependencies required to install them, such as several Ruby gems not present in the main CentOS repository. On installation, Ruby and these dependencies will also be installed.
Adding this repository is relatively simple. Execute the following command as root (or using sudo, as shown here):
After running this command, you will see output similar to this:
Once this is complete, you're done! The Puppet Labs repository is added and we can use it to install the current version of any of the Puppet Labs products.
The next step is to install the Puppet Master. As mentioned earlier, this system acts as the controller that all of your client agents communicate with to receive catalog information. This package is normally installed on only a few systems that act as servers for configuration management information.
Installing the master with the repository is as easy as executing the following command:
This will instruct yum to install the Puppet server without confirmation. The output will be as follows:

On all the systems that we wish to manage by using Puppet, we'll need to install the Puppet agent. This agent is a piece of software that is responsible for communicating with the master and applying changes.
Installing the Puppet agent is very easy and similar to installing the master in the preceding section. You simply run the following:
After this is complete, you'll see that the Puppet agent is installed on the local machine and is ready to talk to the master.
Now that we have a perfectly working Puppet Master, we need to configure it. Installation of the packages will include a base level configuration. There are some changes we will want to make to the base Puppet configuration to enable some features that we'll use in the future. As we go through this book, we'll make changes to these files several times.
The main configuration files in use by Puppet are present in the /etc/puppet directory.
In this directory, there are a number of configuration files that control how Puppet behaves. Information on these files can be found at https://docs.puppetlabs.com/puppet/3.7/reference/config_about_settings.html. For now, we only need to concern ourselves with the Puppet configuration file.
Open the /etc/puppet/puppet.conf file with your favorite editor (make sure that you use sudo) and edit it to look similar to the following:
We've made a handful of changes to the file from the default version and will cover them here.
The first change is adding the report = true setting to the agent section. This will cause clients to send reports containing information about the Puppet run. We'll use these reports for later analysis in Chapter 4, Security Reporting with Puppet.
The second change is to add pluginsync = true to the agent section. While this has become the default in the more recent versions of Puppet, it does not hurt to add it in. This causes the clients to sync custom facts, providers, and other Puppet libraries from the master. We will see how this is used in later chapters.
The final change we have made is to add the master section and add reports = store. This causes the master to save reports to the local filesystem on the Puppet Master. We'll use this later to do analysis of our Puppet runs for security-related purposes.
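To see how the three settings just described sit in an INI-style file, the following Python sketch renders them with the standard configparser module. It is only an illustration under that assumption: it prints the sections and keys named in the text and leaves out everything else the real /etc/puppet/puppet.conf contains, such as its default settings.

```python
import configparser
import io

# Only the three settings discussed in the text; the rest of the real
# file is deliberately left out of this sketch.
conf = configparser.ConfigParser()
conf["agent"] = {"report": "true", "pluginsync": "true"}
conf["master"] = {"reports": "store"}

buf = io.StringIO()
conf.write(buf)
text = buf.getvalue()
print(text)
```

When editing the real file, add these keys to the existing agent section and create the master section alongside whatever the package installed, rather than replacing the file with this output.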
Both the Puppet Master and the agent are usually run as services. This allows the agent to check its run frequency and apply any changes. We've not explicitly started the services here, although we'll need to start the master in order to use it from our agent. To do this, we run the following command:
In order for the Puppet Master to start at boot, we'll also issue the following command to enable it to autostart:
It's pretty common to use Puppet to manage Puppet, and in a later chapter, we'll do this to show how we can use Puppet to secure the Puppet Master.
Note
It's worth noting that Puppet running with a default web server configuration will not scale beyond a few dozen hosts. Scaling Puppet is outside the scope of this book. More information on scaling Puppet can be found at http://docs.puppetlabs.com/guides/scaling.html.