Some tools can be enormously tedious to set up for reporting. They typically make you wade through many different configuration files and wrestle with obscure settings until you lose the will to live. Fortunately, Puppet is a sensible product when it comes to its initial configuration; out of the box, it takes very little tweaking to get it to report to the Puppet master. This is not to say that there aren't plenty of options to keep power users happy; it's just that you generally do not need to use them.
In this chapter, we're going to cover the following topics:
An introduction to how Puppet reporting works
A brief tour of the Puppet configuration file
Configuring a Puppet client
Configuring a Puppet master
Before we get into the nitty-gritty of configuring our Puppet installation, it's worth briefly going over the basics of how Puppet goes about its reporting. At its heart, a Puppet master is a web server and the reporting mechanism reflects this; a Puppet agent performs a simple HTTPS PUT operation to place the reporting information onto a Puppet master. When configured properly, the Puppet master will receive reports from Puppet agents, each and every time they perform a Puppet run, either in the noop or apply mode. Once the reports have been received, we can go ahead and do some fairly fantastic things with the data using a variety of methods to transform, transport, and integrate it with other systems.
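To make the transport concrete, here is a minimal Python sketch of the request an agent sends at the end of a run. It assumes the Puppet 3.x REST layout, in which a report is PUT to /<environment>/report/<certname> on the master's port (8140 by default). The helper name build_report_request and the example host names are hypothetical, and the request is only constructed, never sent.

```python
import urllib.request

def build_report_request(master, certname, report_body,
                         environment="production", port=8140):
    """Build, but do not send, an HTTPS PUT carrying a serialized report.

    Assumes the Puppet 3.x REST layout /<environment>/report/<certname>.
    A real agent also authenticates with its client certificate and sets a
    Content-Type matching its report serialization format; both are left
    out of this sketch.
    """
    url = f"https://{master}:{port}/{environment}/report/{certname}"
    return urllib.request.Request(url, data=report_body, method="PUT")

req = build_report_request("puppet.example.com", "web01.example.com", b"--- {}\n")
print(req.get_method(), req.full_url)
# PUT https://puppet.example.com:8140/production/report/web01.example.com
```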
The data that the Puppet agent reports back to the Puppet master is made up of two crucial elements: logs and metrics. The Puppet agent creates a full audit log of the events during each run, and when reporting is enabled, this log is forwarded to the Puppet master. This allows you to see whether there were any issues during the run and, if so, what they were. If things went smoothly, it simply lets you examine what operations the Puppet agent performed.
The metrics that the Puppet agent passes to the Puppet master are very granular. They offer a fantastic insight into where Puppet is spending its time, be it fetching, processing, or applying changes. This can be very important if you are managing a large infrastructure with Puppet. A run that takes four minutes to complete isn't too bad when you have only a handful of nodes, but it can be downright painful when you are dealing with hundreds of them. Metrics also allow you to start tracking the performance of your Puppet infrastructure over time. Puppet modules have a tendency to start out lean, but as they grow in complexity, they can become sluggish and bloated. Identifying speed issues early can help you refactor your modules into smaller, better-performing pieces of code before they start to impact the overall stability and speed of your Puppet infrastructure.
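As an illustration of how these timing metrics can be put to work, the following Python sketch ranks the phases of a single run by the time they consumed. It operates on a plain dictionary shaped like the time group of a report's metrics, with per-resource-type entries plus config_retrieval and total. That shape, the sample numbers, and the helper name slowest_phases are assumptions for illustration. A real report would first have to be loaded from its YAML file.

```python
def slowest_phases(time_metrics, top=3):
    """Return the `top` most expensive phases, ignoring the aggregate total."""
    phases = {name: secs for name, secs in time_metrics.items() if name != "total"}
    return sorted(phases.items(), key=lambda item: item[1], reverse=True)[:top]

# Hypothetical timings, in seconds, for a single agent run.
run_times = {
    "config_retrieval": 4.2,
    "file": 1.1,
    "package": 38.5,
    "service": 2.7,
    "exec": 0.4,
    "total": 46.9,
}

for phase, seconds in slowest_phases(run_times):
    print(f"{phase:>16}: {seconds:6.1f}s")
```

Tracking the output of something like this across runs is one way to notice a module that is gradually slowing everything down.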
The data derived from the logs and metrics builds up a complete picture of your hosts and is enormously useful when it comes to diagnosing issues. For example, without reporting, you may have a hard time working out why every single Puppet agent is suddenly throwing errors when applying the catalog. With reporting, it becomes a relatively easy matter to spot that someone has checked in a common module with a bug. Many sites use modules to manage DNS, NTP, and other common items, and a typo in one of these modules can very quickly cause every single host to report errors.

Without reporting, you can make shrewd guesses as to the fault, but to actually prove it, you're going to have to log on to multiple nodes to examine the logs. You will end up spending a fair chunk of time going from node to node, running the agent in noop mode and comparing logs manually to confirm that it is indeed a common fault. That assumes you notice the fault at all, of course. Without reporting in place, nodes can be in poor shape for a substantial time before you realize that something is amiss, or that Puppet has not been running on them at all.

Running Puppet on a host that has not been managed for some time may produce an uncomfortably long list of changes, any of which could introduce a breaking change. There are many reasons why a Puppet agent may have stopped running, and you can be in for a shock if it's been a month or two since Puppet last ran on a host. A lot can change in that time, and it's entirely possible that one of the many unapplied changes will create problems in a running service.
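Both diagnoses described above, a fault shared by many nodes and nodes that have quietly stopped reporting, are straightforward to automate once reports are collected in one place. The following Python sketch assumes each node's most recent report has already been reduced to a small summary record. The NodeSummary type, its fields, the sample fleet, and the two-hour staleness threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class NodeSummary:
    node: str
    last_run: datetime
    failed_resources: list = field(default_factory=list)

def common_failures(summaries, min_nodes=2):
    """Resources that failed on at least `min_nodes` different nodes."""
    counts = Counter()
    for s in summaries:
        counts.update(set(s.failed_resources))  # count each node once per resource
    return {res: n for res, n in counts.items() if n >= min_nodes}

def stale_nodes(summaries, now, max_age=timedelta(hours=2)):
    """Nodes whose most recent report is older than `max_age`."""
    return sorted(s.node for s in summaries if now - s.last_run > max_age)

now = datetime(2013, 7, 1, 12, 0)
fleet = [
    NodeSummary("web01", now - timedelta(minutes=20), ["File[/etc/ntp.conf]"]),
    NodeSummary("web02", now - timedelta(minutes=25), ["File[/etc/ntp.conf]"]),
    NodeSummary("db01", now - timedelta(days=45)),
]
print(common_failures(fleet))   # {'File[/etc/ntp.conf]': 2}
print(stale_nodes(fleet, now))  # ['db01']
```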
If the parser is the brains of Puppet, Facter is its eyes and ears. Before Puppet compiles a catalog for a node, it first consults Facter to figure out a few key things. First and foremost, it needs to know where it is and what it is. The Puppet agent can deduce these facts by consulting Facter about elements such as the node's hostname, the number of CPUs, the amount of RAM, and so on. Facter knows a surprising amount of information out of the box, and its knowledge increases with each release. Before Facter 1.7, the usual way to extend the facts you could gather was Ruby code, shipped as a Puppet plugin. With Facter 1.7, you can also teach Facter some new tricks with external facts. External facts allow you to add to Facter's already prodigious knowledge with anything from executable scripts to plain old YAML files. These additional data points can be used within Puppet reports in the same way as any default Facter item, and they can also add context around the existing data.
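As a sketch of how little an external fact needs, here is an executable external fact written in Python. It relies on the Facter 1.7 convention that an executable placed in the external facts directory prints one key=value pair per line on standard output. An example directory is /etc/facter/facts.d, though the path depends on your installation. The fact names datacenter and app_tier and their values are hypothetical. The parse function is only a simplified illustration of reading that format, not Facter's own parser.

```python
#!/usr/bin/env python3
"""Sketch of an executable external fact for Facter 1.7+.

Facter runs executables found in its external facts directory and reads
key=value pairs, one per line, from their standard output. The fact names
and values below are hypothetical examples.
"""

def gather_facts():
    # In a real script these might come from a CMDB, a file, or the environment.
    return {"datacenter": "london-1", "app_tier": "frontend"}

def render(facts):
    """Format facts as key=value lines, sorted by fact name."""
    return "".join(f"{key}={value}\n" for key, value in sorted(facts.items()))

def parse(output):
    """Simplified reading of key=value output, splitting on the first '='."""
    facts = {}
    for line in output.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            facts[key.strip()] = value.strip()
    return facts

if __name__ == "__main__":
    print(render(gather_facts()), end="")
```

Remember to make the file executable (chmod +x) so that Facter will run it. Once it is in place, the new facts can be used alongside the built-in ones.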
Now that we know the basics of how Puppet reporting works, it's time to go ahead and configure our Puppet master and agents to report. I'm going to make the assumption that you already have a working copy of either Puppet Open Source or Puppet Enterprise installed; if you haven't, there are some excellent guides available either online at http://Puppetlabs.com/learn or available for purchase elsewhere. If you're going to buy a book, I recommend Puppet 3 Beginner's Guide, John Arundel, Packt Publishing. It is an excellent and complete resource on how to install and use Puppet.
The example configurations I have used are from the latest version of Puppet Open Source (Version 3.2.2 and higher), packaged for Ubuntu. Your configuration may differ slightly if you're following this on another distribution, but it should broadly contain the same settings.
Let's take a look at the default configuration that ships with Puppet Open Source. By default, you can find the configuration file at /etc/puppet/puppet.conf. The file is as follows:
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter
templatedir=$confdir/templates
[master]
# These are needed when the puppetmaster is run by passenger
# and can safely be removed if webrick is used.
ssl_client_header = SSL_CLIENT_S_DN
ssl_client_verify_header = SSL_CLIENT_VERIFY
The first interesting thing to note about this configuration file is that it is used by the puppet agent, puppet master, and puppet apply commands. Many of its items are common to all of them, such as log directories and run directories, so there is no real need to keep a separate file for each role. Again, this is an example of the common-sense way in which Puppet's configuration has been designed.
The puppet.conf file is split up using the standard ini notation, with configuration blocks separating the roles from the common configuration. The most common blocks that you will encounter are [main], [agent], and [master], although sites that have implemented either Puppet faces or Puppet environments may have more. Generally speaking, these additional configuration blocks are not used to set up reporting, so we shall ignore them for the purposes of this book.
The [main] configuration block is used for any configuration that applies regardless of the mode Puppet is run in. As you can see from the preceding configuration file, this includes the locations of SSL certificates, logs, and other fundamental configuration items. These are generally things that you should keep the same on every host, regardless of whether it is a Puppet master or an agent. However, it's worth noting that you can override a setting by defining it again in a more specific block. Any setting in the [main] block can be overridden by the block for the mode that Puppet is running in, such as [agent] or [master]. It is this specificity, not the position of the block within the file, that decides which value wins.
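The lookup order described above can be modelled with Python's standard configparser module. This is a simplified sketch, not Puppet's own settings code. It only covers the file-level rule that the block for the current run mode wins over [main], and it ignores command-line overrides and built-in defaults. The helper name effective_setting is hypothetical, and the [master] override of logdir in the sample is made up for the example.

```python
import configparser

PUPPET_CONF = """
[main]
logdir = /var/log/puppet
vardir = /var/lib/puppet
factpath = $vardir/lib/facter

[master]
logdir = /srv/puppet/logs
"""

def effective_setting(conf_text, run_mode, name):
    """Return a setting as seen by `run_mode` ('agent' or 'master')."""
    # Puppet uses $var references, so disable configparser's own interpolation.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section in (run_mode, "main"):  # the more specific block wins
        if parser.has_option(section, name):
            return parser.get(section, name)
    return None  # not in the file; Puppet would fall back to its built-in default

print(effective_setting(PUPPET_CONF, "master", "logdir"))  # /srv/puppet/logs
print(effective_setting(PUPPET_CONF, "agent", "logdir"))   # /var/log/puppet
```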
The [master] block is used for all configuration items that are specific to the role of the Puppet master. As you can see in the default configuration file, this includes items for Phusion Passenger configurations. More importantly for us, this is also where you would set items such as the report processor and its options. For our initial setup, we're going to use the master configuration to set where our reports will be stored and ensure that we are using the store report processor.
The [agent] configuration block is used when you run Puppet as an agent. It is here that we set the fairly simple configuration required to make the Puppet agent send reports to the Puppet master. We won't be spending much time in this configuration block; the majority of the configuration and processing of Puppet reports takes place on the Puppet master rather than on the client side. There are some exceptions to this rule. For instance, you may have to amend a client-side setting to make the Puppet agent report to a different Puppet master. Similarly, if you are using the HTTP report processor, you may wish to set a different URL. So it's worth understanding the options that are available.
Why use a separate Puppet report server?
As with all good enterprise solutions, Puppet has been designed to allow certain roles to be decomposed into separate components to ease scaling. Reporting fits into this design. If you are using resource-intensive report processors, you may want to move the reporting function onto a separate server, leaving as many resources as possible for the Puppet master to deal with client requests.
You can find a complete list of all configuration options for Puppet at http://docs.puppetlabs.com/references/latest/configuration.html, including the options for directing reports to a separate Puppet master.
For the most part, the Puppet server is preconfigured for reporting and is simply waiting for clients to start sending information to it. By default, the Puppet master will use the store report processor, and this will simply store the data that is sent to the Puppet master in the YAML format on the filesystem.
YAML is a data serialization format designed to be both machine-readable and human-readable. It's widely used and seems to have found considerable favor among open source projects. YAML has a simple layout but can still hold complex configurations that are easily accessible with relatively simple code. A nice side effect of its popularity is that it has first-class support in many languages. For those languages without such support, there are many libraries that make it easy to work with.
It's worth taking some time to become familiar with YAML; you can find the YAML specifications at http://yaml.org, and Wikipedia has an excellent entry that can ease you into understanding how this simple yet exceedingly powerful format is used.
Although the store processor is simple, it gives us an excellent starting point to ensure that our Puppet master and agent are configured correctly. The YAML files it produces hold a complete record of each Puppet agent run as reported to the Puppet master. This includes which resources were applied, how long they took, what their previous values were, and much more. In later chapters, we will fully explore the wealth of data that both Puppet reports and Puppet metrics offer us.
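If you want to see what the store processor has produced, the following Python sketch finds the newest report for each node. It assumes the layout the store processor uses in Puppet 3: one subdirectory per node under reportdir, containing files named by timestamp (for example, 201307011230.yaml), so that sorting file names also sorts them chronologically. Check this against your own reportdir before relying on it. The helper name latest_reports is hypothetical.

```python
from pathlib import Path

def latest_reports(reportdir):
    """Map each node name to the path of its newest stored report.

    Assumes the layout <reportdir>/<node>/<YYYYMMDDHHMM>.yaml, where the
    lexical order of file names matches their chronological order.
    """
    latest = {}
    for node_dir in Path(reportdir).iterdir():
        if not node_dir.is_dir():
            continue
        reports = sorted(node_dir.glob("*.yaml"))
        if reports:  # skip node directories that hold no reports yet
            latest[node_dir.name] = reports[-1]
    return latest
```

For example, calling latest_reports("/var/lib/puppet/reports") on a default Puppet Open Source master would return a dictionary keyed by node name, which is a convenient starting point for opening the most recent YAML file for any host.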
We're going to spend some time looking at various settings, both in this chapter and in others. While you can look in the raw configuration files (and I highly encourage you to), you can also use the puppet master --configprint command to find out what value Puppet believes a particular setting has. This is extremely useful for finding out how a default setting is configured, as it may not even be present in the configuration file but will still be applied!
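If you want to use this check in scripts, a thin Python wrapper around the same command is enough. This is a sketch that assumes the puppet executable is on the PATH. The function name configprint is hypothetical, and the runner parameter exists only so that the function can be exercised without Puppet installed.

```python
import subprocess

def configprint(setting, mode="master", runner=subprocess.run):
    """Return the value Puppet will actually use for `setting` in `mode`.

    Runs `puppet <mode> --configprint <setting>` and returns its trimmed
    standard output; raises CalledProcessError if the command fails.
    """
    result = runner(
        ["puppet", mode, "--configprint", setting],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

On a master with default settings, configprint("reportdir") should return the same value you would see by running puppet master --configprint reportdir by hand.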
Out of the box, the only real Puppet master setting that may require some care and attention is reportdir. This defines where the Puppet agent reports are stored, and it is important that it points to a directory with plenty of space. I've routinely seen installations of Puppet where the disk is consumed within a matter of days because reportdir points at a relatively diminutive partition. By default, reportdir is set to /var/lib/puppet/reports, so at the very least, make sure that your /var partition is fairly roomy. If your Puppet agents run every thirty minutes and you have a healthy number of hosts, whichever partition holds this directory is going to fill up very quickly. Bear in mind that there is no built-in rotation or compression of these report files, so you may want to add it using your tool of choice. Alternatively, there is a Puppet module to manage logrotate on the Puppet Forge at https://forge.puppetlabs.com/rodjek/logrotate.
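To get a feel for how quickly reports accumulate, and to fill the rotation gap with something simple, here is a Python sketch with two hypothetical helpers. The first does the back-of-the-envelope arithmetic. For example, 500 nodes running every 30 minutes produce 500 × 48 = 24,000 reports a day, so at an assumed 100 KB per report that is roughly 2.3 GB a day. The second helper deletes stored reports older than a given age. It assumes the store processor's one-directory-per-node layout and is a stand-in for, not a replacement for, a proper logrotate or cron setup. Report sizes vary widely between sites, so measure your own before sizing a partition.

```python
import time
from pathlib import Path

def daily_report_volume_mb(nodes, run_interval_minutes, report_size_kb):
    """Rough daily growth of reportdir in megabytes (1 MB = 1024 KB)."""
    runs_per_day = (24 * 60) // run_interval_minutes
    return nodes * runs_per_day * report_size_kb / 1024

def prune_reports(reportdir, max_age_days, now=None):
    """Delete stored *.yaml reports older than `max_age_days`; return the count."""
    cutoff = (now if now is not None else time.time()) - max_age_days * 86400
    removed = 0
    for report in Path(reportdir).glob("*/*.yaml"):  # <reportdir>/<node>/<file>.yaml
        if report.stat().st_mtime < cutoff:
            report.unlink()
            removed += 1
    return removed

# Hypothetical fleet: 500 nodes, 30-minute runs, about 100 KB per report.
print(f"{daily_report_volume_mb(500, 30, 100):.0f} MB/day")  # 2344 MB/day
```

Run the pruning helper as a user that can write to reportdir, and pick a retention period that matches how far back you actually look.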
If you do relocate the reports directory, ensure that the permissions are set correctly so that the user who runs the Puppet master process can both read from and write to it. If the permissions aren't set correctly, you can get some very weird and wonderful error messages on both the Puppet master and the agent.
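A quick sanity check after relocating the directory can save you from those error messages. This Python sketch reports obvious problems with a candidate reportdir. It is deliberately simplified: it checks existence, ownership, and the owner's permission bits, and ignores group permissions and ACLs. Both the helper name reportdir_problems and the default user name puppet are assumptions to adjust for your site.

```python
import pwd
from pathlib import Path

def reportdir_problems(path, user="puppet"):
    """List reasons why `user` may be unable to write reports into `path`.

    Simplified: checks existence, ownership, and the owner's write/execute
    bits only; group permissions and ACLs are ignored.
    """
    p = Path(path)
    if not p.is_dir():
        return [f"{path} does not exist or is not a directory"]
    st = p.stat()
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid with no passwd entry
        owner = str(st.st_uid)
    problems = []
    if owner != user:
        problems.append(f"{path} is owned by {owner}, not {user}")
    if (st.st_mode & 0o300) != 0o300:
        problems.append(f"owner of {path} lacks write/execute permission")
    return problems

if __name__ == "__main__":
    for problem in reportdir_problems("/var/lib/puppet/reports"):
        print("WARNING:", problem)
```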
Now that we understand some of the basics of Puppet reporting, it's time to take a look at the configuration. Let's take another look at the basic configuration that comes out of the box. The configuration file is as follows:
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter
templatedir=$confdir/templates
[master]
# These are needed when the puppetmaster is run by passenger
# and can safely be removed if webrick is used.
ssl_client_header = SSL_CLIENT_S_DN
ssl_client_verify_header = SSL_CLIENT_VERIFY
At this point, no further changes are required on the Puppet master, and it will store client reports by default. However, as mentioned, it stores reports in the /var/lib/puppet/reports directory by default. This isn't ideal in some cases. Sometimes it's impossible to create a /var partition that is big enough (for instance, on hosts that use small primary storage such as SSD drives). Alternatively, you may wish to place your reports onto centralized storage such as an NFS share. This is very easy to change, so let's modify our default configuration to point to a new location, as shown in the following configuration:
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter
templatedir=$confdir/templates
[master]
reportdir = /mnt/puppetreports
# These are needed when the puppetmaster is run by passenger
# and can safely be removed if webrick is used.
ssl_client_header = SSL_CLIENT_S_DN
ssl_client_verify_header = SSL_CLIENT_VERIFY
Once you have created your new reports directory, make sure that you change its ownership to match your Puppet user (normally puppet:puppet on Unix and Linux systems) and restart the Puppet master. Go ahead and run the agent again, and you should see the report appear in your new reporting directory.
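The create-and-chown step can also be scripted. This Python sketch is a minimal illustration under stated assumptions. The helper name prepare_reportdir is hypothetical, the puppet:puppet owner follows the usual default mentioned above, and the 0750 mode is just one reasonable choice. Changing ownership to another user normally requires root privileges, and you still need to restart the Puppet master afterwards.

```python
import shutil
from pathlib import Path

def prepare_reportdir(path, user="puppet", group="puppet", mode=0o750):
    """Create `path` if needed and hand it over to the Puppet master's user.

    `user` and `group` may be names or numeric IDs. Giving the directory to
    another user normally requires root privileges.
    """
    p = Path(path)
    p.mkdir(parents=True, exist_ok=True)
    p.chmod(mode)  # explicit mode, unaffected by the process umask
    shutil.chown(p, user=user, group=group)
    return p
```

Run as root, prepare_reportdir("/mnt/puppetreports") would prepare the directory used in the configuration shown earlier.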
If you're using Puppet Enterprise, none of this applies; the installer has taken care of it for you. If you take a look at the configuration directory (normally /etc/puppetlabs/puppet), you can see that the puppet.conf file already contains the relevant reporting settings. Puppet Enterprise is configured out of the box to use the HTTP and PuppetDB report processors. This is a far more scalable approach than the standard reportdir directory and store method, and it is a good example of Puppet Enterprise being designed with scale in mind. This doesn't mean that you can't do the same in the open source version, though; in the following chapters, we will go through setting up Puppet Open Source to use these report processors and more.
Much like the Puppet master, the Puppet agent is configured with sensible default settings out of the box. In fact, in most cases, you will not need to make any changes. The only exception, generally, is if you are using a separate reporting server; in this case, you will need to specify the host that you have assigned this role to.
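The relevant agent settings are report_server and, if your report server listens on a different port, report_port. In Puppet 3, these default to the values of server and masterport, which in turn default to puppet and 8140. The following Python sketch resolves that fallback chain from the text of a puppet.conf file to show where an agent would send its reports. It is a simplification with a hypothetical helper name, report_target. It reads only the [agent] and [main] blocks and ignores command-line options and variable interpolation.

```python
import configparser

# Built-in fallbacks for the settings involved (simplified).
DEFAULTS = {"server": "puppet", "masterport": "8140"}

def report_target(conf_text):
    """Return the (host, port) pair an agent would send its reports to.

    Follows report_server -> server -> "puppet" and
    report_port -> masterport -> 8140, reading [agent] before [main].
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    def lookup(name):
        for section in ("agent", "main"):
            if parser.has_option(section, name):
                return parser.get(section, name)
        return None

    host = lookup("report_server") or lookup("server") or DEFAULTS["server"]
    port = lookup("report_port") or lookup("masterport") or DEFAULTS["masterport"]
    return host, int(port)

print(report_target("[agent]\nreport_server = reports.example.com\n"))
# ('reports.example.com', 8140)
```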
You can adjust the Puppet agent's reporting behavior using the report setting within the [agent] configuration block of the Puppet configuration file. This is a simple Boolean switch that controls whether the Puppet agent sends a report after each run, and by default, it is set to true. Sometimes, you may wish to set it to true explicitly anyway, to help anyone who is less familiar with Puppet. You can safely do so by making the following amendment to the puppet.conf file. First, here is the default configuration:
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter
templatedir=$confdir/templates
[master]
# These are needed when the puppetmaster is run by passenger
# and can safely be removed if webrick is used.
ssl_client_header = SSL_CLIENT_S_DN
ssl_client_verify_header = SSL_CLIENT_VERIFY
And now let's insert the option for the client to report:
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter
templatedir=$confdir/templates
[agent]
report = true
[master]
# These are needed when the puppetmaster is run by passenger
# and can safely be removed if webrick is used.
ssl_client_header = SSL_CLIENT_S_DN
ssl_client_verify_header = SSL_CLIENT_VERIFY
These are the essentials to configure Puppet in order to report. There are other options available in both the Puppet agent and the Puppet master configuration that are related to reporting, but these are strictly optional; the default settings are generally okay. If you're curious, you can find a complete list of the available options on the Puppet Labs website at http://docs.puppetlabs.com/references/latest/configuration.html. Be cautious, though; some of these settings can do some very weird things to your setup and should only be used if you really need them.
Well done; you are now up and running with Puppet reporting, albeit in a very basic form. We could end the book here, but the fun is only just starting. Now that we understand how the Puppet agent interacts with the Puppet master to create reports, we can start to examine some of the other powerful features that Puppet reporting offers us.
After reading this chapter, you should now appreciate how Puppet goes about its reporting. We explored the Puppet configuration file and observed how both Puppet Enterprise and Puppet Open Source are configured for simple reporting by default. We explored the interaction between the Puppet master and the Puppet agent and looked at how Puppet and Facter work together to create detailed reports of both the activity and state. We also observed that custom facts can be added to any report. We briefly covered scalability by noting that you can use a separate Puppet master to act as a dedicated report server, and we looked at some of the reasons as to why you might want to do this.
In the next chapter, we're going to take a look at some of the dashboards that can be used with Puppet and take a whistle-stop tour of their major features. You'll see how these dashboards can offer some quick and easy reporting options, but you'll also see some of the limitations of using them.