Mastering Puppet

By Thomas Uphill

About this book

Puppet is a configuration management system written for system administrators to manage a large number of systems efficiently and help maintain order.

Mastering Puppet deals with the issues faced in larger deployments such as scaling and duplicate resource definitions. It will show you how to fit Puppet into your organization and keep everyone working. The concepts presented can be adapted to suit any size organization. This book starts with setting up and installing Puppet in your organization and then moves on to implementing version control in Puppet, creating custom modules, and extending your Puppet infrastructure. Finally, you will learn tips and tricks that are useful when troubleshooting Puppet and the best practices to make you a pro.

Publication date:
July 2014
Publisher
Packt
Pages
280
ISBN
9781783982189

 

Chapter 1. Dealing with Load/Scale

A large deployment will have a large number of nodes. If you are growing your installation from scratch, you may have started with a single Puppet master running the built-in WEBrick server and moved up to a passenger installation. At a certain point in your deployment, a single Puppet master just won't cut it—the load will become too great. In my experience, this limit was around 600 nodes. Puppet agent runs begin to fail on the nodes, and catalogs fail to compile. There are two ways to deal with this problem: divide and conquer or conquer by dividing.

That is, we can either split up our Puppet master and divide the workload among several machines, or we can make each of our nodes apply our code directly using puppet apply (this is known as a masterless configuration). We'll examine each of these solutions separately.

 

Divide and conquer


When you start to think about dividing up your Puppet server, the main thing to realize is that many parts of Puppet are simply HTTP SSL transactions. If you treat those things as you would a web service, you can scale out to any size required using HTTP load balancing techniques.

The first step in splitting up the Puppet master is to configure the Puppet master to run under passenger. To ensure we all have the same infrastructure, we'll install a stock passenger configuration together and then start tweaking the configuration. We'll begin building on an x86_64 Enterprise 6 rpm-based Linux; the examples in this book were built using CentOS 6.5 and Springdale Linux 6.5 distributions. Once we have passenger running, we'll look at splitting up the workload.

Puppet with passenger

In our example installation, we will be using the name puppet.example.com for our Puppet server. Starting with a server installation of Enterprise Linux version 6, we install httpd and mod_ssl using the following code:

# yum install httpd mod_ssl
Installed:
  httpd-2.2.15-29.el6_4.x86_64
  mod_ssl-2.2.15-29.el6_4.x86_64

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Note

In each example, I will install the latest available version for Enterprise Linux 6.5 and display the version for the package requested (some packages may pull in dependencies—those versions are not shown).

To install mod_passenger, we pull in the Extra Packages for Enterprise Linux (EPEL) repository available at https://fedoraproject.org/wiki/EPEL. Install the EPEL repository by downloading the rpm file from http://download.fedoraproject.org/pub/epel/6/x86_64/repoview/epel-release.html or use the following code:

# yum install http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
Installed:
  epel-release-6-8.noarch

Once EPEL is installed, we install mod_passenger from that repository using the following code:

# yum install mod_passenger
Installed:
  mod_passenger-3.0.21-5.el6.x86_64

Next, we will pull in Puppet from the puppetlabs repository available at http://docs.puppetlabs.com/guides/puppetlabs_package_repositories.html#for-red-hat-enterprise-linux-and-derivatives using the following code:

# yum install http://yum.puppetlabs.com/el/6/products/x86_64/puppetlabs-release-6-7.noarch.rpm
Installed:
  puppetlabs-release-6-7.noarch

With the puppetlabs repository installed, we can then install Puppet using the following command:

# yum install puppet
Installed:
  puppet-3.3.2-1.el6.noarch

The Puppet rpm will create the /etc/puppet and /var/lib/puppet directories. In /etc/puppet, there will be a template puppet.conf; we begin by editing that file to set the name of our Puppet server (puppet.example.com) in the certname setting using the following code:

[main]
  logdir = /var/log/puppet
  rundir = /var/run/puppet
  vardir = /var/lib/puppet
  ssldir = $vardir/ssl
  certname = puppet.example.com
[agent]
  server = puppet.example.com
  classfile = $vardir/classes.txt
  localconfig = $vardir/localconfig

The other lines in this file are defaults. At this point, we would expect puppet.example.com to resolve correctly with a DNS query, but if you do not control DNS at your organization or cannot have this name resolved properly at this point, edit /etc/hosts and add an entry for puppet.example.com, as shown in the following line. In all the examples, substitute your own domain name for example.com.

127.0.0.1   localhost localhost.localdomain puppet puppet.example.com

We now need to create certificates for our master; to ensure the Certificate Authority (CA) certificates are created, run puppet cert list using the following command:

# puppet cert list
Notice: Signed certificate request for ca

In your enterprise, you may have to answer requests from multiple DNS names, for example, puppet.example.com, puppet, puppet.prod.example.com, and puppet.dev.example.com. To make sure our certificate is valid for all those DNS names, we will pass the dns-alt-names option to puppet certificate generate; we also need to specify that the certificates are to be signed by the local machine using the following command:

puppet# puppet certificate generate --ca-location local --dns-alt-names puppet,puppet.prod.example.com,puppet.dev.example.com puppet.example.com
Notice: puppet.example.com has a waiting certificate request
true

Now, to sign the certificate request, first verify the certificate list using the following commands:

puppet# puppet cert list
  "puppet.example.com" (SHA256) E5:F7:26:0A:6C:41:26:FA:80:02:E5:A6:A1:DB:F4:E0:9D:9C:5B:2D:A5:BF:EC:D1:FA:84:51:F4:8C:FD:9B:AF (alt names: "DNS:puppet", "DNS:puppet.dev.example.com", "DNS:puppet.example.com", "DNS:puppet.prod.example.com")
puppet# puppet cert sign puppet.example.com
Notice: Signed certificate request for puppet.example.com
Notice: Removing file Puppet::SSL::CertificateRequest puppet.example.com at '/var/lib/puppet/ssl/ca/requests/puppet.example.com.pem'

Tip

We specified the ssldir directive in our configuration. To determine interactively where the certificates will be stored, use the following command:

$ puppet config print ssldir

One last task is to copy the certificate that you just signed into the certs directory, /var/lib/puppet/ssl/certs. You can use puppet certificate find to do this using the following command:

# puppet certificate find puppet.example.com --ca-location local
-----BEGIN CERTIFICATE-----
MIIF1TCCA72gAwIBAgIBAjANBgkqhkiG9w0BAQsFADAoMSYwJAYDVQQDDB1QdXBw
...
-----END CERTIFICATE-----

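When you install Puppet from the puppetlabs repository, the rpm includes an example Apache configuration file for passenger. Depending on the package version, this file is named apache2.conf or example-passenger-vhost.conf; locate it and copy it into your Apache configuration directory using the following command, adjusting the source path to match your system: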

# cp /usr/share/puppet/ext/rack/example-passenger-vhost.conf /etc/httpd/conf.d/puppet.conf

We will now show the Apache config file and point out the important settings using the following configuration:

PassengerHighPerformance on
PassengerMaxPoolSize 12
PassengerPoolIdleTime 1500
# PassengerMaxRequests 1000
PassengerStatThrottleRate 120
RackAutoDetect Off
RailsAutoDetect Off

The preceding lines of code configure passenger for performance. PassengerHighPerformance turns off some compatibility that isn't required. The other options are tuning parameters. For more information on these settings, see http://www.modrails.com/documentation/Users%20guide%20Apache.html.

Next we will need to modify the file to ensure it points to the newly created certificates. We will need to edit the lines for SSLCertificateFile and SSLCertificateKeyFile. The other SSL file settings should point to the correct certificate, chain, and revocation list files as shown in the following code:

Listen 8140
<VirtualHost *:8140>
  ServerName puppet.example.com
  SSLEngine on
  SSLProtocol -ALL +SSLv3 +TLSv1
  SSLCipherSuite ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP

  SSLCertificateFile /var/lib/puppet/ssl/certs/puppet.example.com.pem
  SSLCertificateKeyFile /var/lib/puppet/ssl/private_keys/puppet.example.com.pem
  SSLCertificateChainFile /var/lib/puppet/ssl/ca/ca_crt.pem
  SSLCACertificateFile /var/lib/puppet/ssl/ca/ca_crt.pem
  # If Apache complains about invalid signatures on the CRL, you can try disabling
  # CRL checking by commenting the next line, but this is not recommended.
  SSLCARevocationFile /var/lib/puppet/ssl/ca/ca_crl.pem
  SSLVerifyClient optional
  SSLVerifyDepth 1
  # The `ExportCertData` option is needed for agent certificate expiration warnings
  SSLOptions +StdEnvVars +ExportCertData
  RequestHeader set X-SSL-Subject %{SSL_CLIENT_S_DN}e
  RequestHeader set X-Client-DN %{SSL_CLIENT_S_DN}e
  RequestHeader set X-Client-Verify %{SSL_CLIENT_VERIFY}e

  DocumentRoot /etc/puppet/rack/public/
  RackBaseURI /
<Directory /etc/puppet/rack/>
  Options None
  AllowOverride None
  Order allow,deny
  allow from all
</Directory>
</VirtualHost>

In this VirtualHost we listen on 8140 and configure the SSL certificates in the SSL lines. The RequestHeader lines are used to pass certificate information to the Puppet process spawned by passenger. The DocumentRoot and RackBaseURI settings are used to tell passenger where to find its configuration file, config.ru. We create /etc/puppet/rack and its subdirectories and then copy the example config.ru into that directory using the following commands:

# mkdir -p /etc/puppet/rack/{public,tmp}
# cp /usr/share/puppet/ext/rack/files/config.ru /etc/puppet/rack
# chown puppet:puppet /etc/puppet/rack/config.ru

We change the owner of config.ru to puppet:puppet as the passenger process will run as the owner of config.ru. Our config.ru will contain the following code:

$0 = "master"

# if you want debugging:
# ARGV << "--debug"

ARGV << "--rack"
ARGV << "--confdir" << "/etc/puppet"
ARGV << "--vardir"  << "/var/lib/puppet"

require 'puppet/util/command_line'
run Puppet::Util::CommandLine.new.execute

Tip

In this example, we have used the repository rpms supplied by Puppet and EPEL. In a production installation, you would use reposync to copy these repositories locally so that your Puppet machines do not need to access the Internet directly.

The config.ru file sets the command-line arguments for Puppet. The ARGV lines are used to pass additional parameters to the puppet process. As noted in the puppet master man page, any valid configuration parameter from puppet.conf can be specified as an argument here. Only the options that affect where Puppet will look for files should be specified here. Once Puppet knows where to find puppet.conf, adding further arguments here could be confusing.

With this configuration in place, we are ready to start Apache as our Puppet master. Simply start Apache with service httpd start.

Tip

SELinux

Security Enhanced Linux (SELinux) is a system for Linux that provides support for mandatory access controls (MAC). If your servers are running with SELinux enabled, great! You will need to make some policy changes to allow Puppet to work within passenger. The easiest way to build up your policy is to use audit2allow, which is provided in policycoreutils-python. Put SELinux into permissive mode, rotate the audit logs to get a clean log file, and then start a Puppet run. After the Puppet run, get audit2allow to build a policy module for you and insert it. Then return SELinux to enforcing mode. Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1051461 for more information.

# setenforce 0 
# service auditd rotate
# service httpd restart
(start a puppet run remotely)
# audit2allow -i /var/log/audit/audit.log -M puppet_passenger
# semodule -i puppet_passenger.pp
# setenforce 1

If necessary, repeat the process until everything runs cleanly. audit2allow will sometimes suggest enabling the allow_ypbind Boolean; this is a very bad idea. The allow_ypbind Boolean allows so many things that it is almost as bad as turning SELinux off.

Now that Puppet is running, you'll need to open the local firewall (iptables) on port 8140 to allow your nodes to connect. Then you'll need an example site.pp to get started. For testing we will create a basic site.pp that defines a default node with a single class attached to the default node as shown in the following code:

node default {
  include example
}

class example {
  notify {"This is an example": }
}

You can start a practice node or two and run their agents against the Puppet server, either by using --server puppet.example.com or by editing the agent's puppet.conf file to point at your server. By default, agents look for an unqualified host called puppet, which is then resolved using your DNS configuration (the search line in /etc/resolv.conf). If you do not control DNS, you may have to edit the local /etc/hosts file to specify the IP address of your Puppet master. A sample run, for a node called node1, should look something like the following output:

node1# puppet agent -t
Info: Creating a new SSL key for node1
Info: Caching certificate for ca
Info: Creating a new SSL certificate request for node1
Info: Certificate Request fingerprint (SHA256): C4:0D:7A:54:ED:C8:E8:CC:68:D0:A6:13:C4:91:28:3D:B1:66:71:48:57:85:D8:99:AF:D0:81:54:B9:64:AB:F2
Exiting; no certificate found and waitforcert is disabled

Sign the certificate on the Puppet master and run the agent again; the output should look like the following:

puppet# puppet cert sign node1
Notice: Signed certificate request for node1
Notice: Removing file Puppet::SSL::CertificateRequest node1 at '/var/lib/puppet/ssl/ca/requests/node1.pem'

node1# puppet agent -t
Info: Caching certificate for node1
Info: Caching certificate_revocation_list for ca
Info: Retrieving plugin
Info: Caching catalog for node1
Info: Applying configuration version '1386310193'
Notice: This is an example
Notice: /Stage[main]/Example/Notify[This is an example]/message: defined 'message' as 'This is an example'
Notice: Finished catalog run in 0.03 seconds

You now have a working passenger configuration. This configuration can handle a much larger load than the default WEBrick server provided with Puppet. Puppet Labs suggests the WEBrick server is appropriate for small installations; in my experience that number is much less than 100 nodes, maybe even less than 50. You can tune the passenger configuration to handle a large number of nodes, but to handle a very large installation (thousands of nodes), you'll need to start splitting up the workload.

Splitting up the workload

Puppet is a web service. But there are several different components supporting that web service, as shown in the following diagram:

Each of the components in your Puppet infrastructure (SSL CA, reporting, storeconfigs, and catalog compilation) can be split off onto its own server or servers.

Certificate signing

Unless you are having issues with certificate signing consuming too many resources, it's simpler to keep the signing machine a single instance, possibly with a hot spare. Having multiple certificate signing machines means that you have to keep certificate revocation lists synchronized.

Reporting

Reporting should be done on a single instance if possible. Reporting options will be shown in Chapter 7, Reporting and Orchestration.

Storeconfigs

Storeconfigs should be run on a single server. Storeconfigs allows for exported resources and is optional. The recommended configuration for storeconfigs is PuppetDB, which can handle several thousand nodes in a single installation.

Catalog compilation

Catalog compilation is the one task that can really bog down your Puppet installation. Splitting compilation among a pool of workers is the biggest win for scaling your deployment. The idea here is to have a primary point of contact for all your nodes—the Puppet master. Then, using proxying techniques, the master will direct requests to specific worker machines within your Puppet infrastructure. From the perspective of the nodes checking into the Puppet master, all the interaction appears to come from the main proxy machine.

To understand how we are going to achieve this load balancing, we first need to look at how the agents request data from our Puppet master. The request URL sent to our Puppet master has the format https://puppetserver:8140/environment/resource/key. The "environment" in the request URL is the Puppet environment in use by the node. It defaults to production but can be other values as we will see in later chapters. The resource being requested can be any of the accepted REST API calls, such as: catalog, certificate, resource, report, file_metadata, or file_content. A complete listing of the http_api is available at http://docs.puppetlabs.com/guides/rest_api.html.

Requests from nodes to the Puppet masters follow a pattern that we can use to configure our proxy machine. The pattern is as follows:

/environment/resource/key

For example, when node1.example.com requests its catalog in the production environment, it connects to the server and requests the following (using URL encoding):

https://puppet.example.com:8140/production/catalog/node1.example.com

Knowing that there is a pattern to the requests, we can configure Apache to redirect requests based on regular expression matches to different machines in our Puppet infrastructure.
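To make the /environment/resource/key pattern concrete, here is a minimal Python sketch that splits a request URL into its three parts. The names PuppetRequest and parse_request are hypothetical helpers written for this illustration; they are not part of Puppet.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlsplit


class PuppetRequest(NamedTuple):
    environment: str
    resource: str
    key: str


def parse_request(url: str) -> PuppetRequest:
    """Split an /environment/resource/key style URL into its three parts."""
    path = urlsplit(url).path
    # Drop the leading slash, then split into at most three parts so that
    # keys containing slashes (file paths, for example) stay in one piece.
    parts = path.lstrip("/").split("/", 2)
    if len(parts) != 3:
        raise ValueError(f"not an /environment/resource/key path: {path!r}")
    environment, resource, key = (unquote(part) for part in parts)
    return PuppetRequest(environment, resource, key)


request = parse_request(
    "https://puppet.example.com:8140/production/catalog/node1.example.com"
)
print(request.environment, request.resource, request.key)
```

Because the resource is always the second path component, a proxy only needs to inspect that component to decide where a request should go.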

Our first step in splitting up our load will be to clone our Puppet master server twice to create two new worker machines, which we will call worker1.example.com and worker2.example.com. In this example, we will use 192.168.100.101 for worker1 and 192.168.100.102 for worker2. Create a private network for all the Puppet communication on 192.168.100.0/24. Our Puppet master will use the address 192.168.100.100. It is important to create a private network for the worker machines as our proxy configuration removes the SSL encryption, which means that communication between the workers and the master proxy machine is unencrypted.

Our new Puppet infrastructure is shown in the following diagram:

On our Puppet server, we will change the Apache puppet.conf as follows. Instead of listening on 8140, we will listen on 18140, and importantly, only listen on our private network as this traffic will be unencrypted. Next, we will not enable SSL on 18140. And finally we will remove any header settings we were making in our original file as shown in the following configuration:

PassengerHighPerformance on
PassengerMaxPoolSize 12
PassengerPoolIdleTime 1500
# PassengerMaxRequests 1000
PassengerStatThrottleRate 120
RackAutoDetect Off
RailsAutoDetect Off

Listen 127.0.0.1:18140
Listen 192.168.100.100:18140

<VirtualHost *:18140>
  ServerName puppet.example.com
  DocumentRoot /etc/puppet/rack/public/
  RackBaseURI /
  <Directory /etc/puppet/rack/>
    Options None
    AllowOverride None
    Order allow,deny
    allow from all
  </Directory>
</VirtualHost>

The configuration for this VirtualHost is much simpler. Now, on the worker machines, create /etc/httpd/conf.d/puppet.conf files that are identical to the previous files but have different Listen directives shown as follows:

  • On worker1:

    Listen 192.168.100.101:18140
    
  • On worker2:

    Listen 192.168.100.102:18140
    

Remember to open port 18140 on the worker machines' firewalls (iptables) and start httpd.
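Before wiring up the proxy, it can help to confirm from the master that each worker is reachable on port 18140. The following is a small sketch using only the Python standard library; port_open is a hypothetical helper name, and the addresses in the usage note are the ones used in this chapter.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Running port_open("192.168.100.101", 18140) and port_open("192.168.100.102", 18140) on the master should return True once httpd is started and the firewall is open on both workers.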

Returning to the Puppet master machine, create a proxy.conf file in the Apache conf.d directory (/etc/httpd/conf.d) to point at the workers. We will create two proxy pools. The first is for certificate signing, called puppetca, as shown in the following configuration:

<Proxy balancer://puppetca>
  BalancerMember http://127.0.0.1:18140
</Proxy>

A second proxy pool is for catalog compilation, called puppetworker, as shown in the following configuration:

<Proxy balancer://puppetworker>
  BalancerMember http://192.168.100.102:18140
  BalancerMember http://192.168.100.101:18140
</Proxy>
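As a rough mental model of what a balancer pool does, the sketch below hands out the two worker addresses in turn. This is plain round robin chosen for illustration only; Apache's mod_proxy_balancer uses its own scheduling methods, so treat this as a simplification and not a description of Apache's algorithm.

```python
from itertools import cycle

# Balancer members as listed in the puppetworker pool.
PUPPETWORKER = [
    "http://192.168.100.102:18140",
    "http://192.168.100.101:18140",
]


def round_robin(members):
    """Return an endless iterator over the members, in order (illustration)."""
    return cycle(members)


pool = round_robin(PUPPETWORKER)
for _ in range(4):
    # Successive requests alternate between the two workers.
    print(next(pool))
```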

Next recreate the Puppet VirtualHost listener for 8140 with the SSL and certificate information used previously, as shown in the following configuration:

LoadModule ssl_module modules/mod_ssl.so

Listen 8140
<VirtualHost *:8140>
  ServerName puppet.example.com
  SSLEngine on
  SSLProtocol -ALL +SSLv3 +TLSv1
  SSLCipherSuite ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP
  SSLCertificateFile /var/lib/puppet/ssl/certs/puppet.example.com.pem
  SSLCertificateKeyFile /var/lib/puppet/ssl/private_keys/puppet.example.com.pem
  SSLCertificateChainFile /var/lib/puppet/ssl/ca/ca_crt.pem
  SSLCACertificateFile /var/lib/puppet/ssl/ca/ca_crt.pem
  # If Apache complains about invalid signatures on the CRL, you can try disabling
  # CRL checking by commenting the next line, but this is not recommended.
  SSLCARevocationFile /var/lib/puppet/ssl/ca/ca_crl.pem
  SSLVerifyClient optional
  SSLVerifyDepth 1
  # The `ExportCertData` option is needed for agent certificate expiration warnings
  SSLOptions +StdEnvVars +ExportCertData
  # This header needs to be set if using a loadbalancer or proxy
  RequestHeader unset X-Forwarded-For
  RequestHeader set X-SSL-Subject %{SSL_CLIENT_S_DN}e
  RequestHeader set X-Client-DN %{SSL_CLIENT_S_DN}e
  RequestHeader set X-Client-Verify %{SSL_CLIENT_VERIFY}e

Since we know that we want all certificate requests going to the puppetca balancer, we use ProxyPassMatch to match URLs that have certificate as the second path component, following the environment, as shown in the next configuration. Our regular expression matches a single path component (the environment) followed by /certificate.*, and any match is sent to our puppetca balancer.

ProxyPassMatch ^/([^/]+/certificate.*)$ balancer://puppetca/$1
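To see which requests this expression catches, the same pattern can be tried against a few sample paths in Python. This is only a sketch: puppetca_target is a hypothetical helper, and the sample paths assume the certificate-related endpoints follow the /environment/resource/key layout described earlier.

```python
import re

# Same pattern as the ProxyPassMatch line above.
CERTIFICATE = re.compile(r"^/([^/]+/certificate.*)$")


def puppetca_target(path: str):
    """Return the balancer URL for certificate requests, or None otherwise."""
    match = CERTIFICATE.match(path)
    if match is None:
        return None
    # $1 in the Apache directive corresponds to the first capture group.
    return "balancer://puppetca/" + match.group(1)


for path in (
    "/production/certificate/ca",
    "/production/certificate_request/node1",
    "/production/certificate_revocation_list/ca",
    "/production/catalog/node1",
):
    print(path, "->", puppetca_target(path))
```

Note that the trailing .* means resources whose names merely begin with certificate, such as certificate_request, are sent to the CA as well, which is the intent here.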

The only thing that remains is to send all noncertificate requests to our load balancing pair, worker1 and worker2, as shown in the following configuration:

ProxyPass / balancer://puppetworker/
ProxyPassReverse / balancer://puppetworker
</VirtualHost>

At this point, we can restart Apache on the Puppet master.

Tip

SELinux

You'll need to allow Puppet to bind to port 18140 at this point since the default puppet SELinux module allows for 8140 only. You will also need to allow Apache to connect to the worker instances; there is a Boolean for that, httpd_can_network_connect.

Now, when a node connects, if it requests a certificate, it will be redirected to the VirtualHost on port 18140 on the Puppet master. If the node requests a catalog, it will be redirected to one of the worker nodes. To convince yourself that this is the case, edit /etc/puppet/manifests/site.pp on your worker1 node and insert a notify resource as shown in the following configuration:

node default {
  include example
  notify {'Compiled on worker1': }
}

Do the same on worker2 with the message Compiled on worker2, run puppet agent again on your node, and see where the catalog is being compiled using the following commands:

node1# puppet agent -t
Info: Retrieving plugin
Info: Caching catalog for node1
Info: Applying configuration version '1386312527'
Notice: Compiled on worker1
Notice: /Stage[main]//Node[default]/Notify[Compiled on worker1]/message: defined 'message' as 'Compiled on worker1'
Notice: This is an example
Notice: /Stage[main]/Example/Notify[This is an example]/message: defined 'message' as 'This is an example'
Notice: Finished catalog run in 0.10 seconds

Tip

You may see "Compiled on worker2", which is expected.

To verify that certificates are being handled properly, clean the certificate for your example node, remove it from the node, and restart the agent.

  • On the master:

    master# puppet cert clean node1
    
  • On the node:

    node1# \rm -r /var/lib/puppet/ssl/*
    node1# puppet agent -t
    

Tip

As an alternative to this configuration, you could use the ca_server setting in puppet.conf on your nodes to get clients to use a specific machine for signing requests.

Since this is an enterprise installation, we should have a dashboard of some kind running to collect reports from workers.

Tip

If your reports setting on the master is either HTTP or puppetdb, then this section won't affect you.

We'll clone our worker again to make a new server called reports (192.168.100.103), which will collect our reports. We then have to add another line to our Apache proxy.conf configuration file to use the new server, and we need to place this line directly after the certificate proxy line. Since reports must all be sent to the same machine to be useful, we won't use a balancer line as before, and we will simply set the proxy to the address of the reports machine directly.

ProxyPassMatch ^/([^/]+/certificate.*)$ balancer://puppetca/$1
ProxyPassMatch ^/([^/]+/report/.*)$ http://192.168.100.103/$1
ProxyPass / balancer://puppetworker/

Keep the /etc/httpd/conf.d/proxy.conf balancer section updated to send reports to 192.168.100.103.
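The order of the proxy lines matters because Apache checks them in configuration order and uses the first one that matches. The following sketch mimics that first-match behaviour; the backend labels and the route function are hypothetical names used only for this illustration.

```python
import re

# Rules in the order they appear in the proxy configuration.
RULES = [
    (re.compile(r"^/[^/]+/certificate.*$"), "puppetca"),
    (re.compile(r"^/[^/]+/report/.*$"), "reports"),
    (re.compile(r"^/"), "puppetworker"),  # catch-all, must stay last
]


def route(path, rules=RULES):
    """Return the label of the first rule whose pattern matches path."""
    for pattern, backend in rules:
        if pattern.match(path):
            return backend
    return None


print(route("/production/report/node1"))   # reports
print(route("/production/catalog/node1"))  # puppetworker

# If the catch-all were listed first, reports would never reach their server.
catch_all_first = [RULES[2], RULES[0], RULES[1]]
print(route("/production/report/node1", rules=catch_all_first))  # puppetworker
```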

Again, restart Apache and make sure that report=true is set on the node in the [agent] section of puppet.conf. Run Puppet agent on the node, and verify that the report gets sent to 192.168.100.103 (look in /var/lib/puppet/reports/).

Tip

If you are still seeing problems with client catalog compilation timeouts after creating multiple catalog workers, it may be that your client is timing out the connection before the worker has a chance to compile the catalog. Try experimenting with the configtimeout parameter in the [agent] section of puppet.conf:

configtimeout=300

Setting this higher may resolve your issue. You will need to change the ProxyTimeout directive in the proxy.conf configuration for Apache as well. This will be revisited in Chapter 10, Troubleshooting.

Keeping the code consistent

At this point, we are able to scale out our catalog compilation to as many servers as we need, but we've neglected one important thing: we need to make sure that the Puppet code on all the workers remains in sync. There are a few ways we can do this, and when we cover integration with Git in Chapter 3, Git and Environments, we will see how to use Git to distribute the code.

Rsync

A simple way to distribute the code is with rsync. This isn't the best solution, since you will need to run rsync whenever you change the code, but it serves as an example. It also requires changing the puppet user's shell from /sbin/nologin to /bin/bash or /bin/rbash, which is a potential security risk.

Tip

If your puppet code is on a filesystem that supports ACLs, then creating an rsync user and giving that user rights to that filesystem is a better option. Using setfacl, it is possible to grant write access to the filesystem for a user other than Puppet.

First we create an SSH key for rsync to use between the master and the worker nodes. We then copy the public key into the authorized_keys file of the puppet user on each worker and give that user a login shell, as follows:

puppet# ssh-keygen -f puppet_rsync
(creates puppet_rsync.pub puppet_rsync)

worker1# mkdir /var/lib/puppet/.ssh
# cp puppet_rsync.pub /var/lib/puppet/.ssh/authorized_keys
# chown -R puppet:puppet /var/lib/puppet/.ssh
# chmod 700 /var/lib/puppet/.ssh
# chmod 600 /var/lib/puppet/.ssh/authorized_keys
# chsh -s /bin/bash puppet

puppet# rsync -e 'ssh -i puppet_rsync' -az /etc/puppet/ puppet@worker1:/etc/puppet

Tip

Creating SSH Keys and using rsync

The trailing slash on the first part, /etc/puppet/, and the absence of the slash on the second part, puppet@worker1:/etc/puppet, is by design. That way, we get the contents of /etc/puppet on the master placed into /etc/puppet on the worker.

Using rsync is not a good enterprise solution, but the concept of using SSH keys and transferring the files as the puppet user is the important part of this method.
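If you do use rsync, the same command has to be repeated for every worker. The sketch below only builds the argument list for each worker and prints it, so nothing is executed; rsync_command is a hypothetical helper, and it assumes the files are pushed as the puppet user with the puppet_rsync key created earlier.

```python
import shlex


def rsync_command(worker, key="puppet_rsync", source="/etc/puppet/"):
    """Build the rsync argument list for one worker (nothing is executed)."""
    # Keep the trailing slash on the source so the contents of the directory
    # are copied, and omit it on the destination, as the text describes.
    if not source.endswith("/"):
        source += "/"
    destination = f"puppet@{worker}:{source.rstrip('/')}"
    return ["rsync", "-e", f"ssh -i {key}", "-az", source, destination]


for worker in ("worker1", "worker2"):
    # shlex.join quotes the ssh option so the line can be pasted into a shell.
    print(shlex.join(rsync_command(worker)))
```

To actually run a command, the list could be passed to subprocess.run(..., check=True) on the master.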

NFS

A second option to keep the code consistent is to use NFS. If you already have a NAS appliance, then using the NAS to share out the Puppet code may be the simplest solution. If not, using the Puppet master as an NFS server is another option, but this does make your Puppet master a big, single point of failure. NFS is not the best solution to this sort of problem.

Clustered filesystem

Using a clustered filesystem such as gfs2 or glusterfs is a good way to maintain consistency between nodes. This also removes the problem of the single point of failure with NFS.

Git

A fourth option is to have your version control system keep the files in sync with a post-commit hook or with scripts that call Git directly, such as r10k or puppet-sync. We will cover how to configure Git to do some housekeeping for us in a later chapter. Using Git to distribute the code is a popular solution since it only updates the code when a commit is made, which fits the continuous delivery model. If your organization would rather push code at certain points, then using the scripts mentioned earlier on a routine basis is the solution I would suggest.

One more split

Now that we have our Puppet infrastructure running on two workers and the master, you might notice that the main Apache virtual host need not be on the same machine as the certificate-signing machine. At this point, there is no need to run passenger on that main gateway machine, and you are open to use whatever load balancing solution you see fit. In this example I will be using nginx as the main proxy point.

Tip

Using nginx is not required, but you may wish to use nginx as the proxy machine. This is because nginx has more configuration options for its proxy module, such as redirecting based on client IP address.

The important thing to remember here is that we are just providing a web service. We'll intercept the SSL part of the communication with nginx and then forward it onto our worker and CA machines as necessary. Our configuration will now look like the following diagram:

We will start with a blank machine this time; we do not need to install passenger or Puppet on the machine. To make use of the latest SSL-handling routines, we will download nginx from the nginx repository.

# yum install http://nginx.org/packages/rhel/6/noarch/RPMS/nginx-release-rhel-6-0.el6.ngx.noarch.rpm
Installed:
  nginx-release-rhel.noarch 0:6-0.el6.ngx
# yum install nginx
Installed:
  nginx-1.4.4-1.el6.ngx.x86_64

Now we need to copy the SSL CA files from the Puppet master to this gateway using the following commands:

puppet# scp /var/lib/puppet/ssl/ca/ca_crl.pem gateway:/etc/nginx
puppet# scp /var/lib/puppet/ssl/ca/ca_crt.pem gateway:/etc/nginx
puppet# scp /var/lib/puppet/ssl/certs/puppet.example.com.pem gateway:/etc/nginx
puppet# scp /var/lib/puppet/ssl/private_keys/puppet.example.com.pem gateway:/etc/nginx/puppet.example.com.key

Now we need to create a gateway configuration for nginx, which we will place in /etc/nginx/conf.d/puppet-proxy.conf.

We will define the two proxy pools as we did before, but using nginx syntax this time.

upstream puppetca {
  server 192.168.100.100:18140;
}

# The workers were configured earlier to listen on port 18140.
upstream puppetworkers {
  server 192.168.100.101:18140;
  server 192.168.100.102:18140;
}

Next, we create a server stanza, specifying that we handle the SSL connection, and we need to set some headers before passing on the communication to our proxied servers.

server {
  listen 8140 ssl;
  server_name puppet.example.com;

  default_type application/x-raw;

  ssl on;
  ssl_certificate    puppet.example.com.pem;
  ssl_certificate_key  puppet.example.com.key;
  ssl_trusted_certificate  ca_crt.pem;
  ssl_crl      ca_crl.pem;

  ssl_session_cache  shared:SSL:5m;
  ssl_session_timeout  5m;

  ssl_protocols    SSLv2 SSLv3 TLSv1;
  ssl_ciphers  ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
  ssl_prefer_server_ciphers on;
  ssl_verify_client optional_no_ca;

Setting ssl_verify_client to optional_no_ca is important, since on the first connection, the client will not have a signed certificate, so we need to accept all connections but mark a header with the verification status.

  proxy_set_header  Host      $host;
  proxy_set_header  X-Real-IP  $remote_addr;
  proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header  X-Client-Verify  $ssl_client_verify;
  proxy_set_header  X-Client-DN    $ssl_client_s_dn;
  proxy_set_header  X-SSL-Subject    $ssl_client_s_dn;
  proxy_set_header   X-SSL-Issuer    $ssl_client_i_dn;
  proxy_read_timeout  1000;

The X-Client-Verify header will hold the verification result at this point (SUCCESS, FAILED, or NONE when no certificate was presented), so our Puppet master will know if the certificate is valid. Now we need to look for certificate requests and hand those off to the puppetca pool:

location ~* ^/.*/certificate {
  proxy_pass http://puppetca;
  proxy_redirect off;
  proxy_read_timeout 1000;
}

Then we can send all other requests to our worker pool and close the server stanza:

location / {
  proxy_pass http://puppetworkers;
  proxy_redirect off;
  proxy_read_timeout 1000;
}
}
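To sanity-check which pool a given request would reach, the same regular expression can be tried against sample paths with grep. This is only a sketch: route_request is a hypothetical helper name, and the sample paths assume the Puppet 3 URL layout of /environment/indirection/key.

```shell
#!/bin/sh
# Sketch: predict which upstream pool nginx would pick for a request path.
# Mirrors "location ~* ^/.*/certificate" (case-insensitive) and the
# catch-all "location /". route_request is a hypothetical helper name.
route_request() {
  if printf '%s\n' "$1" | grep -Eiq '^/.*/certificate'; then
    echo puppetca
  else
    echo puppetworkers
  fi
}

# Sample paths are assumptions based on the /environment/indirection/key layout.
for path in /production/certificate/ca \
            /production/certificate_request/node1.example.com \
            /production/catalog/node1.example.com; do
  printf '%-55s -> %s\n' "$path" "$(route_request "$path")"
done
```

Note that the pattern also matches certificate_request and certificate_revocation_list paths, so those requests go to the puppetca pool as well.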

Now we need to start nginx on the gateway machine, open up port 8140 on the firewall, and open up 18140 on the Puppet master firewall (gateway will now need to communicate with that port).

Running puppet again on your node will now produce the same results as before, but you are now able to leverage the load balancing of nginx over that of Apache.

Tip

You will need to synchronize the SSL CA Certificate Revocation List (CRL) from the Puppet master to the gateway machine. Without synchronization, certificates that are revoked on the Puppet master will still be accepted by the gateway machine.
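One way to keep the CRL current is a small script on the gateway that installs a freshly fetched copy only when it differs from the one nginx is using. The following is a minimal sketch, not a tested production script: update_crl is a hypothetical helper, and the temporary path and the fetch and reload commands in the comments are assumptions.

```shell
#!/bin/sh
# Sketch: refresh the gateway's CRL only when it has changed.
# update_crl NEW CURRENT installs NEW over CURRENT if they differ and
# prints "updated" or "unchanged" so the caller can decide whether
# nginx needs to be reloaded.
update_crl() {
  new="$1"
  current="$2"
  if [ -f "$current" ] && cmp -s "$new" "$current"; then
    echo unchanged
  else
    cp "$new" "$current"
    echo updated
  fi
}

# Example wiring (commented out because it needs SSH access to the master;
# /tmp/ca_crl.pem.new is a hypothetical scratch location):
# scp puppet:/var/lib/puppet/ssl/ca/ca_crl.pem /tmp/ca_crl.pem.new
# [ "$(update_crl /tmp/ca_crl.pem.new /etc/nginx/ca_crl.pem)" = updated ] \
#   && service nginx reload
```

Run from cron on the gateway, a script along these lines reloads nginx only when a certificate has actually been revoked or the CRL has been regenerated.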

One last split or maybe a few more

We have already split our workload into a certificate-signing machine (the master or puppetca), a pool of catalog machines, and a report-gathering machine. What is interesting as an exercise at this point is that we can also serve files up using our gateway machine.

Based on what we know about the Puppet HTTP API, requests for file buckets and files have specific URIs that we can serve directly from nginx without using passenger, Apache, or even Puppet. To test the configuration, alter the definition of the example class to include a file as follows:

class example {
  notify { 'This is an example': }
  file {'/tmp/example':
    mode   => '0644',
    owner  => 100,
    group  => 100,
    source => 'puppet:///modules/example/example',
  }
}

Create the example file in /etc/puppet/modules/example/files/example on the workers with the following contents:

This file lives on the workers

On the gateway machine, rsync your Puppet module code from the workers into /var/lib/nginx/puppet. Now, to prove that the file is coming from the gateway, edit the gateway's copy of the example file after you run the rsync so that it reads as follows:

This file lives on the gateway

At this point, we can start serving up files from nginx by adding location clauses (inside the server stanza) to /etc/nginx/conf.d/gateway.conf. We will use two stanzas, one for module-provided files and the other for files outside modules.

location ~* ^/.*/file_content/modules {
  rewrite ^/([^/]+)/file_content/modules/([^/]+)/(.*) /$2/files/$3;
  break;
  root /var/lib/nginx/puppet/modules/;
}
location ~* ^/.*/file_content/ {
  rewrite ^/([^/]+)/file_content/([^/]+)/(.*) /$2/files/$3;
  break;
  root /var/lib/nginx/puppet/;
}
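To see which file on disk a given request ends up at, the two rewrite rules can be reproduced with sed. This is a rough sketch: map_file_content is a hypothetical helper, the sample request paths are assumptions, and the roots are taken from the location blocks above.

```shell
#!/bin/sh
# Sketch: show the file nginx would serve for a file_content request.
# The sed expressions copy the two rewrite rules; the directory prefixes
# are the root values from the matching location blocks.
map_file_content() {
  case "$1" in
    */file_content/modules/*)
      printf '%s\n' "$1" | sed -E \
        's#^/([^/]+)/file_content/modules/([^/]+)/(.*)#/var/lib/nginx/puppet/modules/\2/files/\3#'
      ;;
    */file_content/*)
      printf '%s\n' "$1" | sed -E \
        's#^/([^/]+)/file_content/([^/]+)/(.*)#/var/lib/nginx/puppet/\2/files/\3#'
      ;;
    *)
      echo "not a file_content request" >&2
      return 1
      ;;
  esac
}

# Request for puppet:///modules/example/example in the production environment
map_file_content /production/file_content/modules/example/example
```

For the example module, this prints /var/lib/nginx/puppet/modules/example/files/example, which is the rsynced copy on the gateway.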

Restart nginx on the gateway machine, and then run Puppet on the node using the following command:

# puppet agent -t

Notice: /Stage[main]/Example/File[/tmp/example]/ensure: defined content as '{md5}c83849f23a139c41edfbcd8473a81ac1'
Notice: Finished catalog run in 0.16 seconds
# cat /tmp/example
This file lives on the gateway

As we can see, although the file living on the workers has the contents "This file lives on the workers," our node is getting the file directly from nginx on the gateway.

Tip

Our node will keep replacing /tmp/example on every run, because the workers report the checksum of their copy of the file while nginx on the gateway serves different contents. In a production environment, all the files would need to be synchronized.
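A simple way to confirm that two copies of the module tree match is a recursive diff. The sketch below assumes a reference copy of the workers' modules is available in a local directory; trees_in_sync is a hypothetical helper and the paths in the usage comment are examples only.

```shell
#!/bin/sh
# Sketch: compare the module tree served by nginx with a reference copy.
# trees_in_sync DIR_A DIR_B prints "in sync" and returns 0 when the trees
# are identical; otherwise it prints "out of sync" and returns 1.
trees_in_sync() {
  if diff -rq "$1" "$2" >/dev/null 2>&1; then
    echo "in sync"
  else
    echo "out of sync"
    return 1
  fi
}

# Example on the gateway, assuming the workers' modules were copied to
# /tmp/worker-modules (hypothetical path):
# trees_in_sync /tmp/worker-modules /var/lib/nginx/puppet/modules
```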

One important thing to consider is security, as any configured client can retrieve files from our gateway machine. In production, you would want to add ACLs to the file location.

As we have seen, once the basic proxying is configured, further splitting up of the workload becomes a routine task. We can split the workload to scale to handle as many nodes as we require.

 

Conquer by dividing


Depending on the size of your deployment and the way you connect to all your nodes, a masterless solution may be a good fit. In a masterless configuration, you don't run the Puppet agent; rather, you push the Puppet code to a node, and then run Puppet apply. There are a few benefits to this method and a few drawbacks.

Benefits:

  • No single point of failure

  • Simpler configuration

  • Finer-grained control on where code is deployed

  • Multiple simultaneous runs do not affect each other (reduces contention)

  • Connection to Puppet master not required (offline possible)

  • No certificate management

Drawbacks:

  • Can't use built-in reporting tools such as dashboard

  • Exported resources require that nodes have write access to the database

  • Each node has access to all the code

  • More difficult to know when a node is failing to apply catalog correctly

  • No certificate management

The idea with a masterless configuration is that you distribute the Puppet code to each node individually and then kick off a puppet run to apply that code. One of the benefits of Puppet is that it keeps your system in a known good state, so when choosing masterless it is important to build your solution with this in mind. A cron job configured by your deployment mechanism that can apply Puppet to the node on a routine schedule will suffice.

The key parts of a masterless configuration are: distributing the code, pushing updates to the code, and ensuring the code is applied routinely to the nodes. Pushing a bunch of files to a machine is best done with some sort of package management.

Tip

Many masterless configurations use Git to have clients pull the files, which has the advantage that clients pull in changes themselves.

For Linux systems, the big players are rpm and dpkg, whereas for MacOS, Installer package files can be used. It is also possible to configure the nodes to download the code themselves from a web location. Some large installations use Git to update the code as well.

The solution I will outline is that of using an rpm deployed through yum to install and run Puppet on a node. Once deployed, we can have the nodes pull updated code from a central repository rather than rebuild the rpm for every change.

Creating an rpm

To start our rpm, we will make an rpm spec file. We can make this anywhere since we don't have a master in this example. Start by installing rpm-build, which will allow us to build the rpm.

# yum install rpm-build
Installing
  rpm-build-4.8.0-37.el6.x86_64

It will be important later to have a user to manage the repository, so create a user called builder at this point. We'll do this on the Puppet master machine we built earlier. Create an rpmbuild directory with the appropriate subdirectories, and then create our example code in this location. The spec file that follows applies manifests/site.pp, so the archive also needs a minimal site.pp; here it simply includes the example class.

# sudo -iu builder
$ mkdir -p rpmbuild/{SPECS,SOURCES}
$ cd rpmbuild/SOURCES
$ mkdir -p modules/example/manifests manifests
$ cat <<EOF>modules/example/manifests/init.pp
class example {
  notify {"This is an example.": }
  file {'/tmp/example':
    mode    => '0644',
    owner   => '0',
    group   => '0',
    content => 'This is also an example.'
  }
}
EOF
$ cat <<EOF>manifests/site.pp
include example
EOF
$ tar cjf example.com-puppet-1.0.tar.bz2 modules manifests

Next, create a spec file named example.com-puppet.spec for our rpm in rpmbuild/SPECS with the following contents:

Name:           example.com-puppet
Version: 1.0
Release: 1%{?dist}
Summary: Puppet Apply for example.com

Group: System/Utilities
License: GNU
Source0: example.com-puppet-%{version}.tar.bz2
BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX)

Requires: puppet
BuildArch:      noarch

%description
This package installs example.com's puppet configuration
and applies that configuration on the machine.


%prep
%setup -q -c

%install
mkdir -p $RPM_BUILD_ROOT/%{_localstatedir}/local/puppet
cp -a . $RPM_BUILD_ROOT/%{_localstatedir}/local/puppet

%clean
rm -rf %{buildroot}

%files
%defattr(-,root,root,-)
%{_localstatedir}/local/puppet

%post
# run puppet apply
/bin/env puppet apply --logdest syslog --modulepath=%{_localstatedir}/local/puppet/modules %{_localstatedir}/local/puppet/manifests/site.pp 

%changelog
* Fri Dec 6 2013 Thomas Uphill <[email protected]> - 1.0-1
- initial build

Then use rpmbuild to build the rpm based on this spec, as shown in the following command:

$ rpmbuild -ba ~/rpmbuild/SPECS/example.com-puppet.spec

Wrote: /home/builder/rpmbuild/SRPMS/example.com-puppet-1.0-1.el6.src.rpm
Wrote: /home/builder/rpmbuild/RPMS/noarch/example.com-puppet-1.0-1.el6.noarch.rpm

Now, deploy a node and copy the rpm onto that node. Verify that the node installs Puppet and then does a Puppet apply run.

# yum install example.com-puppet-1.0-1.el6.noarch.rpm 
Loaded plugins: downloadonly

Installed:
  example.com-puppet.noarch 0:1.0-1.el6 
Dependency Installed:
  augeas-libs.x86_64 0:1.0.0-5.el6
...
  puppet-3.3.2-1.el6.noarch

Complete!

Verify that the file we specified in our package has been created by using the following command:

# cat /tmp/example
This is also an example.

Now, if we are going to rely on this system of pushing Puppet to nodes, we have to make sure we can update the rpm on the clients and we have to ensure that the nodes still run Puppet regularly so as to avoid configuration drift (the whole point of Puppet). There are many ways to accomplish these two tasks. We can put the cron definition into the post section of our rpm:

%post
# install cron job
/bin/env puppet resource cron 'example.com-puppet' command='/bin/env puppet apply --logdest syslog --modulepath=%{_localstatedir}/local/puppet/modules %{_localstatedir}/local/puppet/manifests/site.pp' minute='*/30' ensure='present'

Alternatively, we could make the cron job part of our site.pp, as shown in the following code:

cron { 'example.com-puppet':
  ensure  => 'present',
  command => '/bin/env puppet apply --logdest syslog --modulepath=/var/local/puppet/modules /var/local/puppet/manifests/site.pp',
  minute  => ['*/30'],
  target  => 'root',
  user    => 'root',
}

To ensure the nodes have the latest version of the code, we can define our package in the site.pp.

package {'example.com-puppet':  ensure => 'latest' }

In order for that to work as expected, we need to have a yum repository for the package and have the nodes looking at that repository for packages.

Creating the YUM repository

Creating a YUM repository is a very straightforward task. Install the createrepo rpm and then run createrepo on each directory you wish to make into a repository.

# mkdir /var/www/html/puppet
# yum install createrepo

Installed:
 createrepo.noarch 0:0.9.9-18.el6   
# chown builder /var/www/html/puppet
# sudo -iu builder
$ mkdir /var/www/html/puppet/{noarch,SRPMS}
$ cp /home/builder/rpmbuild/RPMS/noarch/example.com-puppet-1.0-1.el6.noarch.rpm /var/www/html/puppet/noarch
$ cp rpmbuild/SRPMS/example.com-puppet-1.0-1.el6.src.rpm /var/www/html/puppet/SRPMS
$ cd /var/www/html/puppet
$ createrepo noarch
$ createrepo SRPMS

Our repository is ready, but we need to export it with the web server to make it available to our nodes. This rpm contains all our Puppet code, so we need to ensure that only the clients we choose can access the files. We'll create a simple listener on port 80 for our Puppet repository:

Listen 80
<VirtualHost *:80>
  DocumentRoot /var/www/html/puppet
</VirtualHost>

Now, the nodes need to have the repository defined on them so they can download the updates when they are made available via the repository. The idea here is that we push the rpm to the nodes and have them install the rpm. Once the rpm is installed, the yum repository pointing to updates is defined and the nodes continue updating themselves.

yumrepo { 'example.com-puppet':
  baseurl  => 'http://puppet.example.com/noarch',
  descr    => 'example.com Puppet Code Repository',
  enabled  => '1',
  gpgcheck => '0',
}

So to ensure that our nodes operate properly, we have to make sure of the following things:

  • Install code

  • Define repository

  • Define cron job to run Puppet apply routinely

  • Define package with latest tag to ensure it is updated
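The first three items can be spot-checked on a node with a few file tests. The following is only a sketch: check_masterless is a hypothetical helper, and the repository file name and root crontab location are assumptions about a typical EL6 node rather than something the configuration above guarantees.

```shell
#!/bin/sh
# Sketch: report which pieces of the masterless setup are present on a node.
# The optional argument is a filesystem prefix (default /) so the check can
# be tried against a scratch directory. File locations are assumptions.
check_masterless() {
  root="${1:-/}"
  rc=0
  # 1. Code installed by the rpm
  [ -f "$root/var/local/puppet/manifests/site.pp" ] ||
    { echo "missing: site.pp"; rc=1; }
  # 2. Repository definition (assumed to live in yum.repos.d)
  [ -f "$root/etc/yum.repos.d/example.com-puppet.repo" ] ||
    { echo "missing: yum repository"; rc=1; }
  # 3. Cron entry for root that runs puppet apply
  grep -q 'puppet apply' "$root/var/spool/cron/root" 2>/dev/null ||
    { echo "missing: cron job"; rc=1; }
  return $rc
}

# Usage on a node (as root): check_masterless
```

The function prints one line per missing piece and returns a non-zero status if anything is absent, so it can be dropped into a monitoring check.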

A default node in our masterless configuration requires that the cron task and the repository be defined. If you wish to segregate your nodes into different production zones (such as development, production, and sandbox), I would use a repository management system like Pulp. Pulp allows you to define repositories based on other repositories and keeps all your repositories consistent.

Tip

You should also set up a GPG key on the builder account that can sign the packages it creates. You would then distribute the GPG public key to all your nodes and enable gpgcheck on the repository definition.

 

Summary


Dealing with scale is a very important task in enterprise deployments. As your number of nodes increases beyond the proof-of-concept stage (> 50 nodes), the simple WEBrick server cannot be used. In the first section, we configured a Puppet master with passenger to handle a larger load. We then expanded that configuration with load balancing and proxying techniques realizing that Puppet is simply a web service. Understanding how nodes request files, catalogs, and certificates allows you to modify the configuration and bypass or alleviate bottlenecks.


In the last section, we explored masterless configuration, wherein instead of checking into Puppet to retrieve new code, the nodes check out the code first and then run against it on a schedule.

Now that we have dealt with the load issue, we need to turn our attention to managing the modules to be applied to nodes. We will cover organizing the nodes in the next chapter.

About the Author

  • Thomas Uphill

    Thomas Uphill has been working with Unix and Linux since the 90s. He primarily works on Linux and has an RHCA from Red Hat. He's written several books on Puppet and routinely presents on Linux and Puppet at conferences such as PuppetConf and LISA. He enjoys writing code in Ruby and Python. When not working, he blogs at ramblings.narrabilis.com and can be found as @uphillian on Twitter and IRC.
