Extending Puppet

By Alessandro Franceschi

About this book

Puppet has changed the way we manage our systems, but Puppet itself is changing and evolving, as are the ways in which we use it.

A clear, updated, practical, and focused view of the current state of the technology and the evolution of Puppet is what we need to tackle our IT infrastructure challenges and avoid common errors when designing our architectures.

This is a detailed, practical book that covers the different components of the Puppet ecosystem and explores how to use them to deploy and manage different kinds of IT infrastructures.

Updated with the most recent trends and best practices, this book gives you a clear view on how to "connect the dots" and expands your understanding to successfully use and extend Puppet.

Publication date:
June 2014
Publisher
Packt
Pages
328
ISBN
9781783981441

 

Chapter 1. Puppet Essentials

There are moments in our professional life when we encounter technologies that trigger an inner wow effect. We realize there's something special in them, and we start to wonder how they can be useful for our current needs and, eventually, wider projects.

Puppet, for me, has been one of these turning point technologies. I have reasons to think that we might share a similar feeling.

If you are new to Puppet, you are probably starting from the wrong place, since there are better fitting titles around to grasp its basic concepts.

This book won't indulge too much in the fundamentals, but don't despair as this chapter might help for a quick start.

It provides the basic Puppet background needed to understand the rest of the contents and may also offer valuable information to more experienced users.

In this chapter, we are going to review the following topics:

  • The Puppet ecosystem: The components, its history, and the basic concepts behind configuration management

  • How to install and configure Puppet: Commands and paths to understand where things are placed

  • The core components and concepts: Terms such as manifests, resources, nodes, and classes will become familiar

  • The main language elements: Variables, references, resource defaults, ordering, conditionals, comparison operators, and virtual and exported resources

  • How Puppet stores the changes it makes and how to revert them

The contents of this chapter are quite dense. Take your time to review and assimilate them; if they sound new or look too complex, it is because the path towards Puppet awareness is never too easy.

 

The Puppet ecosystem


Puppet is a configuration management and automation tool. We use it to install, configure, and manage the components of our servers.

Written in Ruby and released with an open source license (Apache 2), it can run on any Linux distribution, many other UNIX variants (Solaris, *BSD, AIX, and Mac OS X), and Windows.

Its development was started in 2005 by Luke Kanies as an alternative approach to the existing configuration management tools (most notably, CFEngine and BladeLogic).

The project has grown year after year. Kanies' own company, Reductive Labs, which was renamed in 2010 to Puppet Labs, has received a total funding of $45.5 million in various funding rounds (among the investors, there are names such as VMware, Google, and Cisco).

Now, it is one of the top 100 fastest growing companies in the US. It employs more than 250 people, and has a solid business based on open source software, consulting services, training, and certifications. It also offers Puppet Enterprise, the commercial version, which is based on the same open source Puppet code base but provides a web GUI that makes Puppet easier to use and administer.

The Puppet ecosystem features a vibrant, large, and active community that discusses it on the Puppet Users and Puppet Developers Google groups, on Freenode's crowded #puppet IRC channel, at the various Puppet Camps that are held multiple times a year all over the world, and at the annual PuppetConf, which is improving and getting bigger year after year.

Various software products are complementary to Puppet; some of them are developed by Puppet Labs, which are as follows:

  • Hiera is a key-value lookup tool that is the current choice of reference for storing data related to your Puppet infrastructure

  • MCollective is an orchestration framework that allows parallel execution of tasks on multiple servers. It is a separate project by Puppet Labs, which works well with Puppet

  • Facter is a required complementary tool as it is executed on each managed node and gathers local information in key/value pairs (facts) that are used by Puppet

  • Geppetto is an IDE that is based on Eclipse that allows easier and assisted development of Puppet code

  • Puppet Dashboard is an open source web console for Puppet

  • PuppetDB is a powerful backend that can store all the data gathered and generated by Puppet

  • Puppet Enterprise is the commercial solution to manage Puppet, MCollective, and PuppetDB via a web frontend

The community has produced other tools and resources; the most notable ones are the following:

  • The Foreman is a systems lifecycle management tool that integrates perfectly with Puppet

  • Puppetboard is a web frontend for PuppetDB

  • Kermit is a web frontend for Puppet and MCollective

  • A lot of community code is released as modules, which are reusable components that allow the management of any kind of application and software via Puppet

Why configuration management matters

IT operations have changed drastically in the last few years; virtualization, cloud, business needs, and emerging technologies have accelerated the pace of how systems are provisioned, configured, and managed.

The manual setup of a growing number of operating systems is no longer a sustainable option. At the same time, in-house custom solutions to automate the installation and the management of systems cannot scale in terms of required maintenance and development efforts.

For these reasons, configuration management tools such as Puppet, Chef, CFEngine, Rudder, Salt, and Ansible (to mention only the most known open source ones) are becoming increasingly popular in many infrastructures.

They allow a centralized and controlled approach to systems management, based on code and data structures, which can be managed via a Source Code Management (SCM) tool (Git is the choice of reference in the Puppet world).

Once we can express the status of our infrastructure with versioned code, we gain powerful benefits:

  • We can reproduce our setups in a consistent way; what is executed once can be executed any time; the procedure to configure a server from scratch can be repeated without the risk of missing parts.

  • The log of our code commits reflects the history of changes on our infrastructure: who did what, when, and, if commit comments are pertinent, why.

  • We can scale quickly; the configurations we did for a server can be applied to all the servers of the same kind.

  • We have aligned and coherent environments. Our development, test, QA, staging, and production servers can share the same setup procedures and configurations.

With these kinds of tools, we can have a system provisioned from zero to production in a few minutes, or we can quickly propagate a configuration change over our whole infrastructure automatically.

Their power is huge and has to be handled with care: just as we can automate massive and parallelized setups and configurations of systems, we might also automate distributed destruction. With great power comes great responsibility.

 

Puppet components


Before diving into the installation and configuration details, we need to clarify and explain some Puppet terminology to get the whole picture.

Puppet features a declarative Domain Specific Language (DSL), which expresses the desired state and properties of the managed resources.

Resources can be any component of a system, for example, packages to install, services to start, files to manage, users to create, and also custom and specific resources such as MySQL grants, Apache virtual hosts, and so on.

Puppet code is written in manifests, which are simple text files with a .pp extension. Resources can be grouped in classes (do not consider them as classes as in OOP; they aren't). Classes and all the files needed to define the required configurations are generally placed in modules, which are directories structured in a standard way that are supposed to manage specific applications or a system's features (there are modules to manage Apache, MySQL, sudo, sysctl, networking, and so on).

When Puppet is executed, it first runs facter, a companion application, which gathers a series of variables about the system (the IP address, the hostname, the operating system, the MAC address, and so on), which are called facts, and are sent to the Master.

Facts and user-defined variables can be used in manifests to manage how and what resources to provide to the clients.

When the Master receives a connection, it looks in its manifests (starting from /etc/puppet/manifests/site.pp) for the resources that have to be applied to that client host, also called a node.

The Master parses all the DSL code and produces a catalog that is sent back to the client (in the PSON format, which is a JSON variant used in Puppet). The production of the catalog is often referred to as catalog compilation, even if the term is not perfectly appropriate (no program is actually compiled from source code to binary) and is going to be discontinued. In this book, we will still use it as it is quite common and widely used.

Once the client receives the catalog, it starts to apply all the resources declared there: packages are installed (or removed), services are started, configuration files are created or changed, and so on. The same catalog can be applied multiple times; if there are changes on a managed resource (for example, a manual modification of a configuration file), they are reverted to the state defined by Puppet; if the system's resources are already at the desired state, nothing happens.

This property is called idempotence and is at the root of the Puppet declarative model. Since Puppet defines the desired state of a system, it must operate in a way that ensures that this state is obtained whatever the starting conditions and however many times Puppet is applied.
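As a minimal sketch of idempotence, consider a single file resource. The path and content below are illustrative assumptions, not taken from a real setup; the resource syntax itself is covered later in this chapter.

```puppet
# Hypothetical example: path and content are chosen only for illustration.
file { '/etc/motd':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  content => "Managed by Puppet\n",
}
```

On the first run, Puppet creates or corrects the file. On later runs, if nothing has drifted, the same resource produces no change. If someone edits the file by hand, the next run restores the declared content.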

Puppet can report the changes it makes on the system and audit the drift between the system's state and the desired state as defined in its catalog.

 

Installation and configuration


Puppet uses a client-server paradigm. Clients (also called agents) are installed on all the systems to be managed; the server(s) (also called the Master) is installed on a central machine(s) from where we control the whole infrastructure.

We can find Puppet's packages for most recent OSes, either in the default repositories or in additional ones (for example, EPEL for Red Hat derivatives).

The client package is generally called puppet, so the installation is a matter of typing something like the following:

apt-get install puppet # On Debian derivatives
yum install puppet # On Red Hat derivatives

To install the server components, we can run the following command:

apt-get install puppetmaster # On Debian derivatives
yum install puppet-server # On Red Hat derivatives

Note

To have updated packages for the latest versions, we should use Puppet Labs' repositories: http://docs.puppetlabs.com/guides/puppetlabs_package_repositories.html.

To install Puppet on other operating systems, check http://docs.puppetlabs.com/guides/installation.html.

Both agents (clients) and the Master (server) use the configuration file /etc/puppet/puppet.conf, which is divided in [sections] and has an INI-like format. All the parameters present in the configuration file may be overridden while invoking puppet from the command line. All of them have default values; here is a sample with some of the most important ones:

[main]
    logdir = /var/log/puppet
    vardir = /var/lib/puppet
    rundir = /var/run/puppet
    ssldir = $vardir/ssl
[agent]
    server = puppet
    certname = $fqdn # Here, by default, is the node's fqdn
    runinterval = 30m
[master]
    autosign = false
    manifest = /etc/puppet/manifests/site.pp
    modulepath = /etc/puppet/modules:/usr/share/puppet/modules

A very useful command to see all the current configuration settings is as follows:

puppet config print all

With the previous information, we have all that we need to understand the main files and directories that we deal with when we work with Puppet:

  1. Logs are in /var/log/puppet (but also on normal syslog files, with the facility daemon), both for agents and Master.

  2. The Puppet operational data is in /var/lib/puppet.

  3. SSL certificates are stored in /var/lib/puppet/ssl. By default, the agent tries to contact a Master hostname called puppet, so either name our server puppet.$domain or provide the correct name in the server parameter.

  4. When the agent communicates with the Master, it presents itself with its certname (this is also the hostname placed in its SSL certificates). By default, the certname is the fully qualified domain name (FQDN) of the agent's system.

  5. By default, the agent runs as a daemon that connects to the Master every 30 minutes and fetches its configuration (the catalog, to be precise).

  6. On the Master, we have to sign each client's certificate request (manually). If we can cope with the relevant security concerns, we may automatically sign them (autosign = true).

  7. The first manifest file (that contains Puppet DSL code) that the Master parses when a client connects, in order to produce the configuration to apply to it, is /etc/puppet/manifests/site.pp. This is important as all our code starts from here.

  8. Puppet modules are searched for and automatically loaded from the directories /etc/puppet/modules and /usr/share/puppet/modules on the Master.

Note

Puppet Enterprise is provided with custom packages that reproduce the full stack, Ruby included, and uses different directories.

/etc/puppetlabs/puppet/ is the main configuration directory; here, we find puppet.conf and other configuration files. The other directories are configured by default with these paths:

    vardir = /var/opt/lib/pe-puppet
    logdir = /var/log/pe-puppet
    rundir = /var/run/pe-puppet
    modulepath = /etc/puppetlabs/puppet/modules:/opt/puppet/share/puppet/modules

In this book, we will mostly refer to the open source version; besides the previous paths, all the principles and usage patterns are the same.

 

Puppet in action


Client-server communication is done using REST-like API calls on an SSL socket; basically, it's all HTTPS traffic from clients to the server's port 8140/TCP.

The first time we execute Puppet on a node, its x509 certificates are created and placed in ssldir, and then the Puppet Master is contacted in order to retrieve the node's catalog.

On the Puppet Master, unless we have autosign enabled, we must manually sign the client's certificates using the cert subcommand:

puppet cert list # List the unsigned clients certificates
puppet cert list --all # List all certificates
puppet cert sign <certname> # Sign the given certificate

Once the node's certificate has been recognized as valid and been signed, a trust relationship is created, and a secure client-server communication can be established.

If we happen to recreate a new machine with an existing certname, we have to remove the certificate of the old client from the server with the following command:

puppet cert clean <certname> # Remove a signed certificate

At times, we may also need to remove the certificates on the client; we can do this with the following command:

mv /var/lib/puppet/ssl /var/lib/puppet/ssl.bak

This is safe enough as the whole directory is recreated with new certificates when Puppet is run again (never do this on the Master as it'll remove all the clients' certificates previously signed, along with the Master's certificate, whose public key has been copied to all clients).

A typical Puppet run is composed of different phases. It's important to know them in order to troubleshoot problems:

  1. Execute Puppet on the client. On a root shell, run puppet agent -t.

  2. If pluginsync = true (default from Puppet 3.0), the client retrieves any extra plugin (facts, types, and providers) present in the modules on the Master's $modulepath. Client output:

    Info: Retrieving plugin
    
  3. The client runs facter and sends its facts to the server. Client output:

    Info: Loading facts in /var/lib/puppet/lib/facter/... [...]
    
  4. The Master looks for the client's certname in its nodes' list.

  5. The Master compiles the catalog for the client, also using its facts. Master's logs:

    Compiled catalog for <client> in environment production in 8.22 seconds
    
  6. If there are syntax errors in the processed Puppet code, they are exposed here, and the process terminates; otherwise, the server sends the catalog to the client in the PSON format.

    Client output:

    Info: Caching catalog for <client>
    
  7. The client receives the catalog and starts to apply it locally. If there are dependency loops, the catalog can't be applied and the whole run fails.

    Client output:

    Info: Applying configuration version '1355353107'
    
  8. All changes to the system are shown on stdout or in logs. If there are errors (in red or pink, according to Puppet versions), they are relevant to specific resources but do not block the application of other resources (unless they depend on the failed ones).

  9. At the end of the Puppet run, the client sends to the server a report of what has been changed.

    Client output:

    Finished catalog run in 13.78 seconds
    
  10. The server sends the report to a report collector if enabled.

Resources

When dealing with Puppet's DSL, most of the time we use resources as they are single units of configuration that express the properties of objects on the system. A resource declaration is always composed of the following parts:

  • type: a package, service, file, user, mount, exec, and so on

  • title: how it is called and referred to in other parts of the code

  • One or more attributes

    type { 'title':
      argument  => value,
      other_arg => value,
    }

Inside a catalog, for a given type, there can be only one title; otherwise, we get an error as follows:

Error: Duplicate declaration: <Type>[<name>] is already declared in file <manifest_file> at line <line_number>; cannot redeclare on node <node_name>.
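A minimal sketch of code that would trigger this error, reusing the openssh package from this chapter: both declarations share the same type and title, even though their attributes differ.

```puppet
# Illustrative only: the second declaration collides with the first,
# because the type (package) and the title ('openssh') are the same.
package { 'openssh':
  ensure => present,
}

package { 'openssh':
  ensure => latest,
}
```

Compilation of the catalog fails on the second declaration; the fix is to keep a single declaration per type and title.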

Resources can be native (written in Ruby), or defined by users in Puppet DSL.

These are examples of common native resources; what they do should be quite obvious:

  file { 'motd':
    path    => '/etc/motd',
    content => "Tomorrow is another day\n",
  }

  package { 'openssh':
    ensure => present,
  }

  service { 'httpd':
    ensure => running, # Service must be running
    enable => true,    # Service must be enabled at boot time
  }

We can write code of this kind in manifests, which are files with a .pp extension that contain valid Puppet code. It's possible to test the effect of this code on the local system with the puppet apply command, which expects the path of a manifest file as the argument:

puppet apply /etc/puppet/manifests/site.pp

We can also directly execute Puppet code with the --execute (-e) option:

puppet apply -e "package { 'openssh': ensure => present }"

In this case, instead of a manifest file, the argument is a fragment of valid Puppet DSL.

For inline documentation about a resource, use the describe subcommand, for example:

puppet describe file

Note

For a complete reference of the native resource types and their arguments, check http://docs.puppetlabs.com/references/latest/type.html.

The resource abstraction layer

From the previous resource examples, we can deduce that the Puppet DSL allows us to concentrate on the types of objects (resources) to manage, and it doesn't bother us with how these resources may be applied on different operating systems.

This is one of Puppet's strong points; resources are abstracted from the underlying OS; we don't have to care or specify how, for example, to install a package on Red Hat Linux, Debian, Solaris, or Mac OS; we just have to provide a valid package name.

This is possible thanks to Puppet's Resource Abstraction Layer (RAL), which is engineered around the concept of types and providers.

Types, as we have seen, map to an object on the system.

There are more than 50 native types in Puppet (some of them are applicable only to a specific OS); the most commonly used ones are augeas, cron, exec, file, group, host, mount, package, service, and user.

To have a look at their Ruby code and learn how to create custom types, check this directory:

ls -l $(facter rubysitedir)/puppet/type

For each type, there is at least one provider, which is the component that enables that type on a specific OS. For example, the package type is known for having a large number of providers that manage the packages' installations on many OSes, which are aix, appdmg, apple, aptitude, apt, aptrpm, blastwave, dpkg, fink, freebsd, gem, hpux, macports, msi, nim, openbsd, pacman, pip, pkgdmg, pkg, pkgutil, portage, ports, rpm, rug, sunfreeware, sun, up2date, urpmi, yum, and zypper.

We can find them with the following command:

ls -l $(facter rubysitedir)/puppet/provider/package/

The Puppet executable offers a powerful subcommand to interrogate and operate with the RAL: puppet resource.

For a list of all the users present on the system, type the following:

puppet resource user

For a specific user, type the following:

puppet resource user root

Other examples that might show glimpses of the power of RAL to map a system's resources are as follows:

puppet resource package
puppet resource mount
puppet resource host
puppet resource file /etc/hosts
puppet resource service

The output is in the Puppet DSL format; we can use it in our manifests to reproduce that resource wherever we want.

The puppet resource subcommand can also be used to modify the properties of a resource directly from the command line, and since it uses the Puppet RAL, we don't have to know how to do that on a specific OS, for example, to enable the httpd service:

puppet resource service httpd ensure=running enable=true

Nodes

We can place the above resources in our first manifest file (/etc/puppet/manifests/site.pp) or in the one included from there, and they would be applied to all our Puppet-managed nodes. This is okay for quick samples out of books, but in real life, things are much different. We have hundreds of different resources to manage and apply, with different logic and properties to (dozens? hundreds? thousands?) different systems.

To help you organize your Puppet code, there are two different language elements; with node, we can confine resources to a given host and apply them only to it; with class, we can group different resources (or other classes) that generally have a common function or task.

Whatever is declared in a node definition is included only in the catalog compiled for that node. The general syntax is as follows:

node $name [inherits $parent_node] {
  [ Puppet code, resources and classes applied to the node ]
}

Here, $name is a placeholder for the certname of the client (by default, its FQDN) or a regular expression. It's possible to inherit in a node whatever is defined in the parent node, and inside the curly braces we can place any kind of Puppet code, such as resource declarations, class inclusions, and variable definitions. Here is an example:

node 'mysql.example.com' {
  package { 'mysql-server':
    ensure => present,
  }
  service { 'mysql':
    ensure => 'running',
  }
}

However, generally in nodes we just include classes, so a better real-life example would be the following one:

node 'mysql.example.com' {
  include common
  include mysql
}

The previous include statements do what we might expect; they include all the resources declared in the referred class.

Note that there are alternatives to the usage of the node statement; we can use an External Node Classifier (ENC) to define which variables and classes are assigned to nodes, or we can have a nodeless setup, where resources applied to nodes are defined in a case statement based on the hostname or a similar fact that identifies a node.
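The regular expression form of the node statement and the nodeless approach can be sketched as follows. The hostnames and the apache class are hypothetical; common and mysql are the classes used in the earlier example. The default node name is a standard Puppet fallback for nodes not matched elsewhere.

```puppet
# A regular expression as node name: matches web01.example.com,
# web02.example.com, and so on.
node /^web\d+\.example\.com$/ {
  include common
  include apache
}

# Fallback applied to any node that is not matched by another definition.
node default {
  include common
}
```

A nodeless setup would instead place the logic at the top scope, selecting classes according to a fact:

```puppet
# Alternative to node statements: classify by the hostname fact.
include common

case $::hostname {
  /^web\d+$/: { include apache }
  'mysql':    { include mysql }
  default:    { }
}
```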

Classes and defines

A class can be defined (the resources provided by the class are defined for later usage, but are not yet included in the catalog) with this syntax:

class mysql {
  $mysql_service_name = $::osfamily ? {
    'RedHat' => 'mysqld',
    default  => 'mysql',
  }
  package { 'mysql-server':
    ensure => present,
  }
  service { 'mysql':
    name => $mysql_service_name,
    ensure => 'running',
  }
  […]
}

Once defined, a class can be declared (the resources provided by the class are actually included in the catalog) in two ways:

  • Just by including it (we can include the same class many times, but it is evaluated only once):

    include mysql
  • Using the parameterized style (available since Puppet 2.6), where we can optionally pass parameters to the class if available (we can declare a class with this syntax only once for each node in our catalog):

    class { 'mysql': }

A parameterized class has a syntax similar to the following code:

class mysql (
  $root_password,
  $config_file_template = undef,
  ...
) {
  […]
}

In this code, the expected parameters, which may or may not have a default value, are defined between parentheses (parameters without default values, such as the $root_password in this sample, must be set explicitly while declaring the class). The declaration of a parameterized class has exactly the same syntax as that of a normal resource (note that parameter names are written without the $ sign here):

class { 'mysql':
  root_password => 's3cr3t',
}

Puppet 3.0 introduced a feature called data binding; if we don't pass a value for a given parameter, then before using the default value (if present), Puppet performs an automatic Hiera lookup for a key named $class::$parameter. In this example, it would be mysql::root_password.

This is an important feature that radically changes the approach on how to manage data in Puppet architectures. We will come back to this topic in the following chapters.
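A minimal sketch of data binding, assuming Puppet 3 with a working Hiera setup. The Hiera data is shown as a comment because its file name and location depend on the configured hierarchy, which is an assumption here.

```puppet
# Assumed Hiera data (YAML), in whatever file the hierarchy resolves to:
#   mysql::root_password: 's3cr3t'

class mysql (
  $root_password,
  $config_file_template = undef
) {
  # Resources that use $root_password would go here.
}

# No parameter is passed explicitly: Puppet looks up the Hiera key
# mysql::root_password. Since $root_password has no default value,
# compilation fails if the key is not found.
include mysql
```

With this pattern the class can be declared with a plain include, while the data lives outside the manifests.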

Besides classes, Puppet also has defines, which can be considered as classes that can be used multiple times on the same host (with a different title). Defines are also called defined types, since they are types that can be defined using Puppet DSL, contrary to the native types that are written in Ruby.

They have a similar syntax:

define mysql::user (
  $password,                # Mandatory parameter, no defaults set
  $host      = 'localhost', # Parameter with a default value
  [...]
) {
  # Here all the resources
}

They are also used in a similar way:

mysql::user { 'al':
  password => 'secret',
}

Note that defines (also called user-defined types, defined resource types, or definitions), like the one above, even if written in Puppet DSL, have exactly the same usage pattern as native types, which are written in Ruby (packages, services, files, and so on).

In defines, besides the parameters that are explicitly exposed, there are two variables that are automatically set: $title is the defined title, and $name, which defaults to the value of $title, can be set to an alternate value.

Since a define can be declared more than once inside a catalog (with different titles), it's important to avoid declaring, inside a define, resources with a static title. For example, this is wrong:

define mysql::user ( ...) {
  exec { 'create_mysql_user':
    [ … ]
  }
}

This is because when there are two different mysql::user declarations, it will generate an error like the following:

Duplicate definition: Exec[create_mysql_user] is already defined in file /etc/puppet/modules/mysql/manifests/user.pp at line 2; cannot redefine at /etc/puppet/modules/mysql/manifests/user.pp:2 on node test.example42.com 

A correct version could use the $title variable, which is inherently different each time:

define mysql::user ( ...) {
  exec { "create_mysql_user_${title}":
    [ … ]
  }
}
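Assuming the corrected define also keeps the password and host parameters shown earlier, two declarations with different titles could look like this (user names and values are made up for illustration):

```puppet
# Each declaration has its own title, so the exec resources inside the
# define get distinct titles: create_mysql_user_al and create_mysql_user_joe.
mysql::user { 'al':
  password => 'secret',
}

mysql::user { 'joe':
  password => 'another_secret',
  host     => '%',
}
```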

Class inheritance

We have seen that in Puppet, classes are just containers of resources and have nothing to do with Object-oriented Programming classes, so the use of class inheritance is somewhat limited to a few specific cases.

When using class inheritance, the main class (puppet in the following sample) is always evaluated first, and all the variables and resource defaults that it sets are available in the scope of the child class (puppet::server).

Moreover, the child class can override the arguments of a resource defined in the parent class:

class puppet {
  file { '/etc/puppet/puppet.conf':
    content => template('puppet/client/puppet.conf'),
  }
}
class puppet::server inherits puppet {
  File['/etc/puppet/puppet.conf'] {
    content => template('puppet/server/puppet.conf'),
  }
}

Note the syntax used when declaring a resource; we use a syntax like file { '/etc/puppet/puppet.conf': [...] }. When referring to it, the syntax is File['/etc/puppet/puppet.conf'].
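To make the effect of the override concrete, here is a sketch of how the two classes might be assigned to nodes. The node names are hypothetical.

```puppet
# Hypothetical node names, for illustration only.
node 'puppetmaster.example.com' {
  # The parent class puppet is evaluated first; puppet::server then
  # overrides the content of File['/etc/puppet/puppet.conf'] with the
  # server template.
  include puppet::server
}

node 'web01.example.com' {
  # Only the parent class is included: the file uses the client template.
  include puppet
}
```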

Resource defaults

It is possible to set the default argument values for a resource type in order to reduce code duplication. The general syntax to define a resource default is as follows:

Type {
  argument => default_value,
}

Common examples are as follows:

Exec {
  path => '/sbin:/bin:/usr/sbin:/usr/bin',
}
File {
  mode  => '0644',
  owner => 'root',
  group => 'root',
}

Resource defaults can be overridden when declaring a specific resource of the same type.
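A short sketch of such an override follows; the file paths and the backup module used in the source URL are hypothetical.

```puppet
File {
  mode  => '0644',
  owner => 'root',
  group => 'root',
}

# Inherits mode, owner, and group from the defaults above.
file { '/etc/motd':
  content => "Tomorrow is another day\n",
}

# Overrides only the mode; owner and group still come from the defaults.
file { '/usr/local/bin/backup.sh':
  mode   => '0755',
  source => 'puppet:///modules/backup/backup.sh',
}
```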

It is worth noting that the area of effect of the resource defaults might bring unexpected results. The general suggestion is as follows:

  • Place global resource defaults in /etc/puppet/manifests/site.pp outside any node definition.

  • Place local resource defaults at the beginning of a class that uses them (mostly for clarity's sake, as they are independent of the parse-order).

We cannot expect a resource default that is defined in a class to be working in another class, unless it is a child class with an inheritance relationship.

Resource references

In Puppet, any resource is uniquely identified by its type and its name. We cannot have two resources of the same type with the same name in a node's catalog.

We have seen that we declare resources with a syntax like the following one:

type { 'name':
  arguments => values,
}

When we need to reference them (typically when we define dependencies between resources) in our code, the following is the syntax (note the square brackets and the capital letter):

Type['name']

Some examples are as follows:

file { 'motd': ... }
apache::virtualhost { 'example42.com': .... }
exec { 'download_myapp': .... }

These examples are referenced, respectively, with the following code:

File['motd']
Apache::Virtualhost['example42.com']
Exec['download_myapp']
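As a sketch of how references are typically used to express dependencies, the following resources rely on the require and notify metaparameters. The sshd service name and the openssh module path are assumptions made for illustration.

```puppet
package { 'openssh':
  ensure => present,
}

file { '/etc/ssh/sshd_config':
  source  => 'puppet:///modules/openssh/sshd_config',
  require => Package['openssh'],  # manage the file after the package
  notify  => Service['sshd'],     # restart the service when the file changes
}

service { 'sshd':
  ensure => running,
  enable => true,
}
```

Note that the declarations use lowercase type names, while the references inside require and notify use the capitalized form with square brackets.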

Variables, facts, and scopes

When writing our manifests, we can set and use variables; they help us in organizing which resources we want to apply, how they are parameterized, and how they change according to our logic, infrastructure, and our needs.

They may have different sources:

  • Facter (variables, called facts, automatically generated on the Puppet client)

  • User-defined variables in Puppet code (variables that are defined using Puppet DSL)

  • User-defined variables from an ENC

  • User-defined variables on Hiera

  • Puppet's built-in variables

System's facts

When we install Puppet on a system, the facter package is installed as a dependency. Facter is executed on the client each time Puppet is run, and it collects a large set of key-value pairs that reflect many properties of the system. They are called facts and provide valuable information such as the system's operatingsystem, operatingsystemrelease, osfamily, ipaddress, hostname, fqdn, and macaddress to name just some of the most used ones.

All the facts gathered on the client are available as variables to the Puppet Master and can be used inside manifests to provide a catalog that fits the client.

We can see all the facts of our nodes running locally:

facter -p

(The -p argument is the short version of --puppet and also shows any custom facts that are added to the native ones via our modules.)
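
Inside a manifest, the same facts are available as top scope variables. A minimal sketch (the resource title is arbitrary) that just prints some of the facts mentioned above:

```puppet
notify { 'system_summary':
  message => "${::fqdn} runs ${::operatingsystem} ${::operatingsystemrelease} (${::osfamily})",
}
```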

User variables in Puppet DSL

Variable definition inside the Puppet DSL follows the general syntax: $variable = value

Let's see some examples. Here, the value is set as string, boolean, array, or hash as shown in the following code:

$redis_package_name = 'redis'
$install_java = true
$dns_servers = [ '8.8.8.8' , '8.8.4.4' ]
$config_hash = { user => 'joe', group => 'admin' }

Here, the value is the result of a function call (which may take strings, other data types, or other variables as arguments):

$config_file_content = template('motd/motd.erb')

$dns_servers = hiera('name_servers')
$dns_servers_count = inline_template('<%= @dns_servers.length %>')

Here, the value is determined according to the value of another variable (here, the $::osfamily fact is used), using the selector construct:

$mysql_service_name = $::osfamily ? {
  'RedHat' => 'mysqld',
  default  => 'mysql', 
}

A special value for a variable is undef (similar to Ruby's nil), which basically removes any value from the variable. This can be useful in resources when we want to disable (and make Puppet ignore) an existing attribute:

$config_file_source = undef
file { '/etc/motd':
  source  => $config_file_source,
  content => $config_file_content,
}

Note that we can't change the value assigned to a variable inside the same class (more precisely, inside the same scope; we will review scopes later):

$counter = '1'
$counter = $counter + 1

The preceding code will produce the following error:

Cannot reassign variable counter
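
A common way around this restriction, sketched here with hypothetical variable names, is to assign the computed value to a new variable instead of reusing the old one:

```puppet
$counter      = 1
# A new variable holds the result; $counter itself is never reassigned
$next_counter = $counter + 1

notice("counter=${counter} next_counter=${next_counter}")
```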

User variables in an ENC

When an ENC is used for classifying nodes, it returns the classes to include for the requested node, along with variables. All the variables provided by an ENC are at the top scope (we can reference them with $::variablename all over our manifests).
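
As a sketch, assume the ENC sets two hypothetical variables, role and datacenter, and that an apache module is available. They can then be used like any other top scope variable:

```puppet
# $::role and $::datacenter are assumed to be provided by the ENC
if $::role == 'webserver' {
  include apache
}

notify { 'enc_datacenter':
  message => "This node is in datacenter ${::datacenter}",
}
```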

User variables in Hiera

Hiera is another very popular and useful place to store user data (yes, variables); we will review it extensively in Chapter 2, Hiera; here, let's just point out a few basic usage patterns. We can use it to manage any kind of variable whose value can change according to custom logic in a hierarchical way. Inside manifests, we can look up a Hiera variable using the hiera() function. Some examples are as follows:

$dns = hiera('dnsservers')
class { 'resolver':
  dns_server => $dns,
}

The previous code can also be written as:

class { 'resolver':
  dns_server => hiera('dnsservers'),
}

In our Hiera YAML files, we would have something like the following:

dnsservers:
  - 8.8.8.8
  - 8.8.4.4

If our Puppet Master uses Puppet Version 3 or greater, then we can benefit from the Hiera automatic lookup for class parameters, which is the ability to define in Hiera values for any parameter exposed by the class. The above example would become something like the following:

include resolver

and then, in Hiera YAML files:

resolver::dns_server:
  - 8.8.8.8
  - 8.8.4.4
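
For the automatic lookup to work, the class has to expose a parameter with a matching name. The following is only a sketch of what such a resolver class might look like (the default value and the way resolv.conf is generated are assumptions, not the book's implementation):

```puppet
class resolver (
  # Looked up in Hiera as resolver::dns_server; the default is used if no value is found
  $dns_server = [ '8.8.8.8' ],
) {
  file { '/etc/resolv.conf':
    # One nameserver line per entry; flatten accepts a single string or an array
    content => inline_template("<% [@dns_server].flatten.each do |ns| %>nameserver <%= ns %>\n<% end %>"),
  }
}
```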

Puppet's built-in variables

Several other variables are available and can be used in manifests or templates:

Variables set by the client (agent):

  • $clientcert: This is the name of the node (the certname setting in its puppet.conf, by default, is the host's FQDN)

  • $clientversion: This is the Puppet version of the agent

Variables set by the server (Master):

  • $environment: This is a very important special variable, which defines the Puppet environment of a node (for different environments, the Puppet Master can serve manifests and modules from different paths)

  • $servername, $serverip: Respectively the Master's FQDN and IP address.

  • $serverversion: The Puppet version on the Master (it is always better to have Masters with a Puppet version equal to or newer than that of the clients)

  • $settings::<setting_name>: Any configuration setting of the Puppet Master's puppet.conf

Variables set by the parser during catalog compilation:

  • $module_name: This is the name of the module that contains the current resource's definition

  • $caller_module_name: This is the name of the module that contains the current resource's declaration
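
A minimal sketch that prints some of these built-in variables from inside a class (the class name motd is hypothetical; inside it, $module_name is expected to resolve to motd):

```puppet
class motd {
  notify { 'motd_context':
    message => "module ${module_name}, node ${::clientcert}, environment ${::environment}, master ${::servername}, confdir ${settings::confdir}",
  }
}
```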

A variable's scope

One of the parts where Puppet development can be misleading and not so intuitive is how variables are evaluated according to the place in the code where they are used.

Variables have to be declared before they can be used, and this is dependent on the parse-order; so, also for this reason, Puppet language can't be considered completely declarative.

In Puppet, there are different scopes, which are partially isolated areas of code where variables and resource default values can be confined and accessed.

There are four types of scopes; from general to local, they are:

  • Top scope: Any code defined outside nodes and classes, as what is generally placed in /etc/puppet/manifests/site.pp

  • Node scope: Code defined inside the node's definitions

  • Class scope: Code defined inside a class or define

  • Sub class scope: Code defined in a class that inherits another class

We always write code within a scope, and we can directly access variables (that is, by just specifying their name without using the fully qualified name) defined only in the same scope or in a parent or containing one. So:

  • Top scope variables can be accessed from anywhere

  • Node scope variables can be accessed in classes (used by the node) but not at the Top scope

  • Class (also called local) variables are directly available, with their plain name, only from within the same class or define where they are set, or in a child class

The variables' value or resources default arguments that are defined at a more general level can be overridden at a local level (Puppet always uses the most local value).

It's possible to refer to variables outside a scope by specifying their fully qualified name, which contains the name of the class where the variable is defined. For example, $::apache::config_dir is a variable called config_dir that is defined in the apache class.

One important change introduced in Puppet 3.x is the forcing of static scoping for variables; this indicates that the parent scope for a class can only be its parent class.

Earlier Puppet versions had dynamic scoping, where parent scopes were assigned both by inheritance (like in static scoping) and by simple declaration; that is, any class has as a parent the first scope where it has been declared. This means that since we can include classes multiple times, the order used by Puppet to parse our manifests may change the parent scope, and therefore, how a variable is evaluated.

This can obviously lead to all kinds of unexpected problems if we are not particularly careful about how classes are declared, with variables evaluated in different parse-order dependent ways. The solution is Puppet 3's static scoping and the need to reference out-of-scope variables with their fully qualified name.
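
The scoping rules can be sketched as follows. All class names, variables, and values here are hypothetical, and the classes are shown in one listing only for brevity:

```puppet
# Top scope (for example, in site.pp)
$admin_email = 'root@example.com'

class apache {
  $config_dir = '/etc/apache2'
}

class apache::vhosts inherits apache {
  # Child class: the parent's variable is available by its plain name
  notify { 'vhosts_dir':
    message => "Virtual hosts live in ${config_dir}/sites-enabled",
  }
}

class monitoring {
  include apache
  # Unrelated class: out-of-scope variables need their fully qualified names
  notify { 'monitoring_info':
    message => "Watching ${::apache::config_dir}, alerts go to ${::admin_email}",
  }
}
```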

Meta parameters

Meta parameters are general-purpose parameters available to any resource type even if not explicitly defined. They can be used for different purposes:

  • Manage the ordering of dependencies and resources (more on them in the next section): before, require, subscribe, notify, stage

  • Manage resources' application policies: audit (audit the changes done on the attributes of a resource), noop (do not apply any real change for a resource), schedule (apply the resources only within a given time schedule), and loglevel (manage the log verbosity)

  • Add information to a resource using alias (adds an alias that can be used to reference a resource) and tag (adds a tag that can be used to refer to a group of resources according to custom needs; we will see a use case later in this chapter in the exported resources section)
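
A brief sketch of some of these meta parameters on an ordinary file resource (the path, content, and tag name are only illustrative):

```puppet
file { '/etc/ntp.conf':
  content  => "server pool.ntp.org\n",
  noop     => true,        # report what would change without applying it
  loglevel => 'warning',   # log events for this resource at warning level
  tag      => 'timesync',  # custom tag, usable later in collectors or filters
}
```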

Managing order and dependencies

Puppet language is declarative and not procedural (*); it defines states. The order in which resources are written in manifests does not affect the order in which they are applied to the desired state.

Note

(*) This is not entirely true; contrary to resources, variables definitions are parse-order dependent, so the order is important when it is used to define variables. As a general rule, just set variables before using them, which sounds logical but is actually procedural.

There are cases where we need to set some kind of ordering among resources, for example, we want to manage a configuration file only after the relevant package has been installed, or have a service automatically restart when its configuration file changes.

Also, we may want to install packages only after we've configured our packaging systems (apt sources, yum repos, and so on), or install our application only after the whole system and the middleware has been configured.

To manage these cases, there are three different methods, which can coexist, as follows:

  1. Use the meta parameters before, require, notify, and subscribe

  2. Use the chaining arrows operator (corresponding, respectively, to the meta parameters above: ->, <-, ~>, and <~)

  3. Use run stages

In a typical package/service/configuration file example, we want the package to be installed first, then the configuration file to be managed and the service started; finally, the service should be restarted whenever the configuration file changes.

This can be expressed with meta parameters:

package { 'exim':
  before => File['exim.conf'],  
}
file { 'exim.conf':
  notify => Service['exim'],
}
service { 'exim': }

This is equivalent to the following chaining arrows syntax:

package {'exim': } ->
file {'exim.conf': } ~>
service{'exim': }

However, the same ordering can be expressed using the alternate reverse meta parameters:

package { 'exim': }
file { 'exim.conf':
  require => Package['exim'],
}
service { 'exim':
  subscribe => File['exim.conf'], 
}

They can also be expressed as follows:

service{'exim': } <~
file{'exim.conf': } <-
package{'exim': }
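
Chaining arrows also work between resource references, which keeps the declarations separate from the ordering. A sketch of the same example in that style (the file path and the source URL are assumptions):

```puppet
package { 'exim':
  ensure => installed,
}
file { 'exim.conf':
  path   => '/etc/exim/exim.conf',
  source => 'puppet:///modules/exim/exim.conf',
}
service { 'exim':
  ensure => running,
  enable => true,
}

# Ordering and refresh relationships expressed on references
Package['exim'] -> File['exim.conf'] ~> Service['exim']
```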

Run stages

Puppet 2.6 introduced the concept of run stages to help users manage the order of dependencies when applying groups of resources.

Puppet provides a default main stage; we can add any number of stages and manage their ordering with the stage resource type using the normal syntax for resources declaration as we have seen previously:

stage { 'pre':
  before => Stage['main'],
}

This is equivalent to:

stage { 'pre': }
Stage['pre'] -> Stage['main']

We can assign any class to a defined stage with the stage meta parameter:

class { 'yum':
  stage => 'pre',
}

In this way, all the resources provided by the yum class, which is assigned to the pre stage, are applied before all the other resources (in the default main stage).

At the beginning, the idea of stages seemed a good solution to better handle large sets of dependencies in Puppet. In reality, some drawbacks and the increased risk of dependency cycles make them less useful than expected.

As a rule of thumb, it is recommended to use them for simple classes (that don't include other classes) and where really necessary (for example, to set up package management configurations at the beginning of a Puppet run, or deploy our application after all the other resources have been managed).
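
The two use cases mentioned above could be sketched as follows. The repos and app_deploy classes, the repository URL, and the deploy command are all hypothetical; both classes are kept simple and include no other classes:

```puppet
stage { 'pre':
  before => Stage['main'],
}
stage { 'post':
  require => Stage['main'],
}

# Simple class: package repository configuration
class repos {
  file { '/etc/yum.repos.d/internal.repo':
    content => "[internal]\nname=Internal\nbaseurl=http://repo.example.com/el6\nenabled=1\n",
  }
}

# Simple class: application deployment, run after everything else
class app_deploy {
  exec { 'deploy_myapp':
    command => '/usr/local/bin/deploy_myapp',
    creates => '/opt/myapp/current',
  }
}

class { 'repos':
  stage => 'pre',
}
class { 'app_deploy':
  stage => 'post',
}
```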

Reserved names and allowed characters

As with every language, Puppet DSL has some restrictions on the names we can give to its elements and the allowed characters.

As a general rule, for names of resources, variables, parameters, classes, and modules, we can use only lowercase letters, numbers, and the underscore (_). Usage of hyphens (-) should be avoided (in some cases, it is forbidden; in others, it depends on Puppet's version).

We can use uppercase letters in variable names (but not at their beginning), and use any character for resources' titles.

Names are case-sensitive, and there are some reserved words that cannot be used as names for resources, classes or defines, or as unquoted word strings in the code:

and, case, class, default, define, else, elsif, false, if, in, import, inherits, node, or, true, undef, unless, main, settings, $string.

Conditionals

Puppet provides different constructs to manage conditionals inside manifests.

Selectors, as we have seen, let us set the value of a variable or an argument inside a resource declaration according to the value of another variable. Selectors, therefore, just return values and are not used to conditionally manage entire blocks of code.

Here's an example of a selector:

$package_name = $::osfamily ? {
  'RedHat' => 'httpd',
  'Debian' => 'apache2',
  default  => undef,
}

A case statement is used to execute different blocks of code according to the values of a variable. It's recommended to have a default block for unmatched entries. Case statements can't be used inside resource declarations. We can achieve the same result of the previous selector with this case sample:

case $::osfamily {
  'Debian': { $package_name = 'apache2' }
  'RedHat': { $package_name = 'httpd' }
  default: { fail ("OS $::operatingsystem not supported") } 
}

The if, elsif, and else conditionals, like case, are used to execute different blocks of code, and can't be used inside resources' declarations. We can use any of Puppet's comparison expressions, and we can combine more than one for complex pattern matching.

The previous sample variables assignment can also be expressed in this way:

if $::osfamily == 'Debian' {
  $package_name = 'apache2'
} elsif $::osfamily == 'RedHat' {
  $package_name = 'httpd'
} else {
  fail ("OS $::operatingsystem not supported")
}

An unless statement is the opposite of if. It evaluates a Boolean condition, and if it's false, it executes a block of code.
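
A minimal sketch of unless, here used as a guard that stops the compilation on unsupported kernels:

```puppet
unless $::kernel == 'Linux' {
  fail("Kernel ${::kernel} not supported")
}
```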

Comparison operators

Puppet supports comparison operators that resolve to true or false. They are as follows:

  • Equal ==, returns true if the operands are equal. Used with numbers, strings, arrays, hashes, and Booleans, as shown in the following example:

    if $::osfamily == 'Debian' { [ ... ] }
  • Not equal != , returns true if the operands are different:

    if $::kernel != 'Linux' { [ ... ] }
  • Less than <, greater than >, less than or equal to <= and greater than or equal to >= can be used to compare numbers:

    if $::uptime_days > 365 { [ ... ] }
    if $::operatingsystemrelease <= 6 { [ ... ] }
  • Regex match =~ compares a string (the left operand) with a regular expression (the right operand). It resolves to true if the string matches. Regular expressions are enclosed between forward slashes and follow the normal Ruby syntax:

    if $mode =~ /(server|client)/ { [ ... ] }
    if $::ipaddress =~ /^10\./ { [ ... ] }
  • Regex not match !~, opposite to =~, resolves to false if the operands match.
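
A short sketch of both regex operators, assuming hypothetical hostnames such as web01 and an example.com domain. Inside the block of a successful =~ match, captured groups are available as numbered variables:

```puppet
if $::hostname =~ /^web(\d+)$/ {
  # $1 holds the first captured group of the match
  notice("Web server number $1")
}

if $::fqdn !~ /\.example\.com$/ {
  warning("${::fqdn} is outside the example.com domain")
}
```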

The In operator

The in operator checks if a string is present in another string, an array, or in the keys of a hash; it is case-sensitive:

if '64' in $::architecture { [ ... ] }
if $monitor_tool in [ 'nagios' , 'icinga' , 'sensu' ] { [ ... ] }
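
The hash case can be sketched in the same way; the $ports hash is a hypothetical example, and only its keys are searched:

```puppet
$ports = { 'http' => 80, 'https' => 443 }

# Matches, because 'https' is one of the keys
if 'https' in $ports {
  notice('An https port is configured')
}

# Does not match: the comparison is case-sensitive
if 'HTTPS' in $ports {
  notice('This message is not expected')
}
```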

Expressions combinations

It's possible to combine multiple comparisons with and and or as shown in the following code:

if ($::osfamily == 'RedHat') and ($::operatingsystemrelease == '5') { [ ... ] }
if ($::operatingsystem == 'Ubuntu') or ($::operatingsystem == 'Mint') { [ ... ] }

Exported resources

When we need to provide a host with information about the resources present in another host, things in Puppet become trickier. The only official solution has been, for a long time, to use exported resources; resources are declared in the catalog of a node (based on its facts and variables) but applied (collected) on another node. Some alternative approaches are now possible with PuppetDB; we will review them in Chapter 3, PuppetDB.

Resources are declared with the special @@ notation, which marks them as exported so that they are not applied to the node where they are declared:

@@host { $::fqdn:
  ip  => $::ipaddress,
}
@@concat::fragment { "balance-fe-${::hostname}":
  target  => '/etc/haproxy/haproxy.cfg',
  content => "server ${::hostname} ${::ipaddress} maxconn 5000",
  tag     => "balance-fe",
}

Once a catalog that contains exported resources has been applied on a node and stored by the Puppet Master, the exported resources can be collected with the <<| |>> operator, where it is possible to specify search queries:

Host <<| |>>
Concat::Fragment <<| tag == "balance-fe" |>>
Sshkey <<| |>>
Nagios_service <<| |>>
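
Export and collection often live in the same class, so that every node publishes its own data and collects everyone else's. A sketch with SSH host keys (the class name and tag are hypothetical; the sshrsakey fact is assumed to be available on the nodes):

```puppet
class ssh_hostkeys {
  # Exported: stored by the Master, not applied as-is on this node
  @@sshkey { $::fqdn:
    type => 'ssh-rsa',
    key  => $::sshrsakey,
    tag  => 'hostkeys',
  }

  # Collected: applies the keys exported by all nodes with this tag
  Sshkey <<| tag == 'hostkeys' |>>
}
```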

In order to use exported resources, we need to enable the storeconfigs option on the Puppet Master and specify the backend to use. For a long time, the only available backend was Rails' ActiveRecord, which typically used MySQL for data persistence. This solution was the best available at the time but suffered from severe scaling limitations. Luckily, things have changed a lot with the introduction of PuppetDB, which is a fast and reliable storage solution for all the data generated by Puppet, including exported resources.

In order to configure a Puppet Master to enable storeconfigs with PuppetDB, we have to add these lines in the [master] section of puppet.conf (more on this in a later chapter):

storeconfigs = true
storeconfigs_backend = puppetdb

If we want to use the old ActiveRecord backend with SQLite (useful for testing exported resources without installing any other component, but definitely not suitable for production environments), we need to have the sqlite packages and Ruby bindings installed, and the following configuration:

storeconfigs = true
dbadapter = sqlite3

To use ActiveRecord with a MySQL backend, we need these configurations:

storeconfigs = true
dbadapter = mysql
dbuser = puppet
dbpassword = secretpassword
dbserver = localhost
dbsocket = /var/run/mysqld/mysqld.sock # If server is local

Obviously, we will need to grant the relevant credentials on MySQL:

# mysql -u root -p
mysql> create database puppet;
mysql> grant all privileges on puppet.* to puppet@localhost identified by 'secretpassword';

Virtual resources

Virtual resources define a desired state for a resource without adding it to the catalog. Like normal resources, they are applied only on the node where they are declared, but like exported resources, we can apply only a subset of the ones we have declared. They also have a similar usage syntax: we declare them with a single @ prefix (instead of the @@ prefix used for exported resources), and we collect them with <| |> (instead of <<| |>>).

A useful and rather typical example involves user management.

We can declare all our users in a single class, included by all our nodes:

class my_users {
  @user { 'al': […] tag => 'admins' }
  @user { 'matt': […] tag => 'developers' }
  @user { 'joe': […] tag => 'admins' }
[ … ]
}

These users are actually not created on the system; we can decide which ones we want on a specific node with a syntax like the following:

User <| tag == 'admins' |>

This is equivalent to:

realize(User['al'] , User['joe'])

Note that the realize function needs to address resources by their name.
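
Put together, a complete version of this pattern might look like the following sketch. The user attributes are reduced to a minimum, and the role_devbox class is a hypothetical consumer:

```puppet
class my_users {
  # Virtual users: declared everywhere, created nowhere until realized
  @user { 'al':
    ensure => present,
    tag    => 'admins',
  }
  @user { 'matt':
    ensure => present,
    tag    => 'developers',
  }
}

class role_devbox {
  include my_users

  # Only the users tagged as developers are added to this node's catalog
  User <| tag == 'developers' |>
}
```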

Modules

Modules are self-contained, distributable, and (ideally) reusable recipes to manage specific applications or system's elements.

They are basically just a directory with a predefined, standard structure that favors naming conventions over configuration for the classes, extensions, and files they provide.

The $modulepath configuration entry defines where modules are searched; it can be a colon-separated list of directories.

The paths of a module and autoloading

Modules have a standard structure, for example, for a MySQL module:

mysql/            # Main module directory

mysql/manifests/  # Manifests directory. Puppet code here.
mysql/lib/        # Plugins directory. Ruby code here
mysql/templates/  # ERB Templates directory
mysql/files/      # Static files directory
mysql/spec/       # rspec-puppet test directory
mysql/tests/      # Tests / Usage examples directory

mysql/Modulefile  # Module's metadata descriptor

This layout enables useful conventions that are widely used in the Puppet world; we must know these to understand where to look for files and classes.

For example, when we use modules and write the code:

include mysql

Puppet automatically looks for a class called mysql defined in the file $modulepath/mysql/manifests/init.pp.

The init.pp manifest is a special case that applies to classes that have the same name as the module. For subclasses, there's a similar convention that takes into consideration the subclass name:

include mysql::server

It autoloads the file $modulepath/mysql/manifests/server.pp.

A similar scheme is followed also for defines or classes at lower levels:

mysql::conf { ...}

This define is searched in $modulepath/mysql/manifests/conf.pp

include mysql::server::ha

This class is searched in $modulepath/mysql/manifests/server/ha.pp.

It's generally recommended to follow these naming conventions that allow the autoloading of classes and defines without the need to explicitly import the manifests that contain them.
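
As a sketch, these are the kinds of skeletons the autoloader would expect to find in two of the files mentioned above (the bodies are placeholders, and the content parameter of the define is an assumption):

```puppet
# File: $modulepath/mysql/manifests/server/ha.pp
class mysql::server::ha {
  include mysql::server
  # HA-specific resources would go here
}

# File: $modulepath/mysql/manifests/conf.pp
define mysql::conf (
  $content = '',
) {
  file { "/etc/mysql/conf.d/${name}.cnf":
    content => $content,
  }
}
```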

Note

Even if this is not considered a good practice, we can currently define more than one class or define inside the same manifest; when Puppet parses a manifest, it parses its whole contents.

A module's naming conventions apply also to the files that Puppet provides to clients.

We have seen that the file resource accepts two different and alternative arguments to manage the content of a file: source and content. Both of them have a naming convention when used inside a module.

ERB templates are typically parsed via the template function with a syntax like the following:

content => template('mysql/my.cnf.erb'),

This template is found in $modulepath/mysql/templates/my.cnf.erb.

This also applies for subdirectories, so for example:

content => template('apache/vhost/vhost.conf.erb'),

uses a template located in $modulepath/apache/templates/vhost/vhost.conf.erb.

A similar approach is followed with static files provided via the source argument:

source => 'puppet:///modules/mysql/my.cnf'

serves a file placed in $modulepath/mysql/files/my.cnf.

source => 'puppet:///modules/site/openssh/sshd_config'

serves a file placed in $modulepath/site/files/openssh/sshd_config.
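
Both conventions can appear side by side in the same class. In this sketch, the mysql::config class and the static.cnf file are hypothetical, and the comments show where Puppet would look for each file:

```puppet
class mysql::config {
  file { '/etc/mysql/my.cnf':
    # Rendered from $modulepath/mysql/templates/my.cnf.erb
    content => template('mysql/my.cnf.erb'),
  }

  file { '/etc/mysql/conf.d/static.cnf':
    # Served as-is from $modulepath/mysql/files/static.cnf
    source => 'puppet:///modules/mysql/static.cnf',
  }
}
```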

Finally, the whole content of the lib subdirectory in a module has a standard scheme. Here, we can place Ruby code that extends Puppet's functionality and is automatically redistributed from the Master to all clients if the pluginsync configuration parameter is set to true (this is the default value for Puppet 3 and is widely recommended in any setup):

mysql/lib/augeas/lenses/                # Custom Augeas lenses.
mysql/lib/facter/                       # Custom facts.
mysql/lib/puppet/type/                  # Custom types.
mysql/lib/puppet/provider/<type_name>/  # Custom providers.
mysql/lib/puppet/parser/functions/      # Custom functions.

ERB templates

Files provisioned by Puppet can be templates written in Ruby's ERB templating language.

An ERB template can contain whatever text we need, with variable interpolation or Ruby code placed inside <% %> tags. In a template, we can access all the Puppet variables (facts or user-assigned) with the <%= tag:

# File managed by Puppet on <%= @fqdn %>
search <%= @domain %>

(It is recommended, and will be mandatory in future Puppet versions, to refer to variables in a scope using the @ prefix.)

To use out of scope variables, we can use the scope.lookupvar method:

path <%= scope.lookupvar('apache::vhost_dir') %>

This uses the variable's fully qualified name. If the variable is at top scope:

path <%= scope.lookupvar('::fqdn') %>

Since Puppet 3, we can use this alternate syntax:

path <%= scope['apache::vhost_dir'] %>

In ERB templates, we can also use more elaborate Ruby code inside a <% opening tag, for example, to iterate over an array:

<% @dns_servers.each do |ns| %>
nameserver <%= ns %>
<% end %>

The <% tag can also be used to place a line of text only if some conditions are met:

<% if scope.lookupvar('puppet::db') == "puppetdb" -%>
  storeconfigs_backend = puppetdb
<% end -%>

Notice the -%> ending tag here? When the dash is present, no empty line is introduced in the generated file, as would happen if we had written <% end %>.
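
The same ERB syntax can be tried directly from a manifest with the inline_template function, without creating a template file. In this sketch (the variable names and target path are hypothetical), the -%> tags should leave one nameserver line per array entry, with no blank lines in between:

```puppet
$dns_servers = [ '8.8.8.8', '8.8.4.4' ]

# The template text is passed as a multi-line, single-quoted string
$resolv_conf = inline_template('<% @dns_servers.each do |ns| -%>
nameserver <%= ns %>
<% end -%>
')

file { '/tmp/resolv.conf.example':
  content => $resolv_conf,
}
```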

Restoring files from a filebucket

Puppet, by default, makes a local copy of all the files that it changes on a system. This functionality is managed with the filebucket type, which allows storing a copy of the original files either on a central server or locally on the managed system.

When we run Puppet, we see messages like:

info: /Stage[main]/Ntp/File[ntp.conf]: Filebucketed /etc/ntp.conf to puppet with sum 7fda24f62b1c7ae951db0f746dc6e0cc

The checksum of the original file is useful to retrieve it; in fact, files are saved in the directory /var/lib/puppet/clientbucket in a series of subdirectories named according to the same checksum. So, given the above example, we can see the original file content with the command:

cat /var/lib/puppet/clientbucket/7/f/d/a/2/4/f/6/7fda24f62b1c7ae951db0f746dc6e0cc/contents

We can show the original path with the command:

cat /var/lib/puppet/clientbucket/7/f/d/a/2/4/f/6/7fda24f62b1c7ae951db0f746dc6e0cc/paths

A quick way to search for the saved copies of a file, therefore, is to use a command like the following:

grep -R /etc/ntp.conf /var/lib/puppet/clientbucket/

Puppet provides the filebucket subcommand to retrieve saved files. In the above example, we can recover the original file with a (not particularly handy) command, as follows:

puppet filebucket restore -l --bucket /var/lib/puppet/clientbucket /etc/ntp.conf 7fda24f62b1c7ae951db0f746dc6e0cc

It's possible to configure a remote filebucket, typically on the Puppet Master, using the special filebucket type:

filebucket { 'central':
  path   => false,    # This is required for remote filebuckets.
  server => 'my.s.com', # Optional, by default is the puppetmaster
}

Once filebucket is declared, we can assign it to a file with the backup argument:

file { '/etc/ntp.conf':
  backup => 'central',
}

This is generally done using a resource default defined at top scope (typically in our /etc/puppet/manifests/site.pp):

File { backup => 'central', }
 

Summary


In this chapter, we have reviewed and summarized the basic Puppet principles that are a prerequisite to better understand the contents of the book. We have seen how Puppet is configured and what its main components are: manifests, resources, nodes, and classes, and the power of the Resource Abstraction Layer.

The most useful language elements have been described: variables, references, resources defaults and ordering, conditionals, and comparison operators. We have taken a look at exported and virtual resources and analyzed the structure of a module. We also learned how to work with ERB templates. Finally, we have seen how Puppet's filebucket works and how to recover files modified by Puppet.

We are now ready to face a very important component of the Puppet ecosystem: Hiera, and see how it can be used to separate our data from Puppet code.

About the Author

  • Alessandro Franceschi

    Alessandro is a long time Puppet user, trainer, and consultant.

    He started using Puppet in 2007, automating a remarkable amount of customers' infrastructures of different sizes, natures, and complexities.

    He has attended several PuppetConfs and PuppetCamps as both speaker and participant, always enjoying the vibrant and friendly community, each time learning something new.

    Over the years, he started to publish his Puppet code, trying to make it reusable in different scenarios. The result of this work is the example42 Puppet modules and control repo, a complete, feature-rich sample Puppet environment. You can read about example42 at www.example42.com.

    You can follow Franceschi on his Twitter account at @alvagante.
