Designing Puppet Architectures

Puppet is an extensible automation framework, a tool, and a language. We can do great things with it, and we can do them in many different ways. Besides the technicalities of learning the basics of its DSL, one of the biggest challenges for new and not-so-new users of Puppet is to organize code and put things together in a manageable and appropriate way.

It's hard to find comprehensive documentation on how to use public code (modules) together with our custom modules and data, where to place our logic, how to maintain and scale it, and, more generally, how to safely and effectively manage the resources we want on our nodes and the data that defines them.

There's not really a single answer that fits all these cases. There are best practices, recommendations, and many debates in the community, but ultimately, it all depends on our own needs and infrastructure, which vary according to multiple factors, such as the following:

  • The number and variety of nodes and application stacks to manage

  • The infrastructure design and number of data centers or separate networks to manage

  • The number and skills of people who work with Puppet

  • The number of teams who work with Puppet

  • Puppet's presence and integration with other tools

  • Policies for change in production

In this article, we will outline the elements needed to design a Puppet architecture, reviewing the following elements in particular:

  • The tasks to deal with (manage nodes, data, code, files, and so on) and the available components to manage them

  • Foreman, which, together with Puppet Enterprise, is probably the most used ENC around

  • The pattern of roles and profiles

  • Data separation challenges and issues

  • How the various components can be used together in different ways with some sample setups

The components of Puppet architecture

With Puppet, we manage our systems via the catalog that the Puppet Master compiles for each node. This is the total of the resources we have declared in our code, based on the parameters and variables whose values reflect our logic and needs.

Most of the time, we also provide configuration files either as static files or via ERB templates, populated according to the variables we have set.
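As a rough illustration of how a template is populated from variables, here is a minimal sketch in plain Ruby using the standard erb library. The template text, the TemplateScope class, and the render_resolv_conf helper are hypothetical names made up for this example; Puppet provides its own scope object when it evaluates a template, so this only mimics the idea outside of Puppet.

```ruby
require 'erb'

# Hypothetical template text; in a module it would live in a file
# under the module's templates directory.
TPL = <<~'TPL'
  # File managed by Puppet
  <% @dns_servers.each do |server| -%>
  nameserver <%= server %>
  <% end -%>
TPL

# Minimal stand-in for the scope a template is evaluated in: each
# variable is exposed to the template as an instance variable (@name).
class TemplateScope
  def initialize(vars)
    vars.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  # Render the template text; trim_mode '-' lets "-%>" swallow the newline.
  def render(text)
    ERB.new(text, trim_mode: '-').result(binding)
  end
end

# Render the sample template for a given list of DNS servers.
def render_resolv_conf(dns_servers)
  TemplateScope.new(dns_servers: dns_servers).render(TPL)
end
```

Calling render_resolv_conf with a list of addresses returns the file content as a string, with one nameserver line per address.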

We can identify the following major tasks when we have to manage what we want to configure on our nodes:

  • Definition of the classes to be included in each node

  • Definition of the parameters to use for each node

  • Definition of the configuration files provided to the nodes

These tasks can be provided by different, partly interchangeable components, which are as follows:

  • site.pp is the first file parsed by the Puppet Master (by default, its path is /etc/puppet/manifests/site.pp), together with any files imported from there (import nodes/*.pp would import and parse all the code defined in the files with the .pp suffix in the /etc/puppet/manifests/nodes/ directory). Here, we have code in the Puppet language.

  • An ENC (External Node Classifier) is an alternative source that can be used to define the classes and parameters to apply to nodes. It's enabled with the following lines in the Puppet Master's puppet.conf:

    [master]
    node_terminus = exec
    external_nodes = /etc/puppet/node.rb

    The external_nodes parameter can point to any script that uses any backend; it's invoked with the client's certname as its first argument (/etc/puppet/node.rb followed by the certname) and should return YAML-formatted output that defines the classes to include for that node, the parameters, and the Puppet environment to use.

    Besides the well-known Puppet-specific ENCs such as The Foreman and Puppet Dashboard (a former Puppet Labs project now maintained by community members), it's not uncommon to write new custom ones that leverage existing tools and infrastructure-management solutions.

  • LDAP can be used to store nodes' information (classes, environment, and variables) as an alternative to the usage of an ENC. To enable LDAP integration, add the following lines to the Master's puppet.conf:

    [master]
    node_terminus = ldap
    ldapserver =
    ldapbase = ou=Hosts,dc=example,dc=com

    Then, we have to add Puppet's schema to our LDAP server. For more information and details, refer to

  • Hiera is the hierarchical key-value datastore. It is embedded in Puppet 3 and available as an add-on for previous versions. Here, we can set parameters, but also include classes and even provide content for files.

  • Public modules can be retrieved from Puppet Forge, GitHub, or other sources; they typically manage applications and system settings. Being public, they might not fit all our custom needs, but they are supposed to be reusable, support different OSes, and adapt to different use cases. We are supposed to be able to use them without any modification, as if they were public libraries, committing our fixes and enhancements back to the upstream repository. A common but less recommended alternative is to fork a public module and adapt it to our needs. This might seem a quicker solution, but it definitely doesn't help the open source ecosystem and would prevent us from benefiting from updates to the original repository.

  • Site module(s) are custom modules with local resources and files, where we can place all the logic we need or the resources we can't manage with public modules. There may be one or more of them, and they may be called site or carry the name of our company, customer, or project. Site modules make particular sense as companions to public modules that are used without local modifications. In site modules, we can place local settings, files, custom logic, and resources.

The distinction between public reusable modules and site modules is purely formal; they are both Puppet modules with a standard structure. It might make sense to place the ones we develop internally in a dedicated directory (listed in the modulepath), separate from the one where we place shared modules downloaded from public sources.

Let's see how these components might fit our Puppet tasks.

Defining the classes to include in each node

This is typically done when we talk about node classification in Puppet. This is the task that the Puppet Master accomplishes when it receives a request from a client node and has to determine the classes and parameters to use for that specific node.

Node classification can be done in the following different ways:

  • We can use the node declaration in site.pp and in any other manifests imported from there. In this way, we identify each node by certname and declare all the resources and classes we want for it, as shown in the following code:

    node '' {
      include ::general
      include ::apache
    }

    Here, we may even decide to follow a nodeless layout, where we don't use the node declaration at all and rely on facts to manage the classes and parameters to be assigned to our nodes. An example of this approach is examined later in this article.

  • On an ENC, we can define the classes (and parameters) that each node should have. The returned YAML for our simple case would be something like the following lines of code:

    ---
    classes:
      - general:
      - apache:
    parameters:
      dns_servers:
        -
        -
      smtp_server:
    environment: production

  • Via LDAP, where we can build a hierarchical structure in which a node inherits the classes (referenced with the puppetClass attribute) set in a parent node (parentNode).

  • Via Hiera, using the hiera_include function. We just add the following line to site.pp:

    hiera_include('classes')

    Then, under the key named classes, we define in our hierarchy what to include for each node. For example, with a YAML backend, our case would be represented with the following lines of code:

    ---
    classes:
      - general
      - apache

  • In site module(s), where we can place any custom logic, for example, the classes and resources to include for all the nodes or for specific groups of nodes.
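To make the ENC option above more concrete, here is a minimal sketch of what a script such as /etc/puppet/node.rb might look like. The role table, the idea of guessing the role from the leading letters of the hostname, and the classify helper are all assumptions made for this illustration; a real ENC would normally query a database, a CMDB, or some other backend.

```ruby
#!/usr/bin/env ruby
require 'yaml'

# Hypothetical role table, hardcoded only to keep the sketch self-contained.
ROLE_CLASSES = {
  'www' => %w[general apache],
  'db'  => %w[general mysql]
}.freeze

# Build the classification hash for a certname such as "www01.example.com".
# The role is guessed from the leading letters of the hostname (an
# assumption of this sketch, not a Puppet convention).
def classify(certname)
  role = certname[/\A[a-z]+/]
  {
    'classes'     => ROLE_CLASSES.fetch(role, %w[general]),
    'parameters'  => { 'role' => role },
    'environment' => 'production'
  }
end

# Puppet invokes the script with the certname as the first argument
# and reads the YAML document printed on standard output.
puts classify(ARGV[0]).to_yaml if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

The parameters returned here become top scope variables, so a value such as role can later be used in Hiera's hierarchy or in selectors.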

Defining the parameters to use for each node

This is another crucial part, as with parameters, we can characterize our nodes and define the resources we want for them.

Generally, to identify and characterize a node in order to differentiate it from the others and provide the specific resources we want for it, we need very few key parameters, such as the following (the names used here may be common but are arbitrary and are not Puppet's internal ones):

  • role is almost a de facto standard name to identify the kind of server. A node is supposed to have just one role, which might be something like webserver, app_be, db, or anything that identifies the function of the node. Note that web servers that serve different web applications should have different roles (for example, webserver_site, webserver_blog, and so on). We can have one or more nodes with the same role.

  • env or any name that identifies the operational environment of the node (if it is a development, test, qa, or production server).

    Note that this doesn't necessarily match Puppet's internal environment variable. Some people prefer to merge the env information into role, having roles such as webserver_prod and webserver_devel.

  • Zone, site, data center, country, or any parameter that might identify the network, country, availability zone, or data center where the node is placed. A node is supposed to belong to only one of these. We might not require this in our infrastructure.

  • Tenant, component, application, project, and cluster are other kinds of variables that might characterize our node. There's no real standard for their naming, and their usage and necessity strictly depend on the underlying infrastructure.

With parameters such as these, any node can be fully identified and be served with any specific configuration. It makes sense to provide them, where possible, as facts.

The parameters we use in our manifests may have a different nature:

  • role/env/zone as defined earlier are used to identify the nodes; they typically are used to determine the values of other parameters

  • OS-related parameters such as package names and file paths

  • Parameters that define the services of our infrastructure (DNS servers, NTP servers, and so on)

  • Usernames and passwords used to manage credentials, which should be kept confidential

  • Parameters that express any further custom logic and classifying need (master, slave, host_number, and so on)

  • Parameters exposed by the parameterized classes or defines we use

Often, the values of some parameters depend on the values of other ones. For example, the DNS or NTP server may change according to the zone or region of a node. When we start to design our Puppet architecture, it's important to have a general idea of the variations involved and the possible exceptions, as we will probably define our logic according to them. As a general rule, we will use the identifying parameters (role/env/zone) to define most of the other parameters, so we'll probably need to use them in our Hiera hierarchy or in Puppet selectors. This also means that we will probably need to set them as top scope variables (for example, via an ENC) or as facts.

As with the classes that have to be included, parameters may be set by various components; some of them are actually the same, as in Puppet, a node's classification involves both classes to include and parameters to apply. These components are:

  • In site.pp, we can set variables. If they are outside nodes' definitions, they are at top scope; if they are inside, they are at node scope. Top scope variables should be referenced with a :: prefix, for example, $::role. Node scope variables are available inside the node's classes with their plain name, for example, $role.

  • An ENC returns parameters, treated as top scope variables, alongside classes, and the logic of how they can be set depends entirely on its structure. Popular ENCs such as The Foreman, Puppet Dashboard, and the Puppet Enterprise Console allow users to set variables for single nodes or for groups, often in a hierarchical fashion. The kind and amount of parameters set here depend on how much information we want to manage on the ENC and how much to manage somewhere else.

  • LDAP, when used as a node's classifier, returns variables for each node as defined with the puppetVar attribute. They are all set at top scope.

  • In Hiera, we set keys that we can map to Puppet variables with the hiera(), hiera_array(), and hiera_hash() functions inside our Puppet code. Puppet 3's data bindings automatically map class parameters to Hiera keys, so for these cases, we don't have to explicitly use the hiera* functions. The defined hierarchy determines how the keys' values change according to the values of other variables. In Hiera, ideally, we should place variables related to our infrastructure and credentials, but not OS-related variables (they should stay in modules if we want them to be reusable).

    A lot of documentation about Hiera shows sample hierarchies with facts such as osfamily and operatingsystem. In my very personal opinion, such variables should not stay there (they only add weight to the hierarchy), as OS differences should be managed in the classes and modules used and not in Hiera.

  • In public shared modules, we typically deal with OS-specific parameters. Modules should be considered as reusable components that know all about how to manage an application on different OSes but nothing about custom logic. They should expose parameters and defines that allow users to determine their behavior and fit their own needs.

  • On site module(s), we may place infrastructural parameters, credentials, and any custom logic, more or less based on other variables.

  • Finally, it's possible and generally recommended to create custom facts that identify the node directly from the agent. An example of this approach is a totally facts-driven infrastructure, where all the node-identifying variables, upon which all the other parameters are defined, are set as facts.
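As a sketch of the facts-driven approach mentioned in the last point, the following Ruby code derives role and env from the hostname and registers them as custom facts. The naming convention (role, number, dash, environment, as in www01-devel), the pattern, and the identity_from_hostname helper are assumptions made for this example; in a real setup the file would be shipped in a module's lib/facter directory and adapted to the local naming scheme.

```ruby
# Hypothetical naming convention: "<role><number>-<env>", e.g. "www01-devel".
HOSTNAME_PATTERN = /\A(?<role>[a-z_]+)\d+-(?<env>[a-z]+)\z/

# Return { 'role' => ..., 'env' => ... }, or an empty hash when the
# hostname does not follow the convention.
def identity_from_hostname(hostname)
  match = HOSTNAME_PATTERN.match(hostname.to_s)
  match ? { 'role' => match[:role], 'env' => match[:env] } : {}
end

# Register the facts only when this file is loaded by Facter; when the
# code is run elsewhere, the Facter constant is not defined.
if defined?(Facter)
  %w[role env].each do |fact_name|
    Facter.add(fact_name) do
      setcode { identity_from_hostname(Facter.value(:hostname))[fact_name] }
    end
  end
end
```

With facts like these, the values are available as $::role and $::env without involving an ENC, at the cost of encoding the information in the hostname.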

Defining the configuration files provided to the nodes

It's almost certain that we will need to manage configuration files with Puppet and that we need to store them somewhere, either as plain static files to serve via Puppet's fileserver functionality using the source argument of the File type or via .erb templates.

While it's possible to configure custom fileserver shares for static files and absolute paths for templates, it's definitely recommended to rely on the modules' autoloading conventions and place such files inside custom or public modules, unless we decide to use Hiera for them.

Configuration files, therefore, are typically placed in:

  • Public modules: These may provide default templates that use variables exposed as parameters by the modules' classes and defines. As users, we don't directly manage the module's template but the variables used inside it. A good and reusable module should allow us to override the default template with a custom one. In this case, our custom template should be placed in a site module. If we've forked a public shared module and maintain a custom version, we might be tempted to place all our custom files and templates there. By doing so, we lose reusability and gain, at best, some short-term simplicity.

  • Site module(s): These are a more appropriate place for custom files and templates if we want to maintain a setup based on public shared modules, which are not forked, and custom site modules, where all our stuff stays confined in a single module or a few modules. This allows us to recreate similar setups just by copying and modifying our site modules, as all our logic, files, and resources are concentrated there.

  • Hiera: Thanks to the smart hiera-file backend, Hiera can be an interesting alternative place to store configuration files, both static ones and templates. We can benefit from the hierarchy logic that works for us and can manage any kind of file without touching modules.

  • Custom fileserver mounts can be used to serve any kind of static file from any directory on the Puppet Master. They can be useful if we need to provide, via Puppet, files generated or managed by third-party scripts or tools. An entry in /etc/puppet/fileserver.conf such as the following:

    [data]
        path /etc/puppet/static_files
        allow *

    allows us to serve a file such as /etc/puppet/static_files/generated/file.txt with the following argument:

    source => 'puppet:///data/generated/file.txt',
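The relation between a custom mount and the served file can be modeled with a few lines of Ruby. This is only an illustration of the path mapping described above, not how the Puppet fileserver is implemented; the MOUNTS table and the served_path helper are hypothetical, and access rules such as allow are ignored.

```ruby
# Mounts as declared in fileserver.conf (only the path setting is modeled).
MOUNTS = { 'data' => '/etc/puppet/static_files' }.freeze

# Translate puppet:///<mount>/<relative path> into the file read on the
# master, or return nil when the mount is not in the table above.
def served_path(source)
  match = %r{\Apuppet:///(?<mount>[^/]+)/(?<relative>.+)\z}.match(source)
  return nil unless match && MOUNTS.key?(match[:mount])

  File.join(MOUNTS[match[:mount]], match[:relative])
end
```

The first path element after puppet:/// selects the mount, and the rest is resolved relative to that mount's path.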

Defining custom resources and classes

We'll probably need to provide custom resources, which are not declared in the shared modules, to our nodes, because these resources are too specific. We'll probably want to create some grouping classes, for example, to manage the common baseline of resources and classes we want applied to all our nodes.

This is typically a bunch of custom code and logic that we have to place somewhere. The usual locations are as follows:

  • Shared modules: These are forked and modified to include custom resources; as already outlined, this approach doesn't pay off in the long term.

  • Site module(s): These are the preferred place for custom stuff, including classes where we can manage common baselines, role classes, and other container classes.

  • Hiera, partially, if we are fond of the create_resources function fed by hashes provided in Hiera. In this case, somewhere (in a site or shared module or maybe, even in site.pp), we have to place the create_resources statements.

The Foreman

The Foreman is definitely the biggest open source software product related to Puppet that is not directly developed by Puppet Labs.

The project was started by Ohad Levy, who now works at Red Hat and leads its development, supported by a great team of internal employees and community members.

The Foreman can work as a Puppet ENC and reporting tool; it presents an alternative to the Inventory System, and most of all, it can manage the whole lifecycle of the system, from provisioning to configuration and decommissioning.

Some of its features have been quite ahead of their time. For example, the foreman() function has long made possible what is now done with the puppetdbquery module.

It allows direct query of all the data gathered by The Foreman: facts, nodes classification, and Puppet-run reports.

Let's look at this example that assigns to the $web_servers variable the list of hosts that belong to the web hostgroup, which have reported successfully in the last hour:

$web_servers = foreman("hosts",
"hostgroup ~ web and status.failed = 0 and last_report < \"1 hour ago\"")

This was possible long before PuppetDB was even conceived.

The Foreman really deserves at least a book by itself, so here, we will just summarize its features and explore how it can fit in a Puppet architecture.

We can decide which components to use:

  • Systems provisioning and life-cycle management

  • Nodes IP addressing and naming

  • The Puppet ENC function based on a complete web interface

  • Management of client certificates on the Puppet Master

  • The Puppet reporting function with a powerful query interface

  • The Facts querying function, equivalent to the Puppet Inventory system

For some of these features, we may need to install Foreman's Smart Proxies on some infrastructural servers. The proxies are registered on the central Foreman server and provide a way to remotely control relevant services (DHCP, PXE, DNS, Puppet Master, and so on).

The Web GUI based on Rails is quite complete and appealing, but it might prove cumbersome when we have to deal with a large number of nodes. For this reason, we can also manage Foreman via the CLI.

The original foreman-cli command has been around for years but is now deprecated in favor of the new hammer with the Foreman plugin, which is very versatile and powerful, as it allows us to manage, via the command line, most of what we can do on the web interface.

Roles and profiles

In 2012, Craig Dunn wrote a blog post that quickly became a point of reference on how to organize Puppet code. He discussed his concept of roles and profiles. The role describes what the server represents: a live web server, a development web server, a mail server, and so on. Each node can have one and only one role. Note that in his post, he manages environments inside roles (two web servers on two different environments have two different roles):

node www1 {
  include ::role::www::dev
}
node www2 {
  include ::role::www::live
}
node smtp1 {
  include ::role::mailserver
}

Then, he introduces the concept of profiles, which include and manage modules to define a logical technical stack. A role can include one or more profiles:

class role {
  include ::profile::base
}
class role::www inherits role {
  include ::profile::tomcat
}

In environment-related subroles, we can manage the exceptions we need (here, for example, the www::dev role includes both the database and webserver::dev profiles):

class role::www::dev inherits role::www {
  include ::profile::webserver::dev
  include ::profile::database
}
class role::www::live inherits role::www {
  include ::profile::webserver::live
}

Usage of class inheritance here is not mandatory, but it is useful to minimize code duplication.

This model expects modules to be the only components where resources are actually defined and managed; they are supposed to be reusable (we use them without modifying them) and manage only the components they are written for.

In profiles, we can manage resources and the ordering of classes; we can initialize variables and use them as values for arguments in the declared classes, and we can generally benefit from having an extra layer of abstraction:

class profile::base {
  include ::networking
  include ::users
}
class profile::tomcat {
  class { '::jdk': }
  class { '::tomcat': }
}
class profile::webserver {
  class { '::httpd': }
  class { '::php': }
  class { '::memcache': }
}

In profile subclasses, we can manage exceptions or particular cases:

class profile::webserver::dev inherits profile::webserver {
  Class['::php'] {
    loglevel => "debug"
  }
}

This model is quite flexible and has gained a lot of attention and endorsement from Puppet Labs. It's not the only approach that we can follow to organize the resources we need for our nodes in a sane way, but it's the current best practice and a good point of reference, as it formalizes the concept of role and exposes how we can organize and add layers of abstraction between our nodes and the used modules.

The data and the code

Hiera's crusade, and possibly its main reason to exist, is data separation. In practical terms, this means converting Puppet code like the following:

$dns_server = $zone ? {
  'it'    => '',
  default => '',
}
class { '::resolver':
  server => $dns_server,
}

into something with no trace of local settings, such as:

$dns_server = hiera('dns_server')
class { '::resolver':
  server => $dns_server,
}

With Puppet 3, the preceding code can be simplified even further to just the following line:

include ::resolver

This expects the resolver::server key to be defined, as needed, in our Hiera data sources.
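How a key such as resolver::server ends up with different values for different nodes can be pictured with a simplified model of a priority lookup: the hierarchy is walked from the most specific level to the most generic one, and the first value found wins. The Ruby sketch below is not Hiera's implementation; the hierarchy, the data, the lookup helper, and the addresses (reserved documentation addresses) are all assumptions made for illustration.

```ruby
# Hypothetical hierarchy; %{...} placeholders are filled from node variables.
HIERARCHY = ['zone/%{zone}', 'env/%{env}', 'common'].freeze

# Hypothetical data sources, keyed the way the YAML files would be named.
HIERA_DATA = {
  'zone/it'   => { 'resolver::server' => '192.0.2.10' },
  'env/devel' => { 'apache::port' => 8080 },
  'common'    => { 'resolver::server' => '192.0.2.1', 'apache::port' => 80 }
}.freeze

# Return the value of key from the first level of the hierarchy that
# defines it, or nil when no level does.
def lookup(key, scope)
  HIERARCHY.each do |level|
    source = level.gsub(/%\{(\w+)\}/) { scope[Regexp.last_match(1)].to_s }
    data = HIERA_DATA[source]
    return data[key] if data && data.key?(key)
  end
  nil
end
```

A node with zone set to it gets the zone-specific server, while any other node falls back to the value in common; the code that declares the class stays the same in both cases.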

The advantages of having data (in this case, the IP of the DNS server, whatever the logic used to derive it) in a separate place are clear:

  • We can manage and modify data without changing our code

  • Different people can work on data and code

  • Hiera's pluggable backend system dramatically enhances how and where data can be managed, allowing seamless integration with third-party tools and data sources

  • Code layout is simpler and more error proof

  • The lookup hierarchy is configurable

Nevertheless, there are a few small drawbacks, or maybe just necessary side effects or evolutionary steps. They are as follows:

  • What we've learned about Puppet and used to do without Hiera is obsolete

  • We don't see, directly in our code, the values we are using

  • We have two different places where we can look to understand what code does

  • We need to set the variables we use in our hierarchy as top scope variables or facts, or anyway, we need to refer to them with a fixed fully qualified name

  • We might have to refactor a lot of existing code to move our data and logic into Hiera

A personal note: I've been quite a late jumper on the Hiera wagon. While developing modules with the ambition that they can be reusable, I decided I couldn't exclude users who weren't using this additional component. So, until Puppet 3 with Hiera integrated in it became mainstream, I didn't want to force the usage of Hiera in my code.

Now things are different. Puppet 3's data bindings change the whole scene; Hiera is deeply integrated and is here to stay. So, even if we can happily live without it, I would definitely recommend its usage in most cases.

Sample architectures

We have outlined the main tasks and components we can use to put things together in a Puppet architecture; we have looked at Foreman, Hiera, and the roles and profiles pattern. Now, let's see some real examples based on them.

The default approach

By default, Puppet doesn't use an ENC and lets us classify nodes directly in /etc/puppet/manifests/site.pp (or in files imported from there) with the node statement. So, a very basic setup would have site.pp with a content like the following:

node www01 {
  # Place here resources to apply to this node in Puppet DSL:
  # file, package, service, mount...
}
node lb01 {
  # Resources for this node: file, package, service...
}

This is all we basically need; no modules with their classes, no Hiera, no ENC. We just need good old plain Puppet code as they teach us in schools, so to speak.

This basic approach, useful just for the first tests, obviously does not scale well and would quickly become a huge mess of duplicated code.

The next step is to use classes that group resources, and if these classes are placed inside modules, we can include them directly without the need to explicitly import the containing files:

node www01 {
  include ::apache
  include ::php
}

This approach, even if definitely cleaner, will also be quickly overwhelmed by redundant code. So, we will probably want to introduce grouping classes that group other classes and resources according to the desired logic.

One common example is a general class, which includes all the modules, classes, and resources we want to apply to all our nodes.

Another example is a role class, which includes all the extra resources needed by a particular kind of node:

node www01 {
  include ::general
  include ::role::www
}

We can then have other grouping classes to better organize and reuse our resources, such as the profiles we have just discussed.

Note that with the names mentioned earlier, we would need two different local (site) modules: general and role. I personally prefer to place all the local, custom resources in a single module, to be called site, or even better, with the name of the project, customer, or company. Given this, the previous example could be:

node www01 {
  include ::site
  include ::site::role::www
}

These are only naming matters that have consequences on the directory layout and possibly on permissions management in our SCM, but the principle of grouping resources according to custom logic is the same.

Until now, we have just included classes, and often, the same classes are included by nodes that need different effects from them, for example, slightly different configuration files, or specific resources, or any kind of variation we have in the real world when configuring the same application on different systems.

Here is where we need to use variables and parameters to alter the behavior of a class according to custom needs.

Here is where the complexity begins, because there are various elements to consider, such as:

  • What are the variables that identify our node

  • If they are sufficient to manage all the variations in our nodes

  • Where we want to place our logic that copes with them

  • Where configurations should be provided as plain static files, where it is better to use templates, and where we could just modify single lines inside files

  • How these choices may affect the risk of doing a change that affects unexpected nodes

The most frequent and dangerous mistakes with Puppet are due to people making changes in code (or data) that are supposed to be made for a specific node but affect other nodes as well. Most of the time, this happens when people don't know the structure and logic of the Puppet codebase they are working on well enough. There are no easy rules to prevent such problems; just some general suggestions such as the following:

  • Promote code peer review and communication among the Puppeteers

  • Test code changes on canary nodes

  • Use naming conventions and coherent code organization to maximize the principle of least surprise

  • Embrace code simplicity, readability, and documentation

  • Be wary of the scope and extent of our abstraction layers

We also need classes that actually allow things to be changed via parameters or variables if we want to avoid placing our logic directly inside them.

Patterns on how to manage variables and their effect on the applied resources have changed a lot with the evolution of Puppet and the introduction of new tools and functionalities.

We won't indulge in how things were done in the good old times; in a modern and currently recommended Puppet setup, we expect to have:

  • At least Puppet 3 on the Puppet Master, so that we can enjoy data bindings

  • Classes that expose parameters that allow us to manage them

  • Reusable public modules that allow us to adapt them to our use case without modifications

In this case, we can basically follow two different approaches. We can keep on including classes and set the values we want for the parameters we need to modify on Hiera. So, in our example, we could have something as follows in site/manifests/role/www.pp:

class site::role::www {
  include ::apache
  include ::php
}

On Hiera, we can have a file such as hieradata/env/devel.yaml, where we set parameters like the following:

---
apache::port: 8080

Alternatively, we might use explicit class declarations like:

class site::role::www {
  $apache_port = $env ? {
    devel   => '8080',
    default => '80',
  }
  class { '::apache':
    port => $apache_port,
  }
  include ::php
}

In such a declaration, both the data and the logic that determines it are definitely inside the code.

Basic ENC, logic in the site module, and Hiera backend

ENC and Hiera can be alternative or complementary; this approach gets advantages from both and uses the site module for most of the logic for class inclusion, the configuration files, and the custom resources.

In Hiera, we place all the class parameters.

In the ENC, when it's not possible to set them via facts, we set the variables that identify our nodes and that can be used in our Hiera hierarchy.

In site.pp or in the same ENC, we include just a single site class, and here, we manage our grouping logic. For example, with general baseline and role classes:

class site {
  include ::site::general
  if $::role {
    include "::site::role::${::role}"
  }
}

In our role classes, which are included if the $role variable is set on the ENC, we can manage all the role-specific resources, possibly dealing with differences according to the environment or other identifying variables, either directly in the role class or by using profiles.

Note that in this article, we've always referred to class names with their full name, so a class such as mysql is referred with ::mysql. This is useful to avoid name collisions when, for example, role names may clash with existing modules. If we don't use the leading :: chars, we will have problems, for example, with a class called site::role::mysql, which may mess with the main class, mysql.

The Foreman and Hiera

The Foreman can act as a Puppet ENC; it's probably the most common ENC around, and we can use both Foreman and Hiera in our architecture.

In this case, we should strictly separate responsibilities and scopes even if they might be overlapping; let's review our components and how they might fit in a scenario based on The Foreman, Hiera, and the usual site module(s):

  • The classes to include in nodes can be defined in The Foreman, the site module, or both. It mostly depends on how much logic we want in The Foreman, and on how many activities have to be done via its interface and how many are moved into the site module(s). We can decide to define roles and profiles in the site module and use The Foreman just to define top scope variables and the inclusion of a single basic class, as in the previous example. Alternatively, we may prefer to use The Foreman's HostGroups to classify and group nodes, moving most of the classes' grouping logic into The Foreman.

  • The variables to assign to nodes can be set in both The Foreman and Hiera. It probably makes sense to set in The Foreman only the variables we need to identify nodes (if they are not provided by facts) and, generally, the ones we might need to use in Hiera's hierarchy. All the other variables and the logic on how to derive them should stay in Hiera.

  • Files should stay in our site module or, possibly, in Hiera (with the hiera-file plugin).

The Hiera-based setup

A common scenario involves the usage of Hiera to manage both the classes to include in nodes and their parameters and, possibly, a few handy resource defaults.

No ENC is used; site.pp just needs the following:

hiera_include('classes')

Classes and parameters can be assigned to nodes enjoying the flexibility of our hierarchy, so in a common.yaml we can have:

---
# Common classes on all nodes
classes:
  - puppet
  - openssh
  - timezone
  - resolver
# Common Class Parameters
timezone::timezone: 'Europe/Rome'
resolver::dns_servers:
  -
  -

In a specific datasource file such as role/web.yaml, we can add the classes and the parameters we want to apply to that group of nodes:

---
classes:
  - stack_lamp
stack_lamp::apache_include: true
stack_lamp::php_include: true
stack_lamp::mysql_include: false

The modules we use (here, an example is stack_lamp; however, it could be something like profile::webserver or apache and php) should expose as parameters anything that is needed to configure things as expected.
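What "exposing parameters" means can be sketched with a possible shape for the stack_lamp class. The class body here is an assumption made for illustration (the real module may look quite different); only the three parameter names come from the Hiera data above, and the apache, php, and mysql classes are assumed to be available:

```puppet
# Hypothetical implementation: each component can be toggled from Hiera
# through automatic class parameter lookup (for example,
# stack_lamp::mysql_include: false).
class stack_lamp (
  $apache_include = true,
  $php_include    = true,
  $mysql_include  = true,
) {
  if $apache_include { include ::apache }
  if $php_include    { include ::php }
  if $mysql_include  { include ::mysql }
}
```

With the role/web.yaml data shown earlier, a web node would get apache and php but not mysql, without any change to Puppet code.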

Configuration files and templates can be placed in a site module.

The Hiera-only setup

The Hiera-only setup is a somewhat extreme approach. Everything is managed by Hiera: the classes to include in nodes, their parameters, and also the configuration files to serve to the clients. Configuration files are delivered via the hiera-file plugin; also, in this scenario, we need modules and relevant classes that expose parameters to manage the content of files.

Secrets, credentials, and sensitive data may be encrypted via hiera-eyaml or hiera-gpg.

We may wonder whether a site module is still needed, as most of its common functions (provide custom files, manage logic, define and manage variables) can be moved into Hiera.

The answer is probably yes: even in such a strongly Hiera-oriented scenario, a site module is usually needed. We might, for example, use custom classes to manage edge cases or exceptions that could be difficult to replicate with Hiera without adding a specific entry in the hierarchy.

One important point to consider when moving most of our logic into Hiera is how much this costs in terms of the length of the hierarchy. Sometimes, a simple, even if not elegant, custom class that deals with a particular edge case may save us from adding a layer in the hierarchy.
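A custom class of this kind might look like the following sketch. The class name, the domain, the file path, and the proxy address are all invented for illustration; the point is only that a single conditional in code replaces a domain-based level in the hierarchy:

```puppet
# Hypothetical edge case: only nodes in one lab domain need a proxy setting.
class site::proxy_exception {
  if $::domain == 'lab.example.com' {
    file { '/etc/profile.d/proxy.sh':
      ensure  => file,
      mode    => '0644',
      content => "export http_proxy=http://proxy.lab.example.com:3128\n",
    }
  }
}
```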

Foreman smart variables

Smart variables are The Foreman's alternative approach to Hiera for the full management of the variables used by nodes.

The Foreman can automatically detect the parameters exposed by classes and allows us to set values for them according to custom logic, providing them via the ENC functionality (support for parameterized classes via ENC has been available since Puppet 2.6.5).

To each class, we can map one or more smart variables, which may have different values according to customizable conditions and hierarchies.

The logic is somewhat similar to Hiera, with the notable difference that we can have a different hierarchy for each variable and have other ways to define its content according to custom queries and conditions.

The user experience benefits from the web interface, which may turn out to be easier than editing Hiera files directly. The Foreman's auditing features allow us to track changes as an SCM would do on plain files.

We don't have the flexibility of multiple backends that we have with Hiera, and we'll be completely tied to The Foreman for the management of our nodes.

Personally, I have no idea how many people are extensively using smart variables in their setups; just be aware that this alternative for data management exists.

Fact-driven truths

A fact-driven approach was theorized by Jordan Sissel, the author of Logstash, in a 2010 blog post. The most authoritative information we can have about a node comes from its own facts.

We may decide to use facts in various places, such as in our hierarchy, in our site code, and in templates. If our facts allow us to identify a node's role, environment, zone, or any other identifying variable, we might not even need node classification, and we can manage everything in our site module or via Hiera.

It is now very easy to add custom facts by placing a file in the node's /etc/facter/facts.d directory. This can be done, for instance, by a cloud provisioning script.

Alternatively, if our node names are standardized and informative, we can easily define our identifying variables in custom facts, which might be provided by our site module.

If all the variables that identify our node come from facts, we can have site.pp as simple as:

include site

In our site/manifests/init.pp, we have something like the following code:

class site {
  if $::role {
    include "site::roles::role_${::role}"
  }
}

The top scope $::role variable would be, obviously, a fact.

The logic for data and for the classes to include can be managed wherever we prefer: in site modules, in Hiera, or in the ENC.

The principle here is that as much data as possible, and especially the nodes' identifying variables, should come from facts.

Also, in this case, common sense applies and extreme usage deviations should be avoided; in particular, a custom Ruby fact should compute its output without any local data. If we start placing data inside the fact in order to return data, we are probably doing something wrong.

Nodeless site.pp

We have seen that site.pp does not necessarily need to have node definitions in its content or in imported files. We don't need them when we drive everything via facts or when we manage class inclusion in Hiera, and we don't need them with an approach where conditionals based on hostnames are used to set the top scope variables that identify nodes:

# nodeless site.pp
# Roles are based on hostnames
case $::hostname {
  /^web/:     { $role = 'web' }
  /^puppet$/: { $role = 'puppet' }
  /^lb/:      { $role = 'lb' }
  /^log/:     { $role = 'log' }
  /^db/:      { $role = 'db' }
  /^el/:      { $role = 'el' }
  /^monitor/: { $role = 'monitor' }
  default:    { $role = 'default' }
}
# Env is based on hostname or (sub) domain
if 'devel' in $::fqdn {
  $env = 'devel'
} elsif 'test' in $::fqdn {
  $env = 'test'
} elsif 'qa' in $::fqdn {
  $env = 'qa'
} else {
  $env = 'prod'
}
include site
# hiera_include('classes')

Here, the $role and $env variables are set at the top scope according to hostnames named in a way that we can parse them with Puppet code.

At the end, we just include our site class or use hiera_include to manage the grouping logic for what classes to include in our nodes.

Such an approach makes sense only where we don't have to manage many different hostnames or roles and where the names of our nodes follow a naming pattern that lets us derive identifying variables.

Note that the $::hostname or $clientcert variables might be forged and return untrusted values. Since Puppet 3.4, if we set trusted_node_data = true in puppet.conf, we have at our disposal the special variable $trusted['certname'], which identifies a verified hostname.
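As a sketch of how the role detection could rely on the verified name instead, assuming Puppet 3.4 or later with trusted_node_data = true and certificate names that follow the same naming pattern as the hostnames:

```puppet
# Derive the role from the certificate name rather than from a fact.
# Only two of the roles from the earlier example are repeated here.
case $trusted['certname'] {
  /^web/:  { $role = 'web' }
  /^db/:   { $role = 'db' }
  default: { $role = 'default' }
}
```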

Node inheritance done right

Node inheritance has a bad reputation. All the Puppet books around, even those by a giant like James Turnbull, and the official Puppet Labs documentation describe it as a bad practice, and this puzzles me, as I have successfully used this approach for years.

The main problem, I think, has been a documentation issue, as it's not well explained to users that node inheritance makes sense when used to assign variables, but it is dangerous when used to manage class grouping.

Let's see an example of a wrong approach to node inheritance:

node default {
  include general
}
node 'it' inherits default {
  $zone = 'it'
}
node '' inherits 'it' {
  $role = 'web'
  include role::web
}

The issue in this extremely simplified example is that when Puppet parses general, it hasn't yet set the $role and $zone values, so the resources declared in general that depend on them would probably not turn out as expected.

When node inheritance is used only to set, and possibly override, variables and not to include classes, none of these problems are present:

node basenode {
  $dns_server = ''
}
node 'it' inherits basenode {
  $zone = 'it'
  $dns_server = ''
}
node '' inherits 'it' {
  $role = 'web'
  include site
}

Now, I would not use this approach anymore, as it requires an include line for each node, and it sets variables at node scope. This implies that while they can be used in the node's classes, we cannot refer to them with a fully qualified name, and so, for example, we cannot use them in a Hiera hierarchy.

Still, in situations where an ENC and Hiera are not used and the node names are not descriptive, this is a working and effective way to identify nodes using the benefits of a hierarchical structure, based on inheritance.


We have examined the tasks we have to deal with: how to manage nodes, how to group them, how to set the parameters of the classes we use, and where to place the configuration files we use. We have reviewed several tools at our disposal: The Foreman, Hiera, and our custom site modules.

We have also seen some examples of how these elements can be managed in different combinations.

You've been reading an excerpt of:

Extending Puppet
