For a list of all the ways technology has failed to improve the quality of life, please press three.
— Alice Kahn
In this chapter, you'll learn what Puppet is, and what it can help you do. Whether you're a system administrator, a developer who needs to fix servers from time to time, or just someone who's annoyed at how long it takes to set up a new laptop, you'll have come across the kind of problems that Puppet is designed to solve.
We have the misfortune to be living in the present. In the future, of course, computers will be smart enough to just figure out what we want, and do it. Until then, we have to spend a lot of time telling the computer things it should already know.
When you buy a new laptop, you can't just plug it in, get your e-mail, and start work. You have to tell it your name, your e-mail address, the address of your ISP's e-mail servers, and so on.
Also, you need to install the programs you use: your preferred web browser, word processor, and so on. Some of this software may need license keys. Your various logins and accounts need passwords. You have to set all the preferences up the way you're used to.
This is a tedious process. How long does it take you to get from a box-fresh computer to being productive? For me, it probably takes about a week to get things just as I want them. It's all the little details.
This problem is called configuration management, and thankfully we don't have it with a new laptop too often. But imagine multiplying it by fifty or a hundred computers, and setting them all up manually.
When I started out as a system administrator, that's pretty much what I did. A large part of my time was spent configuring server machines and making them ready for use. This is more or less the same process as setting up a new laptop: installing software, licensing it, configuring it, setting passwords, and so on.
Let's look at some of the tasks involved in preparing a web server, which is something sysadmins do pretty often. I'll use a fictitious, but all too plausible, website as an example. Congratulations: you're in charge of setting up the server for an exciting, innovative social media application called cat-pictures.com.
Assuming the machine has been physically put together, racked, cabled, and powered, and the operating system is installed, what do we have to do to make it usable as a server for cat-pictures.com?
Add some user accounts and passwords
Configure security settings and privileges
Install all the packages needed to run the application
Customize the configuration files for each of these packages
Create databases and database user accounts; load some initial data
Configure the services that should be running
Deploy the cat-pictures application
Add some necessary files: uploaded cat pictures, for example
Configure the machine for monitoring
That's a lot of work. It may take a day or two if this is the first time you're setting up the server. If you're smart, you'll write down everything you do, so next time you can simply run through the steps and copy and paste all the commands you need. Even so, the next time you build a cat-pictures server, it'll still take you a couple of hours to do this.
If the live server goes down and you suddenly need to build a replacement, that's a couple of hours of downtime, and with a pointy-haired boss yelling at you, it's a bad couple of hours.
Wouldn't it be nice if you could write a specification of how the server should be set up, and you could apply it to as many machines as you liked?
So the first problem with building servers by hand (artisan server crafting, as it's been called) is that it's complicated and tedious and it takes a long time. There's another problem. The next time you need to build an identical server, how do you do it?
Your painstaking notes will no longer be up to date with reality. While you were on vacation, the developers installed a couple of new Ruby gems that the application now depends on—I guess they forgot to tell you. Even if everybody keeps the build document up to date with changes, no one actually tests the modified build process, so there's no way to know if it still works end-to-end.
Also, the latest version of MySQL in the Linux distribution has changed, and now it doesn't support some of the configuration parameters you used before. So the differences start to accumulate.
By the time you've got four or five servers, they're all a little different. Which is the authoritative one? Or are they all slightly wrong? The longer they're around, the more they will drift apart.
Wouldn't it be nice if the configuration on all your machines could be regularly checked and synchronized with a central, standard version?
The latest feature on cat-pictures.com is that people can now upload movies of their cats doing adorable things. To roll out the new version to your five web servers, you need to install a couple of new package dependencies and change a configuration file. And you need to do this same process on each machine.
Humans just aren't good at accurately repeating complex tasks over and over; that's why we invented robots. It's easy to make mistakes, leave things out, or be interrupted and lose track of what you've done.
Changes happen all the time, and it becomes increasingly difficult to keep things up to date and in sync as your infrastructure grows.
Wouldn't it be nice if you only had to make changes in one place, and they rolled out to your whole network automatically?
A new sysadmin joins your organization, and she needs to know where all the servers are, and what they do. Even if you keep scrupulous documentation, it can't always be relied on. In real life, we're too busy to stop every five minutes and document what we just did.
The only accurate documentation, in fact, is the servers themselves. You can look at a server to see how it's configured, but that only applies while you still have the machine. If something goes wrong and you can't access the machine, or the data on it, your only option is to reconstruct the lost configuration from scratch.
Wouldn't it be nice if you had a configuration document which was guaranteed to be up to date?
Ideally, all your machines would have the same hardware and the same operating system. If only things were that easy. What usually happens is that we have a mix of different types of machines and different operating systems and we have to know about all of them.
The command to create a new user account on Red Hat Linux is slightly different from the equivalent command on Ubuntu, for example. Solaris is a little different again. Each command does basically the same job, but differs in syntax, arguments, and default values.
This means that any attempt to automate user management across your network has to take account of all these differences, and if you add another platform to the mix, then that further increases the complexity of the code required to handle it.
Wouldn't it be nice if you could just say how things should be, and not worry about the details of how to make it happen?
Sometimes you start trying to fix a problem and instead make things worse. Or things were working yesterday, and you want to go back to the way things were then. Sorry, no do-overs.
When you're making manual, ad hoc changes to systems, you can't roll back to a point in time. It's hard to undo a whole series of changes. You don't have a way of keeping track of what you did and how things changed.
This is bad enough if there's just one of you. When you're working in a team, it gets even worse, with everybody making independent changes and getting in each other's way.
When you have a problem, you need a way to know what changed, and when, and who did it. Ideally, you could look at your configuration document and say, "Hmm, Carol checked in a change to the FTP server last night, and today no one can log in. It looks like she made a typo." You can fix or back out of the change, and have Carol buy the team lunch.
Wouldn't it be nice if you could go back in time?
Most of us have tried to solve these problems of configuration management in various ways. Some write shell scripts to automate builds and installs, some use makefiles to generate configurations, some use templates and disk images, and so on. Often these techniques are combined with version control, to solve the history problem. Systems like these can be quite effective, and even a little bit of automation is much better than none.
The disadvantage with this kind of home-brewed solution is that each sysadmin has to reinvent the wheel, often many times. The ways in which organizations solve the configuration management problem are usually proprietary and highly site-specific. So for every new place you work, you need to build a new configuration management system (CM system).
Because everyone has his own proprietary, unique system, the skills associated with it aren't transferable. When you get a new job, all the time and effort you spent becoming a wizard on your organization's CM system goes to waste; you have to learn a new one.
Also, there's a whole lot of duplicated effort. The world really doesn't need more template engines, for example. Once a decent one exists, it would make sense for everybody to use it, and take advantage of ongoing improvements and updates.
It's not just the CM system itself that represents duplicated, wasted effort. The configuration scripts and templates you write could also be shared and improved by others, if only they had access to them. After all, most server software is pretty widely used. A program in configuration language that sets up Apache could be used by everybody who uses Apache—if it were a standard language.
Once you have a CM system with a critical mass of users, you get a lot of benefits. A new system administrator doesn't have to write his own CM tool; he just grabs one off the shelf. Once he learns to use it, and to write programs in the standard language, he can take that skill with him to other jobs.
He can choose from a large library of existing programs in the standard configuration language, covering most of the popular software in use. These programs are updated and improved to keep up with changes in the software and operating systems they manage.
This kind of beneficial network effect is why we don't have a million different operating systems, or programming languages, or processor chips. There's strong pressure for people to converge on a standard. On the other hand, we don't have just one of each of those things either. There's never just one solution that pleases everybody.
If you're not happy with an existing CM system, and you have the skills, you can write one that works the way you prefer. If enough other people feel the same way, they will form a critical mass of users for the new system. But this won't happen indefinitely; standardization pressure means the market will tend to converge on a small number of competing systems.
This is roughly the situation we have now. Several different CM systems have been developed over the years, with new ones coming along all the time, but only a few have achieved significant market share. At the time of writing, at least for UNIX-like systems, these CM systems are Puppet, Chef, and CFEngine.
There really isn't much to choose between these different systems. They all solve more or less the same problems—the ones we saw earlier in this chapter—in more or less the same way. Some people prefer the Puppet way of doing things; some people are more comfortable with Chef, and so on.
But essentially, these, and many other CM systems, are all great solutions to the CM problem, and it's not very important which one you choose as long as you choose one.
Once we start writing programs to configure machines, we get some benefits right away. We can adopt the tools and techniques that regular programmers—who write code in Ruby or Java, for example—have used for years:
Powerful editing and refactoring tools
Version control
Tests
Pair programming
Code reviews
This can make us more agile and flexible as system administrators, able to deal with fast-changing requirements and deliver things quickly to the business. We can also produce higher-quality, more reliable work.
Some of the benefits are more subtle, organizational, and psychological. There is often a divide between "devs", who wrangle code, and "ops", who wrangle configuration. Traditionally, the skill sets of the two groups haven't overlapped much. It was common until recently for system administrators not to write complex programs, and for developers to have little or no experience of building and managing servers.
That's changing fast. System administrators, facing the challenge of scaling systems to enormous size for the web, have had to get smart about programming and automation. Developers, who now often build applications, services, and businesses by themselves, couldn't do what they do without knowing how to set up and fix servers.
The term "devops" has begun to be used to describe the growing overlap between these skill sets. It can mean sysadmins who happily turn their hand to writing code when needed, or developers who don't fear the command line, or it can simply mean the people for whom the distinction is no longer useful.
Devops write code, herd servers, build apps, scale systems, analyze outages, and fix bugs. With the advent of CM systems, devs and ops are now all just people who work with code.
Being a sysadmin, in the traditional sense, is not usually a very exciting job. Instead of getting to apply your experience and ingenuity to make things better, faster, and more reliable, you spend a lot of time just fixing problems, and making manual configuration changes that could really be done by a machine. The following carefully-researched diagram shows how traditional system administration compares to some other jobs in both excitement and stress levels:
We can see from this that manual sysadmin work is both more stressful and more boring than we would like. Boring, because you're not really using your brain, and stressful, because things keep going wrong despite your best efforts.
Automating at least some of the dull manual work can make sysadmin work more exciting, because it frees you for things that are more important and challenging, such as figuring out how to make your systems more resilient or more performant.
Having an automated infrastructure means your servers are consistent, up to date, and well-documented, so it can also make your job a little less stressful. Or, at any rate, it can give you the freedom to be stressed about more interesting things.
So how do you do system administration with Puppet? Well, it turns out, not too differently from the way you already do it. But because Puppet handles the low-level details of creating users, installing packages, and so on, you're now free to think about your configuration at a slightly higher level.
Let's look at an example sysadmin task and see how it's handled the traditional way and then the Puppet way.
A new developer has joined the organization. She needs a user account on all the servers. The traditional approach will be as follows:
Log in to server 1.
Run the useradd rachel command to create the new user.
Create Rachel's home directory.
Log in to server 2 and repeat these steps.
Log in to server 3 and repeat these steps.
Log in to server 4 and repeat these steps.
Log in to server 5 and repeat these steps.
The first three steps will be repeated for all the servers.
Here's what you might do to achieve the same result in a typical Puppet-powered infrastructure:
Add the following lines to your Puppet code:
user { 'rachel': ensure => present, }
Puppet runs automatically a few minutes later on all your machines and picks up the change you made. It checks the list of users on the machine, and if Rachel isn't on the list, Puppet will take action. It detects what kind of operating system is present and knows what commands need to be run in that environment to add a user. After Puppet has completed its work, the list of users on the machine will match the ones in your Puppet code.
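The manual steps above also created Rachel's home directory, which the one-line declaration does not ask for. A minimal sketch of how that might be expressed, assuming you want Puppet to manage the home directory too; the user name comes from the text, while the shell value is purely an illustrative assumption:

```puppet
# Sketch only: 'rachel' is from the example above; the other values are assumptions.
user { 'rachel':
  ensure     => present,
  managehome => true,         # ask Puppet to create the home directory with the account
  shell      => '/bin/bash',  # assumed login shell; adjust to your site's convention
}
```

The declaration is still made once, in one place, and the platform-specific commands are left to Puppet.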
The key differences from the traditional, manual approach are as follows:
You only had to specify the steps to create a new user once, instead of doing them every time for each new user
You only had to add the user in one place, instead of on every machine in your infrastructure
You didn't have to worry about the OS-specific details of how to add users
It's not hard to see that, if you have more than a couple of servers, the Puppet way scales much better than the traditional way. Years ago, perhaps many companies would have had only one or two servers. Nowadays it's common for a single infrastructure to have tens or even hundreds of servers.
By the time you've got to, say, five servers, the Puppet advantage is obvious. Not counting the initial investment in setting up Puppet, you're getting things done five times faster. Your colleague doing things the traditional, hand-crafted way is still only on machine number 2 by the time you're heading home.
Above ten servers the traditional approach becomes almost unmanageable. You spend most of your time simply doing repetitive tasks over and over just to keep up with changes. To look at it in another, more commercial way, your firm needs ten sysadmins to get as much work done as one person with Puppet.
Beyond ten or so servers, there simply isn't a choice. You can't manage an infrastructure like this by hand. If you're using a cloud computing architecture, where servers are created and destroyed minute-by-minute in response to changing demand, the artisan approach to server crafting just won't work.
We've seen the problems that Puppet solves, and how it solves them, by letting you express the way your servers should be configured in code form. Puppet itself is an interpreter that reads those descriptions (written in the Puppet language) and makes configuration changes on a machine so that it conforms to your specification.
What does this language look like? It's not a series of instructions, such as a shell script or a Ruby program. It's more like a set of declarations about the way things should be:
package { 'curl': ensure => installed, }
In English, this code says, "The curl package should be installed". This snippet of code results in Puppet doing the following:
Checking the list of installed packages to see if curl is already installed
If not, installing it
Another example is as follows:
user { 'jen': ensure => present, }
This is Puppet language for the declaration "The jen user should be present." Again, this results in Puppet checking for the existence of the jen user on the system, and creating it if necessary.
So you can see that the Puppet program—the Puppet manifest—for your configuration is a set of declarations about what things should exist, and how they should be configured.
You don't give commands, such as "Do this, then do that." Rather, you describe how things should be, and let Puppet take care of making it happen. These are two quite different kinds of programming. The first (procedural style) is the traditional model used by languages such as C, Python, and shell. Puppet's is called declarative style because you declare what the end result should be, rather than specifying the steps to get there.
This means that you can apply the same Puppet manifest repeatedly to a machine and the end result will be the same, no matter how many times you run the "program". It's better to think of Puppet manifests as a kind of executable specification rather than as a program in the traditional sense.
This is powerful because the same manifest ("The curl package should be installed and the jen user should be present") can be applied to different machines, all running different operating systems.
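Putting the two declarations together, the manifest described here might look something like the following sketch. The file name example.pp is hypothetical; the resources are the ones already shown:

```puppet
# example.pp (hypothetical file name)
# "The curl package should be installed and the jen user should be present."

package { 'curl':
  ensure => installed,
}

user { 'jen':
  ensure => present,
}
```

One way to try this on a single machine, assuming Puppet is installed and you have root privileges, is to run puppet apply example.pp. Running it a second time should report no changes, because the machine already matches the declarations.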
Puppet lets you describe configuration in terms of resources—what things should exist—and their attributes. You don't have to get into the details of how resources are created and configured on different platforms. Puppet just takes care of it.
Here are some of the kinds of resources you can describe in Puppet:
Packages
Files
Services
Users
Groups
YUM repos
Nagios configuration
Log messages
/etc/hosts entries
Network interfaces
SSH keys
SELinux settings
Kerberos configuration
ZFS attributes
E-mail aliases
Mailing lists
Mounted filesystems
Scheduled jobs
VLANs
Solaris zones
In fact, since you can define custom resources to manage anything that's not covered by the built-in resources, there are no limits. Puppet allows you to automate every possible aspect of system configuration.
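To give a feel for how a few of these resource kinds look in practice, here is a small sketch using the built-in group, file, and service types. Every name and value in it is an assumption chosen for illustration; service names in particular vary between platforms (the SSH daemon is sshd on some systems and ssh on others):

```puppet
# Illustrative only: the group name, file contents, and service name are assumptions.

group { 'developers':
  ensure => present,
}

file { '/etc/motd':
  ensure  => file,
  mode    => '0644',
  content => "This machine is managed by Puppet.\n",
}

service { 'sshd':
  ensure => running,  # the service should be running now
  enable => true,     # and should start at boot
}
```

Each declaration follows the same pattern as the package and user examples: a resource type, a title, and a list of attributes describing the desired state.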
A quick rundown of what we've learned in this chapter.
Manual configuration management is tedious and repetitive, it's error-prone, and it doesn't scale well. Puppet is a tool for automating this process.
You describe your configuration in terms of resources such as packages and files. This description is called a manifest.
When Puppet runs on a computer, it compares the current configuration to the manifest. It will take whatever actions are needed to change the machine so that it matches the manifest.
Puppet supports a wide range of different platforms and operating systems, and it will automatically run the appropriate commands to apply your manifest in each environment.
Using Puppet addresses a number of key problems with manual configuration management:
You can write a manifest once and apply it to many machines, avoiding duplicated work
You can keep all your servers in sync with each other, and with the manifest
The Puppet manifest also acts as live documentation, which is guaranteed to be up to date
Puppet copes with differences between operating systems, platforms, command syntaxes, and so on
Because Puppet manifests are code, you can version and manage them in the same way as any other source code
The problems with manual configuration management become acute when your infrastructure scales to 5-10 servers. Beyond that, especially when you're operating in the cloud where servers can be created and destroyed in response to changing demand, some way of automating your configuration management is essential.
Puppet manifests are written in a special language for describing system configuration. This language defines units called resources, each of which describes some aspect of the system: a user, a file, a software package, and so on:
package { 'curl': ensure => installed, }
Puppet is a declarative programming language: that is, it describes how things should be, rather than listing a series of actions to take, as in some other programming languages, such as Perl or shell. Puppet compares the current state of a server to its manifest, and changes only those things that don't match. This means you can run Puppet as many times as you want and the end result will be the same.