Getting Started with Terraform

By Kirill Shirinkin

About this book

Terraform is a tool used to efficiently build, configure, and improve production infrastructure. It can manage existing infrastructure as well as create custom in-house solutions.

This book shows you when and how to implement infrastructure as a code practices with Terraform. It covers everything necessary to set up complete management of infrastructure with Terraform, starting with the basics of using providers and resources.

This book is a comprehensive guide that begins with very small infrastructure templates and takes you all the way to managing complex systems, all using concrete examples that evolve over the course of the book. It finishes with the complete workflow of managing a production infrastructure as code – this is achieved with the help of version control and continuous integration. At the end of this book, you will be familiar with advanced techniques such as multi-provider support and multiple remote modules.

Publication date:
January 2017
Publisher
Packt
Pages
206
ISBN
9781786465108

 

Chapter 1. Infrastructure Automation

Before starting to learn Terraform, you first need to understand certain concepts of modern infrastructure. To be able to use a new tool, one needs to understand what problem it solves. To that end, this chapter will cover the following topics:

  • Learning what Infrastructure as Code is and why it is needed

  • Understanding the benefits of a declarative approach to configuration management

  • Explaining what configuration management tools are missing

  • Laying out requirements for high-level infrastructure automation

  • Taking a quick look at the main tools for provisioning infrastructure

  • A short overview and history of Terraform

  • What you will learn in this book

 

What is Infrastructure as Code and why is it needed?


The number of servers used by almost any project is growing rapidly, mostly due to the increasing adoption of cloud technologies. As a result, traditional ways of managing IT infrastructure are becoming less and less relevant.

The manual approach works well for a farm of a dozen, perhaps even a couple dozen servers. But when we're talking about hundreds of them, doing anything by hand is definitely not going to play out well.

It's not only about servers, of course. Every cloud provider offers extra services on top, be it a virtual networking service, object storage, or a monitoring solution, which you don't need to maintain yourself. These services function like Software as a Service (SaaS). And actually, we should treat various SaaS products as part of our infrastructure as well. If you use New Relic for monitoring purposes, then it is part of your infrastructure too, with the difference that you don't need to manage servers for it yourself. But how you use it, and whether you use it correctly, is up to you.

Unsurprisingly, companies of every size, from small start-ups to huge enterprises, are adopting new techniques and tools to manage and automate their infrastructures. These techniques eventually got a name: Infrastructure as Code (IaC).

Dating from around 2009, the term Infrastructure as Code is all about approaching your IT infrastructure tasks the same way you develop software. This includes practices such as the following:

  • Heavy use of source control to store all infrastructure-related code

  • Collaboration on this code in the same fashion as applications are developed

  • Using Unit and Integration testing and even applying Test-driven development to infrastructure code

  • Introducing Continuous Integration and Continuous Delivery to test and release infrastructure code

Infrastructure as Code is a foundation of DevOps culture, because once both operations and developers approach their work in the same way, following the principles laid out previously, they already have some common ground.

Not to mention that if your infrastructure is treated like code, the border between development and operations becomes so blurry that the very existence of this separation eventually becomes quite questionable.

Of course, the introduction of Infrastructure as Code requires a new kind of tooling.

 

Declarative vs Procedural tools for Infrastructure as Code


What is infrastructure code, specifically? That depends highly on your particular infrastructure setup.

In the simplest case, it might be just a bunch of shell scripts and component-specific configuration files (Nginx configuration, cron jobs, and so on) stored in source control. Inside these shell scripts, you specify the exact steps the computer needs to take to achieve the state you need:

  1. Copy this file to that folder.

  2. Replace all occurrences of ADDRESS with mysite.com.

  3. Restart the Nginx service.

  4. Send an e-mail about successful deployment.
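The four steps above could be sketched as a shell function like the following; the paths, domain name, and mail address are hypothetical placeholders, not taken from any real setup:

```shell
#!/bin/sh
# Procedural deployment sketch of the four steps above. Every run
# executes every step in order, whether it is needed or not.
# All paths and addresses below are hypothetical placeholders.
set -e

deploy() {
  src="$1"; dest_dir="$2"; domain="$3"

  # 1. Copy this file to that folder.
  cp "$src" "$dest_dir/"

  # 2. Replace all occurrences of ADDRESS with the domain name.
  sed -i "s/ADDRESS/$domain/g" "$dest_dir/$(basename "$src")"

  # 3. Restart the Nginx service.
  systemctl restart nginx

  # 4. Send an e-mail about successful deployment.
  echo "Deployed $domain" | mail -s "Deployment finished" ops@example.com
}

# Real usage would be something like:
#   deploy ./site.conf /etc/nginx/conf.d mysite.com
```

Note that every run executes every step unconditionally, restarts and e-mails included, which is exactly the procedural property discussed next.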

This is what we call procedural programming. It's not bad. For example, the build steps of Continuous Integration tools such as Jenkins are a perfect fit for a procedural approach; after all, a sequence of commands is exactly what you need in this case.

However, you can only go so far with shell scripts when it comes to configuring servers and higher-level pieces. The more common and mature approach these days is to use tools that provide a declarative, rather than a procedural, way to define your infrastructure. With declarative definitions, you don't need to think about how to do something; you only write what should be there.

Perhaps the main benefit is that rerunning a declarative definition will never do the same job twice, whereas executing the same shell script will most likely break something on the second run. A proper configuration management tool will ensure that the server ends up in exactly the state defined in your code. This property of modern configuration and provisioning tools is called idempotency.

Let's look at an example. Let's say that you have a box in your network that hosts a package repository. For some reason, instead of using a DNS server, you want to hardcode the IP address of this box in the /etc/hosts file under the domain name repository.internal.

Note

In Unix-like systems, the /etc/hosts file contains a local text database of DNS records. The system tries to resolve a DNS name by looking at this file first, and only asks the DNS server afterwards.

It's not a complex task, given that you only need to add a new line to the /etc/hosts file. To achieve this, you could have a script like the following:

echo 192.168.0.5 repository.internal >> /etc/hosts

Running it once will do the job: the required entry will be added to the end of the /etc/hosts file. But what will happen if you execute it again? You guessed it: exactly the same line will be appended again. And even worse, what if the IP address of the repository box changes? Then, after executing your script, you will end up with two different host entries for the same domain name.

You can ensure idempotency yourself inside the script with heavy use of conditional checks. But why reinvent the wheel when there is already a tool that does exactly this job? It would be much better to simply define the end result, without composing a sequence of commands to achieve it.
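As a sketch of what such hand-rolled idempotency might look like in shell (the ensure_hosts_entry helper is my own invention, not a standard utility):

```shell
#!/bin/sh
# Hand-rolled idempotency for the hosts entry: append the line if it is
# missing, otherwise rewrite it in place, so a changed IP does not leave
# a duplicate entry behind. ensure_hosts_entry is a hypothetical helper.

ensure_hosts_entry() {
  file="$1"; ip="$2"; name="$3"
  if grep -q " $name\$" "$file" 2>/dev/null; then
    # Entry already present: update its IP in place.
    sed -i "s/^.* $name\$/$ip $name/" "$file"
  else
    # Entry missing: append it, as the original one-liner did.
    echo "$ip $name" >> "$file"
  fi
}

# Real usage would be:
#   ensure_hosts_entry /etc/hosts 192.168.0.5 repository.internal
```

Run it as many times as you like: the file keeps exactly one entry for repository.internal, and changing the IP updates the existing line. This guard logic is what configuration management tools give you for free.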

And that is exactly what configuration management tools such as Puppet and Chef do, by providing you with a special Domain Specific Language (DSL) for defining the desired state of the machine. The one certain downside is the necessity of learning a new DSL: a special small language focused on solving one particular task. It's not a complete programming language, nor does it need to be; its only job is to describe the state of your server.

Let's look at how the same task could be done with the help of a Puppet manifest:

host { 'repository.internal':
  ip => '192.168.0.5',
}

Applying this manifest multiple times will never add extra entries, and changing the IP address in the manifest will be reflected correctly in the hosts file, changing the existing entry rather than creating a new one.

Note

There is an additional benefit I should mention: on top of idempotency, you often get platform agnosticism. This means that the same definition can be used on completely different operating systems without any change. For example, by using the package resource in Puppet, you don't care whether the underlying system uses rpm or deb.

Now you should better understand why, when it comes to configuration management, tools that provide a declarative way of doing things are preferred.

Modern configuration management tools such as Chef or Puppet have completely solved the problem of setting up a single machine. There is an increasing number of high-quality libraries (be it cookbooks or modules) for configuring all kinds of software in an (almost) OS-agnostic way. But configuring what goes inside a single server is only part of the picture. The other part, located a layer above, also requires new tooling.

 

Infrastructure as Code in the Cloud


Quite often, servers are only one part of an infrastructure. With cloud platforms such as Amazon Web Services (AWS), Google Cloud Platform, and OpenStack advancing more and more, there is an increasing need to automate and streamline the way people work with the services these platforms provide. If you rely heavily on at least one cloud provider for major parts of your project, you will start meeting challenges in applying consistent usage patterns.

The approach of modern configuration management tools, while having been around for quite some time and having been adopted by many companies, has some inconveniences when it comes to managing anything but servers.

There is a strong likelihood that you would want these patterns to be written once and then applied automatically. Even more, you need to be able to reproduce every action and test its result, following the aforementioned Infrastructure as Code principles. Otherwise, working with cloud providers will either end up in so-called ClickOps, where you work with infrastructure primarily by clicking buttons in the cloud provider's web interface, or you will script all the processes by using the provider's APIs directly. And even though scripting APIs sounds like a big step towards true Infrastructure as Code, you can achieve much more using existing tools built for this exact task.

There is a certain need for a configuration tool that operates one level higher than the setup of a single server: a tool that would allow writing a blueprint defining all of the high-level pieces at once, including servers, cloud services, and even external SaaS products. A tool like this goes by different names: infrastructure orchestrator, infrastructure provisioner, infrastructure templating tool, and so on. No matter what you call it, at some point in time your infrastructure will really need it.

 

Requirements for an infrastructure provisioner


Before proceeding to the existing solutions, let's lay out a list of the most important requirements for a tool such as this, so that we are able to choose wisely.

Supports a wide variety of services

AWS alone already has dozens of entities to take care of. Other players (DigitalOcean, Google Cloud, Microsoft Azure, and so on) increase this number significantly. And if you add smaller SaaS providers to the game, you get hundreds of resources to manage.

Idempotency

Just as with single-server configuration, reapplying an infrastructure template should not do the job twice. If you have a template defining 50 different resources, from EC2 instances to S3 buckets, then you do not want to duplicate or recreate all of them every time you apply the template. You want only the missing parts to be created, the existing ones to be brought to the desired state, and the ones that have become obsolete to be destroyed.

Dependency resolution

It is important to be able not just to define 2 app servers, 1 DB server, and 2 security groups, but also to point them to each other using some lookup mechanism. Especially when creating a complete environment from scratch, you want to ensure the correct order of creation to achieve a flawless bootstrap of each component.

Note

Here and throughout the book, the term environment will mean the complete set of resources that an infrastructure consists of. It includes the network setup, all servers, and all related resources.

Robust integration with existing tools

Even though it is pretty awesome to have all your infrastructure in one beautiful template, you still need to take care of what happens on each particular server: applications need to be deployed, databases need to be configured, and so on. This is not the job of an infrastructure provisioning tool. But a tool like this should certainly integrate easily with tools such as Chef, which already solve this problem.

Platform agnosticism

Ideally, templates should be platform agnostic. This means that if I define a template for 2 app servers and 1 DB server, all talking to each other, I should be able to switch easily from AWS to local Vagrant without rewriting the template. Platform agnosticism is difficult to achieve and, at the same time, might not be needed that often. Completely changing the underlying platform is a rather rare event that happens perhaps once or twice in a product's lifetime.

Smart update management

This is a tricky one, and at the moment of writing, no tool can do it flawlessly in every case (and, honestly, it is unlikely any ever will). What happens when I change the type of three EC2 instances from m3.medium to c4.xlarge? Will my m3.medium instances be shut down and replaced one by one with new ones? Will they be instantly destroyed, leading to a few minutes of downtime? Will the tool simply ignore the updated instance type? Or will it spin up new nodes without removing the old ones, leaving me with three new and three old EC2 instances to remove manually? Solutions to this problem differ from platform to platform, which makes it harder for a tool to be platform agnostic.

Ease of extension

The last requirement is of particular importance: there must be an easy way to extend the tool to support other resources. For example, if a tool lacks support for AWS Kinesis, or for a particular feature or property of an already supported service, and there is no plan to support it officially, then there has to be a way to implement it yourself quickly.

 

Which tools exist for infrastructure provisioning?


Now that we have a problem to solve and a list of requirements for the tool that should solve it, we can go into the specifics of the different existing tools.

Scripting

Almost every cloud provider has an API, and if there is an API, you can script it. You could also go beyond a single script and develop a small, focused tool just for your company to create environments. The disadvantage: more software to develop and support in-house.

Configuration management

Most configuration management tools already have a way to create cloud resources. Chef has Chef Provisioning, which allows you to write recipes that define not entities on a single server, but multiple servers and components such as AWS security groups and networking pieces. There are also Puppet modules that wrap cloud APIs into Puppet resources. Ansible likewise has modules supporting providers such as AWS, OpenStack, and others.

While the idea of using a single tool for both levels, the complete high-level infrastructure definition and the inside-a-server configuration, is tempting, it has some drawbacks. One of them is the lack of support for many required services and the general immaturity of these solutions.

Also, the ways to use these tools for this purpose are somewhat ambiguous; there are no well-defined workflows. Let's take AWS as an example. The recommended way to set up a firewall in an AWS environment is to use Security Groups (SGs). SGs are a separate entity, available via the web interface or via the API.

What should you do if you want to create an AWS security group that allows connections from an app server to a database server? Should you put this code to a database package or an application package? AWS Security Group clearly doesn't belong to either of them.

The only meaningful solution is to create a separate package dedicated to creating the security groups, which performs searches against the nodes API to define inbound and outbound rules for these groups.

It's also unclear from where to execute this kind of code. From a workstation? From a separate AWS-resources node that has permissions to do this sort of thing? How do you secure it? How do you distribute keys? And, more importantly, how do you make this process reproducible and ready to be used in CI/CD pipelines? There is no clear answer to these questions from the configuration management tools' point of view.

The other downside is that you might not even have, or want to have, complete configuration management in your organization. Implementing these tools brings huge benefits, but a steep learning curve and a lack of in-house expertise can be significant blockers to their adoption.

CloudFormation/Heat

Both AWS and OpenStack have a built-in way to define all of their resources in one template. This often works nicely in environments that are AWS-only or OpenStack-only. But as soon as you want to add another provider to the mix, you need another tool.

Terraform

Finally, there is Terraform, the tool this book is about, and the one we will use to codify complete infrastructure or, at least, the top layer of it.

 

A short overview of Terraform


Terraform is an open source utility created by HashiCorp, the same company that created Vagrant, Packer, Consul, and other popular infrastructure tools. It was initially released in July 2014 and has since come a long way to become one of the most important tools for infrastructure provisioning and management.

This is how Terraform is described by HashiCorp:

... a tool for safely and efficiently building, combining, and launching infrastructure. From physical servers to containers to SaaS products, Terraform is able to create and compose all the components necessary to run any service or application. (https://www.hashicorp.com/blog/terraform.html)

Terraform easily meets most of the requirements listed earlier:

  • At the time of writing, it supports over 30 different providers, from huge ones such as AWS to smaller ones such as various SaaS DNS providers.

  • Terraform provides a special configuration language to declare your infrastructure in simple text templates.

  • Terraform also implements complex graph logic, which allows it to resolve dependencies intelligently and reliably.

  • When it comes to servers, Terraform has multiple ways of configuring and wiring them up with existing configuration management tools.

  • Terraform is not platform agnostic in the sense described earlier, but it allows you to use multiple providers in a single template, and there are ways to make it somewhat platform agnostic. We will talk about these ways closer to the end of the book.

  • Terraform keeps track of the current state of infrastructure it created and applies delta changes when something needs to be updated, added, or deleted. It also provides a way to import existing resources and target only specific resources.

  • Terraform is easily extendable with plugins, which are written in the Go programming language.

Over the next seven chapters, we will learn how to use Terraform and all of its features.

 

Journey ahead and how to read this book


This is a book about Terraform, and you will learn everything that there is to learn about this tool. There are two main parts of this book, split into six chapters of pure learning.

In the next three chapters, we will learn the basics. In Chapter 2, Deploying First Server, you will learn the basics of Terraform, the main entities it uses, and how to deploy our first server with it. We will also get a short introduction to AWS EC2.

In Chapter 3, Resource Dependencies and Modules, we will discover how exactly Terraform operates with its resources and how to refactor our code. In Chapter 4, Storing and Supplying Configuration, you will learn all the possible ways to configure your templates with the various APIs Terraform provides.

If you are already familiar with Terraform basics, Chapter 2, Deploying First Server, through Chapter 4, Storing and Supplying Configuration, might be a bit too boring for you. They show how to use the tool as a first-time user and don't cover many advanced topics that you will get to once you run Terraform in production. Feel free to skip the next three chapters if you have already used Terraform. For advanced topics, head over to Chapter 5, Connecting with Other Tools, Chapter 6, Scaling and Updating Infrastructure, and Chapter 7, Collaborative Infrastructure.

In Chapter 5, Connecting with Other Tools, you will learn how to connect Terraform with many different tools, from configuration management to infrastructure testing tools. We will find out how to provision and reprovision machines and how to use Terraform in tandem with literally any other tool.

In Chapter 6, Scaling and Updating Infrastructure, we will cover infrastructure updates with Terraform, from very simple cases (such as changing one property of a non-essential resource) to complex upgrade scenarios for whole clusters of machines.

Finally, in Chapter 7, Collaborative Infrastructure, you will learn how to collaborate on infrastructure work with Terraform. We will also master integration testing for Terraform environments.

Be prepared: this book is not only about Terraform. It's about Infrastructure as Code and the various topics surrounding it, such as Immutable Infrastructure. Terraform will be the main tool we study, but definitely not the only one. Configuration management tools, testing tools, half a dozen small helper utilities, and as many AWS services: get ready to learn the whole toolset required to embrace Infrastructure as Code because, as you will soon notice, Terraform is a tool that must be supported by other software.

In the final chapter, Chapter 8, Future of Terraform, we will run through multiple topics related to Terraform that did not make it into the other chapters. It also has a non-conventional piece on the future of Terraform that you may or may not want to read before proceeding to learn it.

So, without further delay, let's proceed to creating our first server with Terraform.

 

Summary


In this chapter, you learned a lot about Infrastructure as Code principles and some of the tools that allow you to apply them. There are many mature tools that take care of configuring what goes inside a single server, but there are not that many options when it comes to defining the level above a single server. We also listed requirements for a tool that would take care of configuring this higher level. Then we came to the conclusion that Terraform meets many, if not all, of these requirements. In the next chapter, we will finally get our hands dirty, install Terraform, and get to know how to use it to create a single AWS EC2 server.

About the Author

  • Kirill Shirinkin

    Kirill Shirinkin is an IT consultant who focuses on Cloud technologies and DevOps practices. He has worked in companies of different sizes and areas, from an online language learning leader to a major IT provider for the global travel industry and one of the largest management consultancies. He is also a cofounder of online mentorship platform mkdev.me, where he leads a team and teaches his students all about DevOps.
