Welcome to the wonderful world of HashiCorp Consul, a versatile utility to manage, automate, and securely connect all of the services within your network. Thanks for joining me on this ride, and I hope we'll both learn a lot throughout the process. Within this chapter, we're going to be reviewing Consul at a high level just to understand its basics and applications; essentially, we'll be digging the foundation and pouring the concrete to create a stable base on which to form your Consul structure. And if you've ever looked at the Leaning Tower of Pisa, you know that a solid foundation is critical for any important structure! Within this chapter, we'll be learning about the following:
- The Consul system – what are the components and how do they live together in harmony?
- Consul service discovery – touching on the foundational use case on which all of the other Consul functionality builds
- Consul service mesh – the basics of Consul service mesh to securely enable service-to-service communication
- Network automation – a high-level view of the application of service discovery to automate your network infrastructure
We won't be getting into any code snippets quite yet, but rest assured they are coming. If you're familiar with the concepts and use cases of Consul, go ahead and skip to those chapters – I promise I won't be hurt (too much). However, this would be an excellent chapter to hand your manager when they ask, "What in the world are you doing with this newfangled invention?"
The Consul system – servers and clients
Sometimes, in order to understand exactly how something works, we need to understand the components involved. After all, within every form of communication, there is a transmitter and a receiver, and if we aren't familiar with either one it can be difficult to understand the message. Consul, at its core, is a communications engine. It provides a new structure to facilitate the communications of network services, not just network devices. If you've spent any time in the networking field, you'll have heard terms such as IP address, subnet mask, and gateway address. These are all critical components involved in the communications of your network devices. However, in the glorious world of the cloud, microservices, and dynamic adjustments to the network, these components will change as the network waxes and wanes. Throughout this painful process, we've learned that the addressing system we've grown to know and love was simply a means to an end. That end is the connection and efficient communication of our services. So how does Consul do that? We are about to find out!
Consul servers – the masters of their domain
Within almost any operational system or structure, there is a brain. This might not be the Brain who, along with his sidekick Pinky, attempts to take over the world day after day. We're focusing here on the brain that makes the decisions. The brain that determines whether to have Indian food or Thai food for dinner. The brain that figures out what to wear on a particular day. The brain that decides which employee gets the biggest raise. Now, at this point, you might be saying computers and applications don't have brains; they only do what we tell them to do. You are absolutely correct, but that doesn't mean that the machines don't have the logic and intelligence, even if artificial, to make decisions. In some ways, they are superior, as machines and applications are able to focus on logic and less on ego and emotion. Right, Mr. Spock? That is correct, you subservient and primitive mammal.
When contemplating the brain of Consul, we must look at the Consul server cluster. Looking at synonyms for server, we find servant, and that is exactly what the Consul server does for the distributed Consul clients (which we'll discuss later). The servers also work together, with an elected server making primary decisions, and distributing those decisions to the other servers within the overall cluster. So, what kind of decisions do these servants of my network make?
For starters, one of the most useful decisions the Consul server makes is where to find services. Everything that Consul does is predicated on the need to not only discover the services on the network, and their location, but also to share that information with any entitled entity asking for that information. If you think of the old telephone switchboards, somebody (an application) would pick up a phone and instantly be connected with a central operator (the server). The caller (application) would request a particular number (service) to be connected to, such as Transylvania 6-5000. The operator (server) would then move those giant plugs around and connect the caller (application) with the destination party (service).
But what if I have applications that I don't want to be able to find specific services? Well, I'm glad you asked! In some cases, you have people or machines wishing to contact other people to talk to them about extending the warranty on their car. Perhaps I don't want to receive those messages. Well, I would tell that central operator to not allow those callers to contact me, kind of like a prehistoric Do Not Call Registry. These decisions are also managed by the Consul servers in accordance with your directive.
OK, it sounds like Consul can make a lot of cool and intelligent decisions automatically, but where does it get the information required to make those decisions? Well, how is any decision made? An employee's raise might be based upon their work performance considering they were also writing a book. What clothing we decide to wear is based on occasion and environment. What to eat for dinner can be based on…well, too many variables to list here! Every decision we make is only as good as the data we have. The Consul client provides that data.
Within the official Consul documentation, there is a single agent binary that can be configured in either client or server mode. Throughout this text, we will refer to the server agents simply as servers, and the client agents simply as clients.
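To make that distinction concrete, here is a minimal sketch of what the two modes of the same agent might look like. The datacenter name, paths, server count, and IP address below are all illustrative examples, not prescriptions:

```hcl
# server.hcl -- a hypothetical minimal server agent configuration.
datacenter = "dc1"
data_dir   = "/opt/consul"
server     = true

# How many servers to wait for before electing a leader
# (clusters typically run 3 or 5 servers).
bootstrap_expect = 3
```

```hcl
# client.hcl -- a hypothetical minimal client agent configuration.
datacenter = "dc1"
data_dir   = "/opt/consul"
server     = false

# Address of at least one server to join at startup (example IP).
retry_join = ["10.0.0.10"]
```

The same `consul agent` binary reads either file; the `server` flag alone decides which role it plays.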
The cluster constituents
As the brain is the decision-maker in any system, there must be entities that feed it data and perform the work based on its decisions. For example, when the manager makes their decisions, the data is likely fed by board members, senior management executives, and hopefully team members. That being said, we all know the team members are the ones that do all of the work!
The plethora of data fed into the decisions made by the Consul server structure consists of two categories:
- Consul configuration
- Network data
The configuration of course is determined by those that are operating the Consul system, hopefully well thought out, defined as code, and peer-reviewed. The data, however, is fed by Consul clients that are distributed throughout the network. These clients may co-reside with the application, or they may travel alongside the application watching its every move. Regardless of the location, the client is not only a client for the server, but it is also a client for the application, representing the application and reporting information about the application to the server. This enables Consul's deployment within the network without the need to modify existing applications or services. Furthermore, a single client can represent multiple services and applications.
Care should be taken when utilizing an external service monitor with the Consul client. The data available to the client is reduced in this configuration and it is less flexible with respect to migrating services.
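Since the client represents the application to the server, registering a service is typically as simple as handing the client a short definition file. Here is a sketch of what that might look like; the service name, port, and tag are examples only:

```hcl
# web.hcl -- a hypothetical service registration loaded by a
# Consul client agent running alongside the application.
service {
  name = "web"
  port = 8080
  tags = ["v1"]
}
```

Notice that the application itself is untouched; the client does the talking on its behalf, and one client can load any number of these definitions.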
The Consul client not only monitors the applications that it is associated with, but it also collects information about the distance between itself and other clients within the network. This allows the servers to make more intelligent decisions about which instances of the applications or services are best suited to serve the impending request. The concept is very similar to emergency services when you dial for police or the fire service. There is one centralized number that you dial into (that is, 911 in the United States), but that centralized number connects you with whatever dispatch is closest to your current location. As Voice over IP services spread, this became a difficult challenge, one that increased network intelligence helped us overcome.
The marriage of server and client
OK, now that we know what the servers do, and we know what clients do, how does this entire system work together? As I've mentioned before, Consul is all about communication. The Consul servers need to communicate with each other, and the clients all need to communicate with the server cluster.
The server communication is pretty manageable. It's a small dinner party with 3, 5, or 7 individuals. However, as you scale from tens to hundreds to potentially thousands of clients, having all of these clients report back to the relatively small server cluster can be quite overwhelming. Our intimate dinner party has expanded to a much larger party with tens, hundreds, or potentially thousands of guests. Can you imagine all of them wanting to talk to the head table at once! However, if the guests were able to chat amongst themselves, they could share their own information, discover new things, and only update the head table with pertinent and relevant information. In some cases, we might call this chat gossiping. Oddly enough, that's exactly the protocol Consul uses for this interaction, but we'll dig deeper into that in a later chapter.
Alright, now we have an understanding of the purpose of the servers, the purpose of the clients, and how they interact and communicate with each other. But what the heck can we do with this amazingly beautiful system? Why do I even care?
Oh where, oh where have my services gone?
As mentioned previously, one of the biggest problems related to the implementation of private and public cloud infrastructures is the fluidity of the network. Back in my day, any changes to the network required a myriad of steps. Let's focus on the simple need to deploy a new application. Let's assume that our application has already been written and tested:
- Review the machine requirements requested.
- Obtain a number of quotes for the requested machine(s).
- Submit a purchase requisition.
- Justify the purchase request with a number of management layers depending on cost.
- Submit the purchase order to the vendor.
- Wait for the machines to arrive.
- Decide what IP address we're going to allocate.
- Vote on what inventive name we'll call the machine(s).
- Determine where in the warm and loud server room we're going to mount the machine(s).
- Figure out how we're going to connect the machine(s).
- Once the machine arrives, install the actual hardware and plug it in.
- Argue with facilities about where to put the cardboard and leftover packing materials.
- Configure the machine with the allocated network parameters (address and name).
- Troubleshoot routing and address resolution tables.
- Provide the application team with the address and credentials for the server.
Does any of this sound familiar? Each of these steps required human discussion (which is inherently unreliable) and usually paperwork and emails that got lost in the shuffle. Eventually, it got better with ticketing systems, but instead of getting buried in emails, we got buried in tickets that took days, weeks, and sometimes months to process. If our services needed to move to different servers or different areas of the network, we would have to start this entire process again. Now, as we've migrated toward more automation in the private and public clouds, let's look at how we've improved:
- Review the machine requirements requested.
- Identify what virtual network it needs to live in.
- Execute the appropriate code modules to deploy the infrastructure.
- Troubleshoot connectivity problems.
- Provide the application team with the address and credentials for the server.
So many steps have disappeared, and the entire process of acquiring and deploying machines for our applications and services has been drastically simplified! However, the law of unintended consequences has not been eliminated. If you noticed, along with the elimination of cabling and mounting the hardware, we have also lost a lot of control surrounding the addressing of the devices. Of course, we can stipulate in what network our application or services should reside, but as we introduce new versions of the application, expand the application into other regions, or suffer a service loss and have to redeploy, the last thing we want to deal with is allocating and assigning addresses. But Rob, if I don't have an address, how do I find my service? Well, how many phone numbers do you remember now? You're not calling a number; you're calling a person (or sometimes an automated machine).
With Consul clients distributed throughout the network, tracking and monitoring your services, you don't need to know the addresses anymore. Don't worry about load balancers and firewalls; we'll get to those later. When the Consul client is configured, it not only knows the address of its own machine but also recognizes what applications are running on the server through a set of intelligent health checks. These checks can be anything from a simple port check to a customized script to make sure the application is not just hearing but is actually listening (there is a difference). Once the server cluster is aware of the healthy application, and its location, any application that can utilize the Domain Name System (or DNS – you use this every time you use a web browser) can discover the application. There are other ways to share this information as well, but we'll discuss those later.
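Putting those pieces together, a service definition with a health check might look something like this. The name, port, and interval are hypothetical; treat it as a sketch rather than a recipe:

```hcl
# web.hcl -- a hypothetical service definition with a health check.
service {
  name = "web"
  port = 8080

  # A simple TCP check: is anything answering on the port?
  # Custom script checks can go further and verify the application
  # is actually listening, not just hearing.
  check {
    id       = "web-tcp"
    tcp      = "localhost:8080"
    interval = "10s"
  }
}
```

Once the check passes, the service becomes discoverable through the agent's DNS interface – for example, a query such as `dig @127.0.0.1 -p 8600 web.service.consul` (8600 being the agent's default DNS port) returns only healthy instances.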
As we've seen, the evolution of our network infrastructure has produced some amazing benefits when it comes to the management and the deployment velocity of the infrastructure. However, it has also introduced some complexity and perhaps some unwelcome variability, but with Consul service discovery, we're able to embrace those changes without trying to figure out who moved my cheese. So now that we know where our services are, what do we do next?
Something is meshy around here
If you've read any technical articles within the last couple of years, especially within the discussions of microservices or Kubernetes, you'll have heard the term service mesh quite pervasively. Personally, whenever I think of a mesh, it's rarely a clean concept of machines communicating with each other. Usually, I think of the myriad of tunnels that have been buried underneath Boston, but that's another story. From a network perspective, however, a service mesh provides a way to securely connect your various services without requiring additional functionality within the applications themselves. But it still sounds kind of meshy, so why would I want to deal with this?
In the previous section, we talked about service discovery and how easy it is to determine where different applications and services reside, along with their health status. With that information, a bad actor can target the service for a variety of attacks and even service spoofing. For example, if I figure out that a particular application is connecting to a service, some sort of data store or provider, it would be possible to create a rogue service that the application would unknowingly connect to and potentially divulge sensitive information. Our ability to secure our applications has certainly improved over the past several years; however, the problem of imposters, cowans, and eavesdroppers has persisted, if not gotten worse, within a dynamic cloud world. And when this happens, it usually isn't good.
The Consul service mesh provides a networking model that ensures that all communication from service to service is not only encrypted but there is validation that the sender and receiver are actually who they claim to be. A great analogy for these communication systems is the postal system.
Let's step through the process of sending a package, something important that we want to secure and verify that it made it to the final destination:
- First, we're going to wrap the item we want to send in a box or envelope of some kind. On that package, we're going to place a label. Typically, that label will include the name of the person the package is destined for and their address. Note that this is a fixed address.
- We're going to hand that package to a postal worker…for argument's sake, we'll just call them Proxy. Proxy is going to make sure we are who we say we are, and because this package is important, we're going to tell Proxy that we need to make sure it goes to the right person.
- Proxy is going to potentially slap another label on the package and start moving the package through space and time. Since Proxy is local to your area, they aren't going with the package, but there is trust that the package will be protected along its journey.
The package may see all sorts of interesting things during its travels, but eventually, it will arrive at the name and address you intended. Proxy's distantly related sibling, Envoy, will validate that the package arrives at the proper location, based on the attached label. If we want to be fancy, we can even receive confirmation (such as an ACK) that it was delivered.
This simple analogy describes the communication within a service mesh. There is a message that needs to be communicated, but the application itself has neither the resources nor the intelligence to get that message securely to its intended destination. This is especially the case when that intended destination can change; unlike our home address, we've already determined that we may not have the details of the destination's address. So, the application hands that message to a local proxy, within the same machine, to eliminate the possibility of external snooping. That proxy has been configured with a certificate for two reasons. First, to validate its own authenticity. Second, to encrypt the message so that only those that are within the inner circle can obtain it. When the message gets to the other side, the receiving proxy can validate the certificate of the transmitter and decrypt the message. From there, the proxy hands the message to the receiving application, again, on the same machine to eliminate any external snooping.
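In Consul terms, handing the message to Proxy amounts to adding a `connect` stanza to the service definition. The sketch below assumes a hypothetical `web` service that needs to reach a `db` service through its sidecar proxy; the names and port numbers are examples only:

```hcl
# A hypothetical mesh-enabled registration: the sidecar proxy
# (such as Envoy) handles the mutual TLS on web's behalf.
service {
  name = "web"
  port = 8080

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            # web reaches db by talking to its *local* proxy...
            destination_name = "db"
            local_bind_port  = 9191
          }
        ]
      }
    }
  }
}
```

With this in place, the application simply talks to `127.0.0.1:9191` and never needs to know where `db` actually lives; the proxies handle the encryption, validation, and delivery.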
As we've seen, adding a service mesh into the Consul architecture enables us not only to discover our services but also to connect to them securely, with validation. However, we may be using identical certificates within our network in order to validate and encrypt messages. This practice is not uncommon, and it therefore opens the possibility of a rogue application within the mesh talking to whatever else it wants to. Well, we can't have that, so through Consul, we can create what's called an intention. When you create an intention within Consul, you're telling the associated Consul client who it can, and more importantly cannot, receive messages from. This provides yet another level of security within the service mesh.
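An intention can be expressed as a small configuration entry. The sketch below assumes Consul 1.9+ syntax and the same hypothetical `web` and `db` services from before: only `web` may talk to `db`:

```hcl
# intentions.hcl -- a hypothetical service-intentions entry,
# applied with something like: consul config write intentions.hcl
Kind = "service-intentions"
Name = "db"

Sources = [
  {
    Name   = "web"
    Action = "allow"
  }
]
```

Paired with a default-deny posture, this means a rogue service inside the mesh can hold a perfectly valid certificate and still be refused a conversation with `db`.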
OK, that service mesh stuff is a bit confusing, but I get it. Now that I'm not dealing with IP addresses, I can now set up connections between my applications and services that not only secure the message but also validate the sender. I can even control the communication within my own network by creating rules regarding who can talk to whom. But there is still all of this other network stuff that slows me down. Even in the cloud, I'm hit with delays with the firewall, load balancer, and other communication devices. Why can't we automate those pieces? Well, of course you can!
So, we've established how great it is to be able to discover the services available on our network without cumbersome static addressing, and how to utilize that to ensure the secure delivery of messages among our services. Wouldn't it be great to share this dynamic information with other components in the network? Well, why would we want to do that? To understand how this functionality can improve the lives of so many people, let's jump into the way-back machine and once again recall what we used to do, and often still do, when making adjustments to the network:
- Yeah! We have our application deployed into the network and I can communicate with it. However, I can't reach the service my application needs to hit. Better call the network team.
- The network team told me that the problem must be on my end because the destination service is alive and responding to their monitoring tools.
- After a few days of troubleshooting, we learn that there is a firewall in the path that is protecting that service from nefarious creatures.
- So, next, we submit a ticket to the firewall team and request a rule be configured to allow traffic from, you guessed it, my IP address.
- Of course, the firewall team is not only getting hammered with requests, but they know that one wrong move on the list of firewall rules and the number of tickets will be the least of their problems, so extreme care must be taken.
- Eventually, the team is able to safely add and test the new rule, and we're off and running.
Now after all of that, let's hope that we didn't forget to include any important information in that ticket (such as an additional port). Not to mention that when the address changes (not if), we'll need to enter the process again. This process continues today, even with cloud automation, for a number of different reasons. In my personal experience, these changes are the ones that tend to create the most confusion and delay in any project, and in a dynamic world, the changes aren't slowing down.
Hopefully, you can see how the knowledge Consul maintains plays such a critical role in this scenario. As services are added, updated, and removed, the Consul server cluster is aware of the changes, thanks to the distributed agents. Now the main challenge is how to get that information from Consul out to my network equipment, and there are a few ways to solve that.
One of the most straightforward methods for network devices to learn about service changes is for the equipment to learn about those changes directly from Consul via its API. This functionality is already in use with the Consul integration with the F5 BIG-IP load balancer using F5's Application Services 3 (AS3) extension. Whenever Consul discovers service updates, the load balancer learns of these updates and can adjust the load balancing pools automatically. Although this integration does require the network components to develop functionality specifically for the Consul API, there is no Consul agent required on the load balancer itself.
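To give a feel for what such an integration consumes, here is a sketch in Python of parsing a catalog response. The field names follow Consul's `/v1/catalog/service/:service` endpoint, but the sample payload itself (nodes, addresses, ports) is entirely made up for illustration:

```python
import json

# An illustrative response body from GET /v1/catalog/service/web.
# Field names mirror the real API; the values are invented.
sample_response = json.loads("""
[
  {"Node": "node-a", "Address": "10.0.0.21",
   "ServiceName": "web", "ServiceAddress": "10.0.0.21", "ServicePort": 8080},
  {"Node": "node-b", "Address": "10.0.0.22",
   "ServiceName": "web", "ServiceAddress": "10.0.0.22", "ServicePort": 8080}
]
""")

def service_endpoints(catalog_entries):
    """Extract the address:port pairs a device (say, a load
    balancer pool) would program from a catalog response."""
    return [
        # ServiceAddress may be empty, in which case the node's
        # Address is the one to use.
        f"{entry['ServiceAddress'] or entry['Address']}:{entry['ServicePort']}"
        for entry in catalog_entries
    ]

print(service_endpoints(sample_response))
# → ['10.0.0.21:8080', '10.0.0.22:8080']
```

A real integration would poll (or blocking-query) the live endpoint instead of a canned sample, but the shape of the data is the same.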
An alternative to this level of integration is to have Consul post updates about service availability to a messaging service, such as ActiveMQ or RabbitMQ. Any messaging service, or other application, that can receive an HTTP message can receive service updates from Consul, without having to query it directly.
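One way to wire this up is with an agent watch. The sketch below, with a hypothetical broker URL, asks the agent to POST an update whenever the `web` service changes:

```hcl
# A hypothetical agent watch: push changes rather than be polled.
# The endpoint URL is an example placeholder only.
watches = [
  {
    type         = "service"
    service      = "web"
    handler_type = "http"

    http_handler_config = {
      path   = "http://broker.example.com:8080/consul-updates"
      method = "POST"
    }
  }
]
```

The receiving side can be anything that speaks HTTP – a message broker's REST gateway, a webhook, or a small shim of your own.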
A more common method of integration is utilizing Consul's template feature. Just as it sounds, you can create a configuration file template to match whatever device you need to manage. Within that template, you specify areas where you need input from Consul, such as the IP address of a particular endpoint! If you remember, Consul makes intelligent decisions based on the data it receives. All of the data the Consul servers receive from the clients forms pieces in the puzzle of an intelligent and dynamic network.

A very simplified analogy would be setting up a payee for paying our bills. I presume this is something many of us do, certainly more common than writing checks. On our bank website, we establish a payee – some company or person that we pay money to every month. If we look at our monthly electricity bills, for example, the amount we pay varies based on the level of electricity my kids use that month. Why can't they ever learn to turn off the lights? So, we have our payee set up, and we know we are paying the electric company. That is our template. All we need is the data point of the amount due. Once the electric company (the client) informs us (the server) of the amount due, we can fill in the amount and let the electronic payment go. Hopefully, this will happen before the electricity is shut off!
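Putting the idea into practice, a template file for, say, a web server's upstream pool might look something like the sketch below. The service name and surrounding syntax are illustrative; Consul fills in the blanks as instances come and go:

```
# upstreams.conf.tpl -- a hypothetical consul-template file.
# Everything inside {{ }} is filled in by Consul; the rest is the
# device's own configuration syntax (an nginx-style pool here).
upstream web_backends {
{{ range service "web" }}
  server {{ .Address }}:{{ .Port }};
{{ end }}
}
```

Run through the consul-template tool, the rendered file is rewritten whenever the healthy membership of `web` changes, and the tool can trigger a reload command for the device after each render.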
That template functionality, however, doesn't only apply to configuration files. HashiCorp does have several other products besides Consul, one of which is Terraform. If you aren't familiar with Terraform, it provides a standardized method to define system infrastructure as code for multiple cloud providers, but it also has providers for multiple applications. This includes firewalls, load balancers, monitoring systems, and so on. Now imagine coupling the dynamic aspect of Consul service discovery, with Terraform's infrastructure-as-code platform!
The Consul synchronization functionality with Terraform, called Consul-Terraform-Sync, is in public beta at the time of writing and supports a limited set of partner modules for automation. With HashiCorp's development velocity, by the time you're reading this, the functionality is likely fully supported.
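For a taste of the shape of a Consul-Terraform-Sync configuration, here is a sketch using the beta-era syntax; the task name, module path, and service are all hypothetical examples:

```hcl
# A hypothetical Consul-Terraform-Sync configuration: watch the
# "web" service and run a Terraform module when it changes.
consul {
  address = "localhost:8500"
}

task {
  name     = "update-firewall"
  source   = "./modules/firewall-rules"
  services = ["web"]
}
```

The module itself is ordinary Terraform code – firewall rules, load balancer pools, and so on – which is exactly what makes the coupling so powerful: the dynamic awareness comes from Consul, and the device-specific automation comes from the Terraform provider ecosystem.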
Congratulations, we've made it through Chapter 1! I hope it was as much fun for you to read as it was for me to write! To review, we started by learning about the overall Consul architecture, and how the servers and clients work together to create a dynamic and intelligent system. As in real life, the clients are really the workhorses of the system, monitoring the services and feeding the data to the servers. That data collected by the distributed clients is fed back to the Consul server cluster and made available to a variety of consumers. The need for dynamic discovery has become critical as we've moved to the cloud and automation. With the awareness of our services, we are able to ensure the validation of transmitters and receivers, and encrypt the information communicated among our components. All of this is possible without changing our applications! Of course, it would be a shame to keep all of that useful network information to ourselves, and we were able to see how Consul shares that information among various devices to simplify our lives and accelerate the overall application deployment flow.
I hope this first chapter was a great start to your Consul journey, and I thank you for putting up with my jokes. Of course, there are more to come! As we dig a little deeper in Chapter 2, Architecture – How Does It Work?, we're going to learn more details about how these servers and clients actually communicate. And if you like, we're going to build our first cluster!