You're reading from Mastering Prometheus

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781805125662
Edition: 1st Edition
Author (1)
William Hegedus

William Hegedus has worked in tech for over a decade in a variety of roles, culminating in site reliability engineering. He developed a keen interest in Prometheus and observability technologies during his time managing a 24/7 NOC environment and eventually became the first SRE at Linode, one of the foremost independent cloud providers. Linode was acquired by Akamai Technologies in 2022, and now Will manages a team of SREs focused on building the internal observability platform for Akamai's Connected Cloud. His team is responsible for a global fleet of Prometheus servers spanning over two dozen data centers and ingesting millions of data points every second, in addition to operating a suite of other observability tools. Will is an open source advocate and contributor who has contributed code to Prometheus, Thanos, and many other CNCF projects related to Kubernetes and observability. He lives in central Virginia with his wonderful wife, four kids, three cats, two dogs, and a bearded dragon.

Using Service Discovery

In Chapter 1, we discussed some of the things that set Prometheus apart from other monitoring tools. One of the major features that distinguishes Prometheus is its built-in functionality around what it calls service discovery. Now we get to learn more about what it is and how it works.

Being able to leverage service discovery and even write custom service discovery mechanisms will allow you to establish Prometheus environments that are truly dynamic and cloud-native. With that in mind, we’re going to dive into a comprehensive look at what service discovery is and how you can make the most of it.

In this chapter, we’re going to cover the following main topics:

  • Service discovery overview
  • Using service discovery in a cloud provider
  • Custom service discovery endpoints with HTTP SD

Let’s get started!

Technical requirements

For this chapter, you’ll need the following:

  • The Prometheus environment from Chapter 2
  • Go (>=1.20)

This chapter’s code examples are available at https://github.com/PacktPublishing/Mastering-Prometheus.

Service discovery overview

In traditional monitoring environments, administrators either needed to know what they would be monitoring in advance or needed to configure push-based monitoring systems. Since the servers and devices being monitored did not change frequently, it was acceptable that adding or removing monitoring required configuration changes. However, in modern cloud-native environments, where the number of instances of an application can scale up and down automatically through systems such as Kubernetes’ HorizontalPodAutoscaler, this can introduce undesirable gaps in monitoring as targets that should be monitored are constantly added, removed, and replaced. Prometheus solves this through its service discovery system.

The Prometheus service discovery system provides over two dozen different methods of dynamically retrieving and adding scrape targets. These range from cloud providers such as AWS, Azure, and Akamai (Linode) to Docker and Kubernetes to more manual...

Using service discovery in a cloud provider

While the Kubernetes service discovery we’ve looked at so far is great and all, you more than likely have some regular old servers that need to be monitored too! How can we discover those? If you’re one of the thousands of people running virtualized infrastructure in the cloud, you may be entitled to dynamic service discovery.

Most established cloud providers have service discovery mechanisms built directly into Prometheus for discovering virtualized infrastructure. Such providers include AWS, Azure, GCP, DigitalOcean, Vultr, Hetzner, and – my personal favorite – Linode (also known as Akamai Connected Cloud).

I’m slightly biased in favor of Linode since – at the time of writing – I work there. It also doesn’t hurt that a former teammate of mine (TJ Hoplock) wrote and contributed to the Linode service discovery provider, which means I’m pretty familiar with it. So, let’...

Custom service discovery endpoints with HTTP SD

For several years after Prometheus’ release, a moratorium existed on new service discovery providers – that is, pull requests to Prometheus that added new service discovery providers would not be accepted. The logic behind this was that taking responsibility for newly implemented service discovery providers and reviewing new submissions placed too much of a burden on the Prometheus maintainers when they could focus on other, higher-priority work instead. However, at the end of 2019, that moratorium was lifted.

As expected, the number of service discovery providers increased significantly from 11 before the end of the moratorium to 25 at the time of writing. However, not everything should be a service discovery provider in the upstream Prometheus code base. One of the requirements for accepting a new service discovery provider is that wherever you’re discovering from needs to be well-established and in...

Summary

In this chapter, we looked at service discovery in Prometheus. It’s a key feature that differentiates Prometheus and makes it well-suited to the cloud-native world. First, we looked at how service discovery works by evaluating its usage in the Prometheus environment we set up in Chapter 2. Next, we saw how cloud providers can implement service discovery providers in the upstream Prometheus code base by looking at how it’s done for Linode. Finally, we got our hands dirty by looking at generic HTTP-based service discovery and building an endpoint for it.

In the next chapter, we’ll look at how alerting works in Prometheus and dip back into the world of PromQL to see how we can make our alerts better through advanced queries.
