You're reading from Hands-On Infrastructure Monitoring with Prometheus

Product type: Book

Published in: May 2019

Publisher: Packt

ISBN-13: 9781789612349

Edition: 1st Edition

Tools

Prometheus

Concepts

Application Monitoring

Authors (2):

Joel Bastos

Pedro Araújo

Understanding and Extending Alertmanager

Alerting is a critical component in any monitoring stack. In the Prometheus ecosystem, alerts and their subsequent notifications are decoupled. Alertmanager is the component that handles these alerts. In this chapter, we'll be focusing on converting alerts into useful notifications using Alertmanager. From reliability to customization, we'll delve into the inner workings of the Alertmanager service, providing the required knowledge to configure, troubleshoot, and customize all the options available. We'll make sure that concepts such as alert routing, silencing, and inhibition are clear so that you can decide how to implement them in your own stack.

Since Alertmanager is a critical component, high availability will also be explored, and we will also explain the relationship between Prometheus and Alertmanager. We will customize...

Setting up the test environment

To work with Alertmanager, we'll be using three new instances to simulate a highly available setup. This approach will allow us to not only expose the required configurations, but also validate how everything works together.

The setup we'll be using resembles the following diagram:

Figure 11.1: Test environment

Deployment

Let's begin by deploying the Alertmanager test environment:

To launch a new test environment, move into this chapter's path, relative to the repository root:

cd ./chapter11/

Ensure that no other test environments are running and spin up this chapter's environment:

vagrant global-status
vagrant up

You can validate the successful deployment of the test...

Alertmanager fundamentals

We covered how alerting rules work in Prometheus in Chapter 9, Defining Alerting and Recording Rules, but those, by themselves, aren't all that useful. As we mentioned previously, Prometheus delegates notification handling and routing to external systems through a Webhook-style HTTP interface. This is where Alertmanager comes in.

Alertmanager is responsible for accepting the alerts generated from Prometheus alerting rules and converting them into notifications. The latter can take any form, such as email messages, chat messages, pages, or even Webhooks that will then trigger custom actions, such as logging alerts to a data store or creating/updating tickets. Alertmanager is also the only component in the official stack that distributes its state across instances so that it can keep track of things such as which alerts were already sent and which...
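To make the hand-off described above more concrete, here is a minimal sketch of the kind of JSON body an alert producer sends to Alertmanager over HTTP. It assumes the alert shape used by Alertmanager's v2 API (a list of objects with labels, annotations, startsAt, and endsAt); the alert name, label values, and helper function are hypothetical, and nothing is actually sent over the network.

```python
import json
from datetime import datetime, timedelta, timezone


def build_alert(labels, annotations, starts_at, duration):
    """Return one alert as a dict, in the assumed v2 API shape."""
    return {
        "labels": labels,            # identify the alert; used for routing
        "annotations": annotations,  # free-form context for humans
        "startsAt": starts_at.isoformat(),
        "endsAt": (starts_at + duration).isoformat(),
    }


# Hypothetical alert, for illustration only.
starts = datetime(2019, 5, 1, 12, 0, tzinfo=timezone.utc)
payload = [
    build_alert(
        {"alertname": "ExampleAlert", "severity": "warning"},
        {"summary": "Illustrative alert, not from the book"},
        starts,
        timedelta(minutes=5),
    )
]

# This string is what would go into the body of an HTTP POST.
body = json.dumps(payload)
```

In practice, Prometheus builds and sends these requests itself; the sketch only illustrates that an alert is, at this stage, a set of labels and annotations with timestamps rather than a notification.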

Alertmanager configuration

In Chapter 9, Defining Alerting and Recording Rules, we discussed how Prometheus generates and pushes out alerts. Having also made clear the distinction between an alert and a notification, it's now time to use Alertmanager to handle the alerts that are sent by Prometheus and turn them into notifications.

Next, we'll go through the configuration required on Prometheus, along with the configuration options available in Alertmanager, so that we have notifications going out from our monitoring stack.

Prometheus configuration

There are a couple of configurations that need to be done in Prometheus so that we can start using Alertmanager. The first thing to do is configure the external labels...

Common Alertmanager notification integrations

Users and organizations have different requirements regarding notification methods: some might use HipChat as a means of communication, others rely on email, and on-call rotations usually demand a paging system such as PagerDuty or VictorOps. Thankfully, Alertmanager provides several integration options out of the box and covers most of the notification needs you might have. If not, there's always the Webhook notifier, which allows integration with custom notification methods. Next, we'll be exploring the most common integrations and how to configure them, as well as providing basic examples to get you started.

Something to keep in mind when considering integrating with chat systems is that they're designed for humans, and the use of a ticketing system is advised when thinking about low-priority alerting....
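As a rough illustration of the Webhook notifier mentioned above, the following sketch shows how a custom receiver might interpret the JSON body that Alertmanager posts to it. The field names (status, alerts, labels) follow the webhook payload format as documented for Alertmanager; the sample payload, receiver name, and summarize_webhook helper are hypothetical, and a real receiver would wrap this logic in an HTTP server.

```python
import json


def summarize_webhook(body):
    """Turn a webhook notification body into one line per alert."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        name = labels.get("alertname", "unknown")
        instance = labels.get("instance", "n/a")
        status = alert.get("status", "unknown")
        lines.append(f"[{status}] {name} on {instance}")
    return lines


# Hypothetical payload, trimmed to the fields used above.
sample = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "custom-webhook",
    "groupLabels": {"alertname": "ExampleAlert"},
    "commonLabels": {"alertname": "ExampleAlert"},
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "ExampleAlert", "instance": "host-a:9100"}},
        {"status": "resolved",
         "labels": {"alertname": "ExampleAlert", "instance": "host-b:9100"}},
    ],
})

summary = summarize_webhook(sample)
```

From here, the resulting lines could be forwarded to whatever system lacks a native integration, such as an internal ticketing tool.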

Customizing your alert notifications

For each of the available integrations, Alertmanager already includes built-in templates for their notifications. However, these can be tailored to the specific needs of the user and/or organization. Similar to the alerting rule annotations we explored in Chapter 9, Defining Alerting and Recording Rules, alert notifications are templated using the Go templating language. Let's use the Slack integration as an example and understand how the messages are constructed so that they are tailored to your needs.

Default message format

To have an idea of what a notification without any customization looks like, we're going to use a very simple example. Take the following alerting rule,...

Who watches the Watchmen?

The monitoring system is a critical component of any infrastructure. We rely on it to keep watch over everything – from servers and network devices to services and applications – and expect to be notified whenever there's a problem. However, when the problem is on the monitoring stack itself, or even on a notification provider so that alerts are generated but don't reach us, how will we, as operators, know?

Guaranteeing that the monitoring stack is up and running, and that notifications are able to reach recipients, is a commonly overlooked task. In this section, we will go into what can be done to mitigate risk factors and improve overall confidence in the monitoring system.
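One way to reason about this problem is to treat notifications from an always-firing alert as a heartbeat, and to raise the alarm from outside the monitoring stack when that heartbeat stops arriving. The sketch below shows only the staleness check at the core of that idea; it is an assumption about how such a check could look, not the setup used in this chapter, and the function name and thresholds are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_is_stale(last_seen, now, max_silence):
    """Return True when no heartbeat has arrived within max_silence."""
    return now - last_seen > max_silence


# Hypothetical timestamps: the last heartbeat notification arrived at 12:00.
last_seen = datetime(2019, 5, 1, 12, 0, tzinfo=timezone.utc)
max_silence = timedelta(minutes=10)

# Five minutes later, the alerting path still looks healthy.
healthy = heartbeat_is_stale(
    last_seen, last_seen + timedelta(minutes=5), max_silence)

# Thirty minutes later, something in the path is probably broken.
broken = heartbeat_is_stale(
    last_seen, last_seen + timedelta(minutes=30), max_silence)
```

The important design point is that the check runs somewhere that does not share failure modes with the monitoring stack it is watching.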

Meta-monitoring and cross-monitoring

...

Summary

In this chapter, we dived into the alerting component of the Prometheus stack, Alertmanager. This service was designed with availability in mind, and we had the opportunity to understand how it works, from generating better notifications to avoiding being flooded by useless ones. The notification pipeline is a very good starting point to grok the inner workings of Alertmanager, but we also went through its configuration, while providing examples to better solidify that knowledge. We were introduced to amtool and all the features it provides, such as adding, removing, and updating silences directly from the command line.

Alertmanager has several notification integrations available and we went through all of them, so you can pick and choose the ones you're interested in. Since we all want better notifications, we delved into how to customize the default notifications...

Questions

What happens to the notifications if there's a network partition between Alertmanager instances in the same cluster?
Can an alert trigger multiple receivers? What is required for that to happen?
What's the difference between group_interval and repeat_interval?
What happens if an alert does not match any of the configured routes?
If the notification provider you require is not supported natively by Alertmanager, how can you use it?
When writing custom notifications, how are CommonLabels and CommonAnnotations populated?
What can you do to ensure that the full alerting path is working from end to end?

Joel Bastos is an open source supporter and contributor, with a background in infrastructure security and automation. He is always striving for the standardization of processes, code maintainability, and code reusability. He has defined, led, and implemented critical, highly available, and fault-tolerant enterprise and web-scale infrastructures in several organizations, with Prometheus as the cornerstone. He has worked at two unicorn companies in Portugal and at one of the largest transaction-oriented gaming companies in the world. Previously, he has supported several governmental entities with projects such as the Public Key Infrastructure for the Portuguese citizen card. You can find his blogs at kintoandar and on Twitter with the handle @kintoandar.

Pedro Araújo

Pedro Araújo is a site reliability and automation engineer and has defined and implemented several standards for monitoring at scale. His contributions have been fundamental in connecting development teams to infrastructure. He is highly knowledgeable about infrastructure, but his passion is in the automation and management of large-scale, highly-transactional systems. Pedro has contributed to several open source projects, such as Riemann, OpenTSDB, Sensu, Prometheus, and Thanos. You can find him on Twitter with the handle @phcrva.
