You're reading from Learn Grafana 10.x - Second Edition

Product type: Book

Published in: Dec 2023

Publisher: Packt

ISBN-13: 9781803231082

Edition: 2nd Edition

Concepts

Data Visualization

Author (1)

Eric Salituro

Monitoring Data Streams with Grafana Alerts

In our last chapter, we explored the world of real-time data streaming by combining Telegraf’s input plugins to capture CPU metrics, and by simulating an IoT metrics pipeline with the addition of a Mosquitto broker and a simple Python script standing in for an IoT device.

As part of our three-chapter exploration of Grafana’s observability features, we’re going to move from simply streaming metrics to adding a key observability feature: the ability to trigger some form of an alert when certain conditions are met. Without the ability to monitor our systems and then alert when we detect anomalous behavior, we risk deterioration, instability, or even significant outages.

We’ll start out by discussing aspects of monitoring and observability with an eye toward good strategies for identifying the alert conditions we want to watch for. Next, we’ll talk about Grafana’s alerting features, especially the...

Technical requirements

Tutorial code, dashboards, and other helpful files for this chapter can be found in the book’s GitHub repository at https://github.com/PacktPublishing/Learn-Grafana-10/tree/main/Chapter12.

Monitoring and observability

The key to observability is, of course, proper monitoring. Without the ability to monitor, there can be no awareness of the status of your systems and, consequently, no way to react to changes in those systems, be they adverse or otherwise.

In our examples, we will be looking at two kinds of monitoring: an orchestrated computing platform such as Docker Compose or Kubernetes, and a lightweight application such as a web server. We will be using the techniques we demonstrated in the previous chapter to track metrics generated by Telegraf; these principles are the same whether we’re talking about small servers or massive compute clouds. But first, let’s discuss some key concepts.

Monitoring processes

When we look at monitoring, whether it’s on-premises or in the cloud, we tend to see the world from two main perspectives: how the system manages its processes, or the processes themselves. Either you are monitoring how processes are...

Alerting in Grafana

Grafana alerting has evolved significantly over the last few versions into a complex, powerful, and versatile system for combining monitoring, alerting, and notification. Its power can be a bit intimidating, but bear in mind that you may not need every capability in Grafana alerting.

We’ll take things step by step, so you can see how the parts fit together. Once you understand the basics, if you run into a more complex observability scenario, you will know how best to extend your own alerting to accommodate it.

Let’s start by reviewing how Grafana alerting works. There are four main components to Grafana alerting: alert rules, labels, notification policies, and contact points. We’ll go over their roles one by one.

Alert rules

Alert rules are the trigger mechanism for Grafana alerting. You can have all the metrics you want streaming into Grafana, but if you don’t have any alert rules, how would you know when something is wrong...

Defining alert rules

Let’s start off by talking about how we want to look at triggering alerts. To build an alert, you will need to answer a series of questions in this form:

What condition must exist as measured by what metrics, and for how long?

Let’s break this concept down into its constituent parts.

What condition…

An alert ultimately boils down to a switch: each time the rule is evaluated, at a cadence set by the evaluation interval, an alert may need to be triggered. How you determine whether the alert should be in a triggered (or firing) state is called the alert condition. Most of the work you will do in defining an alert condition consists of reducing metrics data to a simple Boolean yes-or-no assertion about whether an alert should be triggered.
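To make the "what condition, by what metrics, for how long" question concrete, here is a minimal Python sketch. It is not Grafana's implementation: the `AlertCondition` class, the `evaluate` function, the state names, and the sample CPU values are all hypothetical, and the "for how long" part is simplified to a count of consecutive evaluations in which the condition holds.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class AlertCondition:
    """Hypothetical model of 'what condition, by what metric, for how long'."""
    reduce: Callable[[Sequence[float]], float]  # collapses a series to one number
    threshold: float                            # the reduced value is compared to this
    pending_evaluations: int                    # consecutive breaches needed to fire


def evaluate(condition: AlertCondition,
             windows: Sequence[Sequence[float]]) -> list[str]:
    """Evaluate one window of samples per interval; return the state after each."""
    states = []
    breaches = 0
    for samples in windows:
        value = condition.reduce(samples)        # reduce: many samples -> one value
        if value > condition.threshold:          # threshold: one value -> Boolean
            breaches += 1
        else:
            breaches = 0
        if breaches == 0:
            states.append("normal")
        elif breaches < condition.pending_evaluations:
            states.append("pending")
        else:
            states.append("firing")
    return states


# Assumed example: mean CPU usage above 80% for two evaluations in a row.
cpu_high = AlertCondition(reduce=mean, threshold=80.0, pending_evaluations=2)
windows = [[40.0, 50.0], [85.0, 95.0], [90.0, 92.0], [30.0, 20.0]]
states = evaluate(cpu_high, windows)
print(states)  # ['normal', 'pending', 'firing', 'normal']
```

The single high window only moves the alert to a pending state; it takes a second consecutive breach before the switch flips to firing, and one healthy window resets it.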

Space prevents us from devoting an entire chapter to exploring the possible ways to define alert conditions, but I can offer some heuristics for identifying possible alert conditions:

Is the condition based upon...

Alert messaging to contact points

Before we can establish the notification policies that will direct our alerts to contact points, we should first define our contact points. Grafana supports an ever-growing number of contact points, so to find out more about a specific contact point, consult the Grafana documentation.

We are going to concentrate on three common contact points that cover many typical use cases. We’ll use an email contact point to represent the typical use case where email is the destination for all alert messages. The Slack contact point is used when the alert needs to be visible to a defined group, such as the members of a Slack channel. Our final contact point is PagerDuty, a destination for alerts that need to be directed to a specific person or team for potentially immediate action.

Configuring an email contact point

One of the most common and oldest forms of contact point is simply good old email. Nearly everyone has it, and access to email servers...

Routing alerts with notification policies

Now that we have our alert rules and our contact points, we’re able to link them up using our notification policies. One of the most common notification policies is to match up an alert severity with a particular contact point. That is why we initially set a severity label when we created our alert rules.

Now that we have our severity label, we can use it in a notification policy, so let’s set one up. A notification policy can be as simple or as complex as you want; the point is to use the information represented in the labels to determine which contact point(s) should receive your alert.

For example, you may have a situation where you want all your low-severity (informational) incidents to go to an email address, but you want medium-severity (actionable, normal response) incidents to go to Slack or Discord, and your high-severity (actionable, immediate response) incidents to go to PagerDuty...
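The label-based routing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Grafana's policy engine: the `Policy` class, the `route` function, the contact point names, and the choice of email as the fallback are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical notification policy: label matchers plus a destination."""
    matchers: dict[str, str]   # every key/value here must match the alert's labels
    contact_point: str         # name of the contact point that receives the alert


def route(labels: dict[str, str], policies: list[Policy], default: str) -> str:
    """Return the contact point of the first policy whose matchers all match."""
    for policy in policies:
        if all(labels.get(key) == value for key, value in policy.matchers.items()):
            return policy.contact_point
    return default  # nothing matched, so fall back to the default contact point


# Assumed severity-to-destination mapping, mirroring the example in the text.
policies = [
    Policy({"severity": "high"}, "pagerduty"),
    Policy({"severity": "medium"}, "slack"),
    Policy({"severity": "low"}, "email"),
]

print(route({"severity": "high", "service": "nginx"}, policies, default="email"))
print(route({"service": "docker"}, policies, default="email"))
```

Extra labels on an alert, such as `service` here, do not prevent a match; only the labels named in a policy's matchers are compared. An alert with no severity label falls through to the default.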

Summary

This was an extensive chapter, and we covered a lot of ground. Observability is becoming a large, important technology sector and Grafana is keeping up by making its alerting capabilities more powerful and versatile.

In this chapter, we set up monitoring for both Docker and NGINX using InfluxDB as a data source. We created alert rules to query and analyze the data from our monitoring, used expressions to reduce the data to a single value, and created expressions to evaluate that value for violating conditions that might need to trigger an alert. We integrated Grafana contact points with email, PagerDuty, and Slack to receive our alerts with messages that contain annotation data set by our alert rules evaluation behavior. We also established notification policies to route our alert messages to different contact points based on the severity derived from alert rule labels. Finally, we briefly considered how to set up mute timings for when we might want to disable certain alert...


Author (1)

Eric Salituro

Eric Salituro is currently a Software Engineering Manager with the Enterprise Data and Analytics Platform team at Zendesk. He has an IT career spanning over 30 years, over 20 of which were in the motion picture industry working as a pipeline technical director and software developer for innovative and creative studios like DreamWorks, Digital Domain, and Pixar. Before moving to Zendesk, he worked at Pixar helping to manage and maintain their production render farm as a Senior Software Developer. Among his accomplishments there was the development of a Python API toolkit for Grafana aimed at streamlining the creation of rendering metrics dashboards.
