You're reading from Mastering Prometheus

Product type: Book

Published in: Apr 2024

Publisher: Packt

ISBN-13: 9781805125662

Edition: 1st Edition

Concepts: DevOps

Author: William Hegedus

Observability, Monitoring, and Prometheus

Observability and monitoring are two terms that are often used interchangeably, but they carry important distinctions. While this book is not focused on academic definitions and theories surrounding observability, it’s still useful to distinguish between the two because doing so gives you a framework for thinking about how Prometheus works and what problems it solves. A screw and a nail can both hang a picture, and you can bang a screw into a wall with a hammer, but that doesn’t make the hammer the best tool for the job. Likewise, I’ve seen many people fall into the trap of trying to use Prometheus to cover all of their observability and monitoring needs: when you have a hammer, everything looks like a nail. Instead, let’s identify where Prometheus shines so that we can use it to its full effect throughout the rest of this book.

In this chapter, we...

A brief history of monitoring

In the beginning, there was Nagios… or, at least, so the story goes. Monitoring as we know it took off in the late 1990s and early 2000s with the introduction of tools such as Nagios, Cacti, and Zabbix. Sure, some earlier tools focused on network monitoring, such as Multi Router Traffic Grapher (MRTG) and its offshoot, rrdtool, but system monitoring – including servers – found its stride with Nagios. And it was good… for a time.

Nagios (and its ilk) served its purpose and – if your experience is anything like mine – it just won’t seem to go away. That’s because it does a simple job, and it does it fairly well. Let’s look a little closer at it, the philosophy it embodies, and where it differs from Prometheus.

Nagios

Early monitoring tools such as Nagios were check-based. You give it a script to run with some basic logic and it tells you whether things are good, bad...
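
To make the check-based model concrete, here is a minimal sketch of the kind of logic such a script might contain. It assumes the conventional Nagios plugin exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The function name, the thresholds, and the disk-usage scenario are hypothetical and chosen only for illustration.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk_usage(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a (status code, message) pair.

    The thresholds are illustrative defaults, not values from a real config.
    """
    if used_percent < 0 or used_percent > 100:
        return UNKNOWN, f"UNKNOWN - invalid usage value: {used_percent}"
    if used_percent >= crit:
        return CRITICAL, f"CRITICAL - disk usage at {used_percent:.1f}%"
    if used_percent >= warn:
        return WARNING, f"WARNING - disk usage at {used_percent:.1f}%"
    return OK, f"OK - disk usage at {used_percent:.1f}%"


# Example run with a made-up reading of 85% disk usage.
code, message = check_disk_usage(85.0)
print(message)
```

A real check would typically print the message and then exit with `code` (for example, via `sys.exit(code)`), so that the monitoring server can read the result from the exit status. Note that the check reports only a state at a point in time; the underlying number is reduced to a verdict.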

Introduction to observability concepts

Observability both as a word and as a discipline is not unique to technology. The term is derived from control theory, which is traditionally more rooted in physical engineering disciplines such as robotics and nuclear engineering. It is, in essence, the ability to surmise the health of a system by observing its inputs and outputs. In nuclear engineering, you put in uranium and water, and you receive heat and steam. In software engineering, you put in an end user and an API call, and you receive a Jira ticket about how your API isn’t working. Err… well, hopefully not if your observability is doing its job.

Observability in systems engineering and software is primarily informed by and achieved with a handful of important telemetry signal types. You may have heard them referred to as “the three pillars of observability,” but that terminology has since fallen out of fashion as it elevates the act of gathering telemetry...

Prometheus’s role in observability

Prometheus is objectively pretty great at what it does, but can we make a system fully observable with just Prometheus? Unfortunately, the answer to that question is no. Prometheus’s strength is also its weakness – it’s singularly focused on one thing: metrics.

Prometheus is only focused on the metrics aspect of an observable system. It is purpose-built to efficiently store numeric time series of varying types in a simple format. To the extent that it interoperates with other observability signals, it does so only to provide a link or bridge to some other purpose-built system.
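
As a rough illustration of what a numeric time series in a simple format can look like, the sketch below models a sample as a metric name, a set of labels, and a float value, and renders it as one line in the style of the Prometheus text exposition format. The `Sample` class, the `render` function, and the metric and label names are hypothetical. The sketch also skips details such as label-value escaping and optional timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One observed value of a time series (hypothetical helper type)."""
    name: str
    value: float
    labels: dict = field(default_factory=dict)


def render(sample):
    """Render a sample as one line of text exposition format.

    Simplified: no escaping of label values and no timestamp.
    """
    if not sample.labels:
        return f"{sample.name} {sample.value}"
    # Sort labels so the output is stable regardless of insertion order.
    pairs = ",".join(f'{k}="{v}"' for k, v in sorted(sample.labels.items()))
    return f"{sample.name}{{{pairs}}} {sample.value}"


line = render(Sample("http_requests_total", 1027.0, {"method": "get", "code": "200"}))
print(line)
```

Each distinct combination of metric name and label values identifies its own series, and every sample is just a number, which is part of what keeps this kind of data cheap to store and query.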

However, Prometheus also provides some of the highest-value data that you can collect. It’s likely your go-to data source in tools such as Grafana to visualize how your systems are performing. Logging and tracing systems can certainly provide more detailed data, but to get them to provide that same level of value in analyzing trends, you...

Summary

In this chapter, we learned the abridged history of monitoring and observability, what observability is, and how Prometheus contributes to observability through metrics. With this new (or refreshed) frame of mind, we can approach our utilization of Prometheus in a way that maximizes its usefulness without trying to use it as a silver bullet to solve all our problems.

In the next chapter, we’ll cover deploying a Prometheus environment that will serve as the foundation we build upon throughout the remainder of this book.

William Hegedus has worked in tech for over a decade in a variety of roles, culminating in site reliability engineering. He developed a keen interest in Prometheus and observability technologies during his time managing a 24/7 NOC environment and eventually became the first SRE at Linode, one of the foremost independent cloud providers. Linode was acquired by Akamai Technologies in 2022, and now Will manages a team of SREs focused on building the internal observability platform for Akamai's Connected Cloud. His team is responsible for a global fleet of Prometheus servers spanning over two dozen data centers and ingesting millions of data points every second, in addition to operating a suite of other observability tools. Will is an open source advocate and contributor who has contributed code to Prometheus, Thanos, and many other CNCF projects related to Kubernetes and observability. He lives in central Virginia with his wonderful wife, four kids, three cats, two dogs, and a bearded dragon.
