Hands-On Infrastructure Monitoring with Prometheus
Published in May 2019 by Packt, 1st Edition, ISBN-13: 9781789612349
Authors (2):

Joel Bastos

Joel Bastos is an open source supporter and contributor, with a background in infrastructure security and automation. He is always striving for the standardization of processes, code maintainability, and code reusability. He has defined, led, and implemented critical, highly available, and fault-tolerant enterprise and web-scale infrastructures in several organizations, with Prometheus as the cornerstone. He has worked at two unicorn companies in Portugal and at one of the largest transaction-oriented gaming companies in the world. Previously, he supported several governmental entities with projects such as the Public Key Infrastructure for the Portuguese citizen card. You can find his blogs at kintoandar and on Twitter with the handle @kintoandar.

Pedro Araújo

Pedro Araújo is a site reliability and automation engineer who has defined and implemented several standards for monitoring at scale. His contributions have been fundamental in connecting development teams to infrastructure. He is highly knowledgeable about infrastructure, but his passion lies in the automation and management of large-scale, highly transactional systems. Pedro has contributed to several open source projects, such as Riemann, OpenTSDB, Sensu, Prometheus, and Thanos. You can find him on Twitter with the handle @phcrva.


Scaling and Federating Prometheus

Prometheus was designed to run as a single server. This approach can handle thousands of targets and millions of time series but, as you scale, you might find that a single instance is no longer enough. This chapter tackles that problem and explains how to scale Prometheus through sharding. Sharding, however, makes it harder to keep a global view of the infrastructure. To address this, we will go through the advantages and disadvantages of sharding, see how federation comes into the picture, and, lastly, introduce Thanos, a project created by the Prometheus community to address some of the issues presented.

In brief, the following topics will be covered in this chapter:

  • Test environment for this chapter
  • Scaling with the help of sharding
  • Having a global view using federation
  • Using Thanos to mitigate Prometheus shortcomings at scale

Test environment for this chapter

In this chapter, we'll be focusing on scaling and federating Prometheus. For this, we'll be deploying three instances to simulate a scenario where a global Prometheus instance gathers metrics from two others. This approach will allow us not only to explore the required configurations, but also to understand how everything works together.

The setup we'll be using is illustrated in the following diagram:

Figure 13.1: Test environment for this chapter

In the next section, we will explain how to get the test environment up and running.

Deployment

To launch a new test environment, move into the following chapter path, relative to the code repository root:

cd ./chapter13/

Ensure...

Scaling with the help of sharding

With growth come more teams, more infrastructure, and more applications. With time, running a single Prometheus server can become infeasible: changes to recording/alerting rules and scrape jobs become more frequent (requiring reloads that, depending on the configured scrape intervals, can take up to a couple of minutes to take effect), missed scrapes can start to happen as Prometheus becomes overwhelmed, or the person or team responsible for that instance may simply become a bottleneck in the organizational process. When this happens, we need to rethink the architecture of our solution so that it scales accordingly. Thankfully, the community has tackled this problem time and time again, and there are some recommendations on how to approach it. These recommendations revolve around sharding.

In this context, sharding means splitting...
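In broad terms, sharding comes in two flavors: vertical sharding assigns different scrape jobs (for example, per team or per service) to different Prometheus instances, while horizontal sharding splits the targets of a single oversized job across several instances. The following is a minimal sketch of horizontal sharding using the hashmod relabel action; the job name, target addresses, and modulus of 2 (two shards) are illustrative assumptions, with each shard keeping a different value in the regex (0 or 1):

scrape_configs:
  - job_name: 'node'          # illustrative job name
    static_configs:
      - targets: ['host01:9100', 'host02:9100', 'host03:9100']
    relabel_configs:
      - source_labels: [__address__]   # hash each target's address...
        modulus: 2                     # ...modulo the number of shards
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: '0'                     # this shard's number (0 or 1)
        action: keep                   # drop targets belonging to other shards

Each target address is hashed and taken modulo the number of shards, and the keep action then drops every target that doesn't belong to this shard, spreading the scrape load without any coordination between instances.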

Having a global view using federation

When you have multiple Prometheus servers, it can become quite cumbersome to know which one to query for a certain metric. Another problem that quickly comes up is how to aggregate data from multiple instances, possibly across multiple datacenters. This is where federation comes into the picture. Federation allows a Prometheus instance to scrape selected time series from other instances, effectively serving as a higher-level aggregating instance. This can happen in a hierarchical fashion, with each layer aggregating metrics from lower-level instances into broader time series, or in a cross-service pattern, where a few metrics are selected from instances at the same level so that certain recording and alerting rules become possible. For example, you could collect data for service throughput or latency in each...
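To make the hierarchical pattern concrete, a global Prometheus instance federates from the lower-level ones by scraping their /federate endpoint, pulling only the series that match the given match[] parameters. The following is a minimal sketch, where the shard hostnames and the job:-prefixed selector (a common convention for series produced by recording rules) are assumptions for illustration, not taken from this chapter's configuration:

scrape_configs:
  - job_name: 'federate'
    honor_labels: true         # keep the original job/instance labels
    metrics_path: '/federate'
    params:
      'match[]':               # only pull aggregated (recording rule) series
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:               # hypothetical shard instances
        - 'shard01:9090'
        - 'shard02:9090'

Setting honor_labels: true matters here: it keeps the job and instance labels of the federated series intact instead of letting the global instance overwrite them with its own.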

Using Thanos to mitigate Prometheus shortcomings at scale

When you start to scale Prometheus, you quickly bump into the problem of cross-shard visibility. Grafana can help, as you can use multiple data sources in the same dashboard panel, but this becomes harder to maintain, especially when multiple teams have different needs. Keeping track of which shard has which metric might not be trivial when there are no clearly defined boundaries. This might not be a problem when you have a shard per team, since each team might only care about its own metrics, but issues arise when several shards are maintained by a single team and exposed as a service to the rest of the organization.

Additionally, it is common practice to run two identical Prometheus instances to prevent a single point of failure (SPOF) in the alerting path, known as high-availability (HA) pairs. This complicates...
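In broad strokes, Thanos addresses both problems: a sidecar runs alongside each Prometheus server and exposes its data over gRPC, while a querier fans queries out to every sidecar, merging the results and deduplicating HA pairs by a replica label. The commands below are a minimal sketch, assuming sidecars reachable at shard01 and shard02 and an external label named replica that distinguishes the members of each HA pair:

# sidecar next to each Prometheus server, exposing its data over gRPC
thanos sidecar \
  --prometheus.url http://localhost:9090 \
  --tsdb.path /var/lib/prometheus \
  --grpc-address 0.0.0.0:10901

# querier fanning out to both sidecars and deduplicating HA replicas
thanos query \
  --http-address 0.0.0.0:10902 \
  --store shard01:10901 \
  --store shard02:10901 \
  --query.replica-label replica

Without the replica label, the querier treats each member of an HA pair as a distinct series, so queries return duplicated results.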

Summary

In this chapter, we tackled issues concerning running Prometheus at scale. Even though a single Prometheus instance can get you a long way, it's a good idea to know how to grow if required. We learned how vertical and horizontal sharding work, when to use sharding, and what benefits and concerns it brings. We were introduced to the common patterns for federating Prometheus (hierarchical and cross-service), and how to choose between them depending on our requirements. Since we sometimes want more than out-of-the-box federation offers, we were introduced to the Thanos project and how it solves the global view problem.

In the next chapter, we'll be tackling another common requirement and one that isn't a core concern of the Prometheus project, which is the long-term storage of time series.

Questions

  1. When should you consider sharding Prometheus?
  2. What's the difference between sharding vertically and horizontally?
  3. Is there anything you can do before opting for a sharding strategy?
  4. What type of metrics are best suited for being federated in a hierarchical pattern?
  5. Why might you require cross-service federation?
  6. What protocol is used between Thanos querier and sidecar?
  7. If a replica label is not set in a Thanos querier that is configured with sidecars running alongside Prometheus HA pairs, what happens to the results of queries that are executed there?
