Extending Prometheus Globally with Thanos

The remote storage systems that we covered in the previous chapter are not for everyone. Perhaps you are perfectly happy to run your Prometheus instances globally without any centralized place where you’re aggregating those metrics. However, it sure would be nice to have some way to run queries from a centralized place and fan them out to all of your Prometheus instances… Good news! Thanos can do that, and more!

Thanos is less of a pre-built, comprehensive solution in the way that VictoriaMetrics or Mimir are, and more of a Swiss Army knife of à la carte components that can be mixed and matched to fit your specific use case. At the time of writing, the Thanos project comprises seven different components, and you can run as few or as many of them as you need.

In this chapter, we’ll cover them all in these main topics:

  • Overview of Thanos
  • Thanos Sidecar
  • Thanos Compactor
  • Thanos Query
  • Thanos Query Frontend...

Technical requirements

For this chapter, we’ll continue building on the Prometheus environment we deployed to Kubernetes in Chapter 2. Consequently, you’ll need the following two tools installed:

  • kubectl
  • helm

This chapter’s code is available at https://github.com/PacktPublishing/Mastering-Prometheus.

Overview of Thanos

The Thanos project began at Improbable and was spearheaded by two stalwarts of the Prometheus ecosystem, Bartłomiej Płotka and Fabian Reinartz, both of whom have also been significant contributors to Prometheus itself and to various other Prometheus-related projects. After a short period, the project was donated to the Cloud Native Computing Foundation (CNCF), where it is now designated as an “incubating” project.

The three stated goals of the Thanos project are as follows:

  • Global query view of metrics
  • Unlimited retention of metrics
  • High availability of components, including Prometheus

The core of Thanos (its original components) consists of Thanos Sidecar, Thanos Store, Thanos Compact, and Thanos Query. Each component is a subcommand of the thanos binary, so there’s no need to download and deploy separate executables for each component. Through each of its various components, Thanos enables distributed querying...
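
Since everything ships in that one binary, you can explore each component directly from your shell. For instance (assuming the thanos binary is on your PATH):

$ thanos --help             # lists every available subcommand
$ thanos sidecar --help     # flags for the Sidecar component
$ thanos query --help       # flags for the Query component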

Thanos Sidecar

Thanos Sidecar is the most fundamental Thanos component, enabling Thanos’s two most popular features: querying multiple Prometheus instances from a centralized location (Thanos Query) and backing up Prometheus TSDB data in an object storage backend.

Thanos Sidecar works by running alongside the Prometheus server as a “sidecar” (clever naming, huh?). This is technically only a strict requirement if you’re using the sidecar to upload Prometheus data to object storage, but you should strive to deploy the Sidecar alongside your Prometheus instance, even if you don’t use that feature. Doing so will help minimize latency between the Sidecar and Prometheus.

The Sidecar fulfills its first job of enabling distributed querying of Prometheus instances by exposing a gRPC API that Thanos Query (or other Thanos components such as Ruler) can communicate with. This gRPC API (henceforth referred to as StoreAPI) is implemented and exposed by the...
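
To make that concrete, here is a minimal sketch of launching the Sidecar next to a Prometheus server. The paths, ports, and filenames are illustrative placeholders rather than values from this chapter’s manifests (10901 and 10902 are simply the conventional Thanos gRPC and HTTP ports):

$ thanos sidecar \
    --tsdb.path=/prometheus \
    --prometheus.url=http://localhost:9090 \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --grpc-address=0.0.0.0:10901 \
    --http-address=0.0.0.0:10902

The referenced objstore.yaml holds the bucket configuration. An S3-style example with placeholder values might look like this:

type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
  access_key: <your access key>
  secret_key: <your secret key>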

Thanos Compactor

The Thanos Compactor component is responsible for compacting and downsampling TSDB blocks stored in our Object Storage provider. Since we’ve disabled local compaction of TSDB blocks on the Prometheus instance, we still need to compact them somehow to ensure efficient storage of our data. Hence, the Thanos project provides a component for compaction.

Thanos Compactor handles compaction in the same way that Prometheus does: it takes several small blocks and compacts their indices and samples to make a larger block with an index that uses less space than if all the constituent blocks still maintained separate indices. This relies on the presupposition that most time series exist across multiple sequential blocks, which should almost always be the case.

There’s not much to note about how Thanos achieves this other than the requisite changes to account for the fact that the Compactor must download the blocks from object storage to compact them...
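
As a rough sketch, running the Compactor as a long-lived service looks something like the following; the data directory and retention values are illustrative assumptions, not recommendations from this chapter:

$ thanos compact \
    --data-dir=/var/thanos/compact \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --wait \
    --retention.resolution-raw=30d \
    --retention.resolution-5m=90d \
    --retention.resolution-1h=1y

The --wait flag keeps the Compactor running and watching for new blocks instead of exiting after a single pass, while the retention flags control how long raw and downsampled data are kept in the bucket.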

Thanos Query

Thanos Query is another of Thanos’s most fundamental components; without it, there’s not much point to Thanos Sidecar. It provides both a web UI and an API for executing PromQL queries across multiple data sources (for example, Prometheus via Thanos Sidecar, metrics in object storage via Thanos Store, and so on).

The web UI will feel familiar to anyone who has used the Prometheus web UI, since their functionality and UX are essentially equivalent. The query API is also 100% PromQL-compliant, so it can be used as a Prometheus-type data source in Grafana. In practice, at the companies I’ve worked at, we’ve tended to use Thanos Query as our default data source in Grafana.

Thanos Query works by connecting to one or more endpoints that implement Thanos’s gRPC-based StoreAPI. Endpoints can be specified via the repeatable --endpoint flag. This flag supports both static definitions (for example, --endpoint=192.168.1.2:10901)...
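
As a hedged sketch, wiring Query up to a static Sidecar endpoint and a DNS-discovered group of Stores might look like this (the addresses and the replica label name are placeholders):

$ thanos query \
    --http-address=0.0.0.0:10902 \
    --grpc-address=0.0.0.0:10901 \
    --endpoint=192.168.1.2:10901 \
    --endpoint=dnssrv+_grpc._tcp.thanos-store.monitoring.svc.cluster.local \
    --query.replica-label=replica

The dnssrv+ prefix tells Query to discover endpoints via DNS SRV lookups, and --query.replica-label lets it deduplicate series from highly available Prometheus pairs that differ only by that label.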

Thanos Query Frontend

Thanos Query Frontend is a service that can be deployed in front of Thanos Query to improve query performance by splitting large-range queries into smaller ones and caching results. It is based on a similar component implemented by Cortex (https://github.com/cortexproject/cortex), the predecessor to Mimir. You can think of it as a pre-processor of queries, where the majority of the actual work is still done by the downstream queriers.

Query sharding and splitting

Presuming you run multiple top-level Thanos Query instances, you can put Query Frontend in front of them to share the load between them more efficiently than simply load balancing across them with something such as NGINX. This can be accomplished through query splitting based on time ranges and/or vertical sharding.

Query splitting

By default, the --query-range.split-interval flag is set to split range queries on a 24h interval. This means that if you query sum(my_metric) over...
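
As a sketch, putting Query Frontend in front of a downstream Query instance with that default interval made explicit might look like this (the downstream URL is a placeholder):

$ thanos query-frontend \
    --http-address=0.0.0.0:9090 \
    --query-frontend.downstream-url=http://thanos-query.monitoring.svc:10902 \
    --query-range.split-interval=24h

With a 24h split interval, a 7-day range query is broken into seven single-day queries that can be executed in parallel downstream and cached individually.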

Thanos Store

Thanos Store is perhaps the simplest Thanos component in terms of usage. There is not too much to tweak or tune and it does not require much in terms of resources. It is effectively stateless, although you can use persistent storage with it to reduce startup time as it populates metadata about the available blocks in object storage. We’ll get into some ways that you can horizontally scale it later, but for now, let’s just focus on what it is and how to deploy it.

Thanos Store is another component that implements the StoreAPI, so you can use Thanos Query to pull data from it. Thanos Store’s purpose is to function as a gateway to object storage, which is why you may sometimes see it referred to as the “Store Gateway.”

When Thanos Sidecar uploads blocks and when Thanos Compactor operates on them, they update the meta.json file within each block (see Chapter 3 for more information on that file) with a new thanos section. Thanos Store...
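
A minimal sketch of running Store follows; the data directory is an assumed placeholder, and pointing it at persistent storage merely speeds up the initial sync of block metadata:

$ thanos store \
    --data-dir=/var/thanos/store \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --grpc-address=0.0.0.0:10901 \
    --http-address=0.0.0.0:10902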

Thanos Ruler

Thanos Ruler enables a unique feature for advanced use cases: evaluating Prometheus rules across multiple Prometheus instances. For example, consider a service you have deployed across multiple regions with a Prometheus deployment in each region responsible for monitoring its corresponding instance of the service. Thanos Ruler would enable you to evaluate Prometheus rules across all of those Prometheus instances to obtain a more holistic view of your service. This is great for measuring things such as service-level objectives (SLOs).

Thanos Ruler accomplishes this by connecting to one or more Thanos Query endpoints to run queries against. If more than one is specified, it performs round-robin balancing of queries. In other words, a rule’s query is not evaluated by every specified Query instance – only one is chosen and used per query.

Data produced by evaluating recording rules is stored in a local TSDB in the same manner that it would be on Prometheus...
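
A hedged sketch of launching Ruler against two Query endpoints follows; the addresses, file paths, Alertmanager URL, and external label are illustrative assumptions:

$ thanos rule \
    --data-dir=/var/thanos/rule \
    --rule-file=/etc/thanos/rules/*.yaml \
    --query=thanos-query-0.monitoring.svc:10902 \
    --query=thanos-query-1.monitoring.svc:10902 \
    --alertmanagers.url=http://alertmanager.monitoring.svc:9093 \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --label='rule_replica="0"'

The repeated --query flags define the pool that Ruler round-robins its rule evaluations across, --label attaches an external label to the results, and --objstore.config-file lets Ruler upload the TSDB blocks produced by recording rules to object storage.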

Thanos Receiver

Thanos Receiver (or “Receive”) is the last of our components to deploy, and it is arguably the most complex Thanos component in your stack, depending on how you configure it. This is mostly because it is extremely configurable for multi-tenant and/or large-scale use cases. However, since it focuses primarily on receiving remote write data, we’ll skip diving too deep into the details of remote write, as you’re already familiar with it from the previous chapter.

Like other Thanos components, Thanos Receive is also intended to connect to object storage to upload TSDB blocks. It maintains a local TSDB while receiving data but will upload blocks to object storage when they are flushed to disk every 2 hours. Unlike Thanos Ruler, it is also intended to maintain a local copy of data for a longer period – by default, 15 days (configurable via the --tsdb.retention flag).
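
As a rough sketch (the ports, paths, and external label are conventional placeholders):

$ thanos receive \
    --tsdb.path=/var/thanos/receive \
    --tsdb.retention=15d \
    --remote-write.address=0.0.0.0:19291 \
    --grpc-address=0.0.0.0:10901 \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --label='replica="0"'

A Prometheus server would then point its remote_write configuration at http://<receive-address>:19291/api/v1/receive.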

A note on deduplication

As opposed to other...

Thanos tools

In addition to all of the Thanos components we have reviewed thus far, the Thanos CLI also has a thanos tools subcommand. This contains a variety of helpful tools, primarily for interacting with object storage buckets and the TSDB blocks within them. It also contains a command for validating recording and alerting rules used by Thanos Ruler.

Since these tools are primarily for use in existing, established environments, we won’t cover them individually in this book. Nevertheless, they may be worth experimenting with before cleaning up the Thanos components you’ve deployed in this chapter. Within any of the Thanos pods we deployed in this chapter, you can run thanos tools --help to see all of the available options.
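
For instance, you could list and inspect the blocks in your bucket or validate your Ruler rule files like so (assuming the object storage configuration is mounted at the path shown):

$ thanos tools bucket ls --objstore.config-file=/etc/thanos/objstore.yaml
$ thanos tools bucket inspect --objstore.config-file=/etc/thanos/objstore.yaml
$ thanos tools rules-check --rules=/etc/thanos/rules/*.yaml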

Cleanup

Now that we’re done experimenting with the suite of Thanos components, you can clean up your environment by reverting to our simple Prometheus deployment via Helm, like so:

$ helm upgrade --namespace prometheus \
    --version 47.0.0 \
    --values mastering-prometheus/ch10/prom-values.yaml \
    mastering-prometheus \
    prometheus-community/kube-prometheus-stack

Additionally, you can delete all of the Thanos components and our object storage configuration via kubectl:

$ kubectl delete secret thanos-objstore-config
$ kubectl delete -f https://raw.githubusercontent.com/PacktPublishing/Mastering-Prometheus/main/ch10/manifests/thanos-compact.yaml
$ kubectl delete -f https://raw.githubusercontent.com/PacktPublishing/Mastering-Prometheus/main/ch10/manifests/thanos-query.yaml
$ kubectl delete -f https://raw.githubusercontent.com/PacktPublishing/Mastering-Prometheus/main/ch10/manifests...

Summary

In this chapter, we went through all of the Thanos components that are available to gain a greater understanding of the comprehensive suite of features offered by the Thanos project.

We learned how Thanos Sidecar enables long-term storage of metrics in object storage and distributed querying through Thanos Query, how Thanos Compactor operates on those uploaded TSDB blocks in object storage to compact and downsample them, and how Thanos Store retrieves them from object storage on-demand for queries.

We saw how Thanos Query enables distributed querying of metrics from the various components that implement Thanos’s gRPC StoreAPI, and how Thanos Query Frontend enables more efficient use of Thanos Query instances through caching, query sharding, and splitting.

We utilized Thanos Ruler so that we could evaluate Prometheus alerts and rules across all endpoints connected to a Thanos Query instance.

Finally, we learned about and deployed Thanos Receiver so that we...
