You're reading from  Hands-On Infrastructure Monitoring with Prometheus

Product type: Book
Published in: May 2019
Publisher: Packt
ISBN-13: 9781789612349
Edition: 1st Edition
Authors (2):
Joel Bastos

Joel Bastos is an open source supporter and contributor, with a background in infrastructure security and automation. He is always striving for the standardization of processes, code maintainability, and code reusability. He has defined, led, and implemented critical, highly available, and fault-tolerant enterprise and web-scale infrastructures in several organizations, with Prometheus as the cornerstone. He has worked at two unicorn companies in Portugal and at one of the largest transaction-oriented gaming companies in the world. Previously, he supported several governmental entities with projects such as the Public Key Infrastructure for the Portuguese citizen card. You can find his blog at kintoandar and find him on Twitter with the handle @kintoandar.

Pedro Araújo

Pedro Araújo is a site reliability and automation engineer and has defined and implemented several standards for monitoring at scale. His contributions have been fundamental in connecting development teams to infrastructure. He is highly knowledgeable about infrastructure, but his passion is in the automation and management of large-scale, highly transactional systems. Pedro has contributed to several open source projects, such as Riemann, OpenTSDB, Sensu, Prometheus, and Thanos. You can find him on Twitter with the handle @phcrva.

Integrating Long-Term Storage with Prometheus

The single-instance design of Prometheus makes it impractical to maintain large datasets of historical data, because each server is limited by the amount of storage that's available locally. Time series that span long periods enable seasonal trend analysis and capacity planning, so when the dataset doesn't fit into local storage, Prometheus can push data to third-party clustered storage systems. In this chapter, we will look at the remote read and remote write APIs, as well as shipping metrics to object storage with the help of Thanos. Together, these provide several architecture options for tackling this requirement.

In brief, the following topics will be covered in this chapter:

  • Test environment for this chapter
  • Remote write and remote read
  • Options for metrics storage
  • Thanos remote storage and ecosystem...

Test environment for this chapter

In this chapter, we'll be focusing on clustered storage. For this, we'll be deploying three instances to help simulate a scenario where Prometheus generates metrics, and then we'll go through some options for storing them in an object storage solution. This approach will allow us to not only explore the required configurations but also see how everything works together.

The setup we'll be using resembles the following diagram:

Figure 14.1: Test environment for this chapter

In the next section, we will explain how to get the test environment up and running.

Deployment

To launch a new test environment, move into this path, relative to the repository root, shown...

Remote write and remote read

Remote write and remote read allow Prometheus to push and pull samples, respectively: remote write is usually employed to implement remote storage strategies, while remote read allows PromQL queries to transparently target remote data. In the following topics, we'll go into each of these functionalities and present some examples of where they can be used.
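
The push and pull directions described above can be pictured with a small toy model. This is only a conceptual sketch, not the real protocol or any Prometheus API: `RemoteStore`, `ToyPrometheus`, `ingest`, and `query_range` are hypothetical names invented for illustration, and the in-memory dictionaries stand in for the network calls a real deployment would make.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


@dataclass
class RemoteStore:
    """Hypothetical stand-in for a third-party long-term storage system."""
    series: Dict[str, List[Sample]] = field(default_factory=dict)

    def write(self, name: str, samples: List[Sample]) -> None:
        # Receives samples pushed by the toy server (the "remote write" side).
        self.series.setdefault(name, []).extend(samples)

    def read(self, name: str, start: int, end: int) -> List[Sample]:
        # Returns samples for a time range (the "remote read" side).
        return [s for s in self.series.get(name, []) if start <= s[0] <= end]


@dataclass
class ToyPrometheus:
    """Hypothetical, heavily simplified model of a server with short local retention."""
    remote: RemoteStore
    retention: int  # seconds of data kept locally
    local: Dict[str, List[Sample]] = field(default_factory=dict)

    def ingest(self, name: str, ts: int, value: float) -> None:
        self.local.setdefault(name, []).append((ts, value))
        # Push: every ingested sample is also forwarded to remote storage.
        self.remote.write(name, [(ts, value)])
        # Local storage only keeps samples inside the retention window.
        cutoff = ts - self.retention
        self.local[name] = [s for s in self.local[name] if s[0] >= cutoff]

    def query_range(self, name: str, start: int, end: int) -> List[Sample]:
        local = [s for s in self.local.get(name, []) if start <= s[0] <= end]
        # Pull: the query transparently fetches older data from remote storage.
        remote = self.remote.read(name, start, end)
        merged = dict(remote)
        merged.update(dict(local))  # de-duplicate overlapping timestamps
        return sorted(merged.items())


store = RemoteStore()
server = ToyPrometheus(remote=store, retention=20)
for ts in range(0, 60, 10):
    server.ingest("up", ts, 1.0)

local_count = len(server.local["up"])
full_range = server.query_range("up", 0, 50)
```

In this sketch the server only holds the most recent samples locally, yet a range query still returns the whole history, because the older samples are read back from the remote store.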

Remote write

Remote write was a highly sought-after feature for Prometheus. It was first implemented as native support for sending samples in the OpenTSDB, InfluxDB, and Graphite data formats. However, a decision was soon made not to support each possible remote system but instead to provide a generic write mechanism that's suitable for building...

Options for metrics storage

By default, Prometheus does a great job of managing local storage of metrics using its own TSDB. But there are cases where this is not enough: local storage is limited by the disk space available to the Prometheus instance, which isn't ideal for long retention periods, such as years, or for data volumes that exceed the disk space that can feasibly be attached to the instance. In the following sections, we'll discuss the local storage approach, as well as the currently available options for remote storage.
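
To get a feel for when local disk stops being enough, a rough estimate helps. The sketch below applies the capacity rule of thumb from the Prometheus storage documentation (retention time multiplied by ingested samples per second multiplied by bytes per sample, with roughly 1 to 2 bytes per sample on average). The function name `estimate_disk_bytes` and the ingestion rate of 100,000 samples per second are assumptions chosen purely for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(retention_seconds: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough disk estimate: retention * ingestion rate * bytes per sample.

    The default of 2 bytes per sample is the upper end of the commonly
    quoted 1-2 byte average; real usage depends on the data being stored.
    """
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical workload: 100,000 samples ingested per second.
rate = 100_000

fifteen_days = estimate_disk_bytes(15 * SECONDS_PER_DAY, rate)
two_years = estimate_disk_bytes(2 * 365 * SECONDS_PER_DAY, rate)

print(f"15 days: {fifteen_days / 1e9:.1f} GB")
print(f"2 years: {two_years / 1e12:.1f} TB")
```

Under these assumptions, a retention period of weeks fits comfortably on a single disk, while a retention period of years reaches the multi-terabyte range, which is where the remote storage options become attractive.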

Local storage

Prometheus' out-of-the-box storage solution for time series data is simply local storage. It is simpler to understand and...

Thanos remote storage and ecosystem

In Chapter 13, Scaling and Federating Prometheus, we were introduced to Thanos, an open source project that was created to improve upon some of the shortcomings of Prometheus at scale. Specifically, we went through how Thanos provides a global view of several Prometheus instances using the Thanos sidecar and querier components. It's now time to meet other Thanos components and explore how they work together to enable cheap long-term retention using object storage. Keep in mind that complexity will increase when going down this path, so validate your requirements first and confirm that the global view approach and local storage really aren't enough for your particular use case.

Thanos ecosystem

...

Summary

In this chapter, we were introduced to the remote read and remote write endpoints. We learned why the recent WAL-based remote write strategy is so important for the overall performance and availability of Prometheus. Then, we explored some alternatives for keeping Prometheus local storage under control, while explaining the implications of opting for a long-term storage solution. Finally, we delved into Thanos, exposing some of its design decisions and introducing the complete ecosystem of components, with practical examples showing how all the different pieces work together. With this, we can now build a long-term storage solution for Prometheus if we need to.

Questions

  1. What are the main advantages of a remote write based on WAL?
  2. How can you perform a backup of a running Prometheus server?
  3. Can the disk space of a Prometheus server be freed at runtime? If so, how?
  4. What are the main advantages of Thanos using object storage?
  5. Does it make sense to keep data in all available resolutions?
  6. What is the role of Thanos store?
  7. How can you inspect the data that's available in object storage using Thanos?

Further reading
