Cloud-Native Observability with OpenTelemetry


Product type Book
Published in May 2022
Publisher Packt
ISBN-13 9781801077705
Pages 386 pages
Edition 1st Edition
Author: Alex Boten

Table of Contents (17)

Preface
1. Section 1: The Basics
2. Chapter 1: The History and Concepts of Observability
3. Chapter 2: OpenTelemetry Signals – Traces, Metrics, and Logs
4. Chapter 3: Auto-Instrumentation
5. Section 2: Instrumenting an Application
6. Chapter 4: Distributed Tracing – Tracing Code Execution
7. Chapter 5: Metrics – Recording Measurements
8. Chapter 6: Logging – Capturing Events
9. Chapter 7: Instrumentation Libraries
10. Section 3: Using Telemetry Data
11. Chapter 8: OpenTelemetry Collector
12. Chapter 9: Deploying the Collector
13. Chapter 10: Configuring Backends
14. Chapter 11: Diagnosing Problems
15. Chapter 12: Sampling
16. Other Books You May Enjoy

Chapter 12: Sampling

One of the challenges of telemetry, in general, is managing the quantity of data that can be produced by instrumentation. This can be problematic at the time of generation if the tools producing telemetry consume too many resources. It can also be costly to transfer the data across various points of the network. And, of course, the more data is produced, the more storage it consumes, and the more resources are required to sift through it at the time of analysis. The last topic we'll discuss in this book focuses on how we can reduce the amount of data produced by instrumentation while retaining the value and fidelity of the data. To achieve this, we will be looking at sampling. Although primarily a concern of tracing, sampling has an impact across metrics and logs as well, which we'll learn about throughout this chapter. We'll look at the following areas:

  • Concepts of sampling, including sampling strategies, across the different signals of...

Technical requirements

All the code for the examples in the chapter is available in the companion repository, which can be downloaded using git with the following command. The examples are under the chapter12 directory:

$ git clone https://github.com/PacktPublishing/Cloud-Native-Observability
$ cd Cloud-Native-Observability/chapter12

The first example in the chapter is an application that uses the OpenTelemetry Python SDK to configure a sampler. To run the code, we'll need Python 3.6 or greater installed:

$ python --version
Python 3.8.9
$ python3 --version
Python 3.8.9

If Python is not installed on your system, or the installed version is older than the minimum supported version, follow the instructions from the Python website (https://www.python.org/downloads/) to install a compatible version.
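The version requirement can also be checked from within Python itself. The following is a minimal sketch; the helper name meets_minimum is hypothetical and not part of the companion repository:

```python
import sys

# Minimum version stated in this chapter's technical requirements.
MINIMUM_PYTHON = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```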

Next, install the following OpenTelemetry packages via pip. Note that through dependency requirements, additional packages will automatically be installed...

Concepts of sampling across signals

Often used in research, sampling selects a subset of data points from a larger dataset to reduce the amount of data to be analyzed. This is done when analyzing the entire dataset would be impossible, impractical, or unnecessary to achieve the research goal. For example, if we wanted to record how many doors on average each car in a store parking lot has, it may be possible to go through the entire parking lot and record the data in its entirety. However, if the parking lot contains 20,000 cars, it may be best to select a sample of those cars, say 2,000, and analyze that instead. Many sampling methods exist to ensure that a representative subset of the data is selected, so that the meaning of the data is not lost because of the sampling.
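The parking lot example can be sketched with Python's standard library. The door counts and their weights below are invented for illustration; only the 20,000 and 2,000 figures come from the text:

```python
import random
import statistics

random.seed(12)  # fixed seed so repeated runs draw the same sample

# Hypothetical population: 20,000 cars with 2, 4, or 5 doors each.
population = random.choices([2, 4, 5], weights=[2, 6, 2], k=20_000)

# Simple random sample of 2,000 cars, drawn without replacement.
sample = random.sample(population, k=2_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f}")
```

With a simple random sample of this size, the sample mean should land close to the population mean while only a tenth of the cars are inspected.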

Methods for sampling can be grouped as either of the following:

Sampling at the application level via the SDK

Allowing applications to decide what to sample provides a great amount of flexibility to application developers and operators, as these applications are the source of the tracing data. Samplers can be configured in OpenTelemetry as a property of the tracer provider. In the following code, a configure_tracer method configures the OpenTelemetry tracing pipeline and receives a sampler as an argument. This method is used to obtain three different tracers, each with its own sampling configuration:

  • ALWAYS_ON: A sampler that always samples.
  • ALWAYS_OFF: A sampler that never samples.
  • TraceIdRatioBased: A probability sampler, which in the example is configured to sample traces 50% of the time.

The code then produces a separate trace using each tracer to demonstrate how sampling impacts the output generated by ConsoleSpanExporter:

sample.py

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk...
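To make the three behaviors listed above concrete, the following standard-library sketch shows one way a ratio-based decision can be derived from the trace ID alone. It illustrates the idea and is not the SDK's code; the function names should_sample, always_on, and always_off are hypothetical:

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # mask selecting the low 64 bits of a trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep the trace if its low 64 bits fall below ratio * 2**64.

    The decision depends only on the trace ID, so the same trace
    always receives the same answer.
    """
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def always_on(trace_id: int) -> bool:
    """Counterpart of a sampler that always samples."""
    return True


def always_off(trace_id: int) -> bool:
    """Counterpart of a sampler that never samples."""
    return False


# Simulate 10,000 random 128-bit trace IDs and sample them at 50%.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.5) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces at ratio 0.5")
```

Because the decision is a pure function of the trace ID, every participant that applies the same ratio reaches the same conclusion for a given trace without any coordination.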

Using the OpenTelemetry Collector to sample data

Configuring the application to sample traces is great, but what if we wanted to use tail sampling instead? The OpenTelemetry Collector provides a natural point where sampling can be performed. Today, it supports both tail sampling and probabilistic sampling via processors. As we've already discussed the probabilistic sampling processor in Chapter 8, The OpenTelemetry Collector, we'll focus this section on the tail sampling processor.

Tail sampling processor

In addition to sampling by a configured probabilistic percentage, the tail sampling processor can make sampling decisions based on a variety of characteristics of a trace. It can choose to sample based on one of the following:

  • Overall trace duration
  • Span attributes' values
  • Status code of a span

To accomplish this, the tail sampling processor supports the configuration of policies to sample traces...
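The idea behind policy-based tail sampling can be sketched in plain Python. This is a conceptual illustration only, not the Collector's implementation or its configuration format; the Span class, the policy helpers, and the rule that a trace is kept when any policy matches are all assumptions of this sketch:

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    status: str = "OK"  # "OK" or "ERROR" in this sketch
    attributes: dict = field(default_factory=dict)


def latency_policy(threshold_ms):
    """Keep traces whose longest span exceeds threshold_ms
    (a stand-in for overall trace duration)."""
    return lambda spans: max(s.duration_ms for s in spans) > threshold_ms


def status_policy(status):
    """Keep traces containing at least one span with the given status."""
    return lambda spans: any(s.status == status for s in spans)


def attribute_policy(key, values):
    """Keep traces where any span has attribute key set to one of values."""
    return lambda spans: any(s.attributes.get(key) in values for s in spans)


def tail_sample(spans, policies):
    """Buffer spans per trace, then keep a trace if any policy matches."""
    buffered = {}
    for span in spans:
        buffered.setdefault(span.trace_id, []).append(span)
    return {
        trace_id: group
        for trace_id, group in buffered.items()
        if any(policy(group) for policy in policies)
    }


spans = [
    Span("a", 20), Span("a", 15),                     # fast and healthy
    Span("b", 1200),                                  # slow
    Span("c", 30, status="ERROR"),                    # failed
    Span("d", 10, attributes={"http.route": "/checkout"}),
]
policies = [
    latency_policy(1000),
    status_policy("ERROR"),
    attribute_policy("http.route", {"/checkout"}),
]
kept = tail_sample(spans, policies)
print(sorted(kept))
```

The key difference from head sampling is visible in tail_sample: every span of a trace must be buffered before the policies can be evaluated against the whole trace.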

Summary

Understanding the different options for sampling provides us with the ability to manage the amount of data produced by our applications. Knowing the trade-offs of different sampling strategies and some of the methods available helps decrease the level of noise in a busy environment.

The samplers and configuration options that OpenTelemetry provides at the application level can reduce load and cost upfront via head sampling. Configuring tail sampling at collection time provides the added benefit of making a more informed decision on what to keep or discard. This benefit comes at the added cost of having to run a collection point with sufficient resources to buffer the data until a decision can be reached.

Ultimately, the decisions made when configuring sampling will impact what data is available to observe what is happening in a system. Sample too little and you may miss important events. Sample too much and the cost of producing telemetry...
