Data Observability for Data Engineering

Product type Book
Published in Dec 2023
Publisher Packt
ISBN-13 9781804616024
Pages 228 pages
Edition 1st Edition
Authors (2): Michele Pinto, Sammy El Khammal

Table of Contents (17) Chapters

Preface
Part 1: Introduction to Data Observability
Chapter 1: Fundamentals of Data Quality Monitoring
Chapter 2: Fundamentals of Data Observability
Part 2: Implementing Data Observability
Chapter 3: Data Observability Techniques
Chapter 4: Data Observability Elements
Chapter 5: Defining Rules on Indicators
Part 3: How to adopt Data Observability in your organization
Chapter 6: Root Cause Analysis
Chapter 7: Optimizing Data Pipelines
Chapter 8: Organizing Data Teams and Measuring the Success of Data Observability
Part 4: Appendix
Chapter 9: Data Observability Checklist
Chapter 10: Pathway to Data Observability
Index
Other Books You May Enjoy

Defining Rules on Indicators

In the previous chapters, we saw how you could collect events synchronously in your data applications. We also discussed what contextual information you need in order to draw the big picture of what’s happening inside the applications.

Now that you have a lot of contextual information, it is high time to turn it into actionable insights. The metrics you collect during the pipeline execution need to reassure all the stakeholders about the proper execution of the data applications. All the observers of the pipeline need to be informed about how the data pipeline is behaving.

To maintain the trust of data producers and data consumers, we will introduce the concept of expectations, which will define what the engineer needs to achieve in order to keep the pipeline in good shape. These expectations, composed of metrics and rules, will act as sensors to know whether the applications are working as expected or not.

These rules are a key component...

Technical requirements

For this chapter, you will use the same Python environment that was created for Chapter 4, Data Observability Elements.

Determining SLOs

We have already described the SLA-SLO-SLI paradigm in Chapter 2, Fundamentals of Data Observability. The agreements are individual (implicit or explicit) contracts between a producer and a consumer that set the requirements that the produced data has to meet in order to be considered healthy. The objectives are targets that the producer must meet in order to fulfill their agreements. Finally, indicators are means of gauging whether the objectives are respected or not.
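To make the relationship between objectives and indicators concrete, here is a minimal Python sketch. The class names (`Indicator`, `Objective`), the `freshness_hours` metric, and the 24-hour threshold are illustrative assumptions and are not taken from the book's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Indicator:
    """An SLI: a named measurement taken on the data (hypothetical structure)."""
    name: str
    value: float


@dataclass
class Objective:
    """An SLO: a target that one indicator must satisfy (hypothetical structure)."""
    description: str
    indicator_name: str
    target: Callable[[float], bool]

    def is_met(self, indicator: Indicator) -> bool:
        # Refuse to evaluate an indicator this objective does not track.
        if indicator.name != self.indicator_name:
            raise ValueError(
                f"Objective tracks {self.indicator_name!r}, got {indicator.name!r}"
            )
        return self.target(indicator.value)


# Assumed agreement: the consumer needs data that is at most one day old.
freshness_slo = Objective(
    description="Data is refreshed at least every 24 hours",
    indicator_name="freshness_hours",
    target=lambda hours: hours <= 24,
)

print(freshness_slo.is_met(Indicator("freshness_hours", 6.0)))   # True
print(freshness_slo.is_met(Indicator("freshness_hours", 30.0)))  # False
```

In this sketch the agreement itself stays implicit: it is the reason the 24-hour target exists, while the objective and the indicator are the parts that can be evaluated automatically.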

But first, let’s see where SLOs can be added.

Project versus data source SLOs

The objectives can be set at different levels, depending on the needs of the stakeholders. Some objectives are set at the project or pipeline level, and others at the data source level. You can think of these as the macroscopic and microscopic levels, respectively. Depending on your scope of responsibility, you may need to set the objective at one level or the other.
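The two levels can be sketched as two separate checks. This is only one possible reading: the record types, the null-ratio objective for a data source, and the runtime objective for a project are hypothetical examples chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSourceRun:
    """Metrics observed on a single data source (hypothetical fields)."""
    name: str
    null_ratio: float  # share of null values in a key column, between 0 and 1


@dataclass
class ProjectRun:
    """Metrics observed on a whole project or pipeline run (hypothetical fields)."""
    name: str
    runtime_seconds: float


def data_source_objective_met(source: DataSourceRun, max_null_ratio: float = 0.05) -> bool:
    """Data source level: the objective concerns one dataset only."""
    return source.null_ratio <= max_null_ratio


def project_objective_met(project: ProjectRun, max_runtime_seconds: float = 600.0) -> bool:
    """Project level: the objective concerns the pipeline run as a whole."""
    return project.runtime_seconds <= max_runtime_seconds


orders = DataSourceRun("orders", null_ratio=0.02)
nightly = ProjectRun("nightly_sales_pipeline", runtime_seconds=750.0)

print(data_source_objective_met(orders))  # True
print(project_objective_met(nightly))     # False
```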

At the microscopic level, the objectives set on...

Turning SLOs into rules

In this section, we will see how objectives can be turned into actionable rules by creating contextual checkpoints from the pipeline or externally. At the start of any rule is the expectation, which can be defined as "What does the consumer expect from the dataset?"

An expectation formalizes the objective into a rule and the corresponding metric to be tracked. It is therefore a good way to document both the objectives and the metrics needed to meet them. Both components matter: the rule tells the observer how the data should behave, and the metric is used to detect whether the actual behavior deviates from it.
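The pairing of a rule with a metric can be sketched in a few lines of Python. The `Expectation` class, the `missing_age_ratio` metric, and the 10% threshold below are assumptions made for illustration; they do not reproduce the book's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]


@dataclass
class Expectation:
    """Documents an objective as a metric plus a rule on that metric."""
    objective: str
    metric: Callable[[List[Row]], float]  # how the behavior is measured
    rule: Callable[[float], bool]         # how the data should behave

    def is_respected(self, rows: List[Row]) -> bool:
        return self.rule(self.metric(rows))


def missing_age_ratio(rows: List[Row]) -> float:
    """Metric: share of rows whose 'age' value is missing."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get("age") is None) / len(rows)


age_completeness = Expectation(
    objective="At most 10% of the rows may lack an age",
    metric=missing_age_ratio,
    rule=lambda ratio: ratio <= 0.10,
)

rows = [{"age": 31}, {"age": None}, {"age": 45}, {"age": 28}]
print(missing_age_ratio(rows))              # 0.25
print(age_completeness.is_respected(rows))  # False
```

Keeping the metric and the rule as separate fields means the same metric can be reused with a stricter or looser rule when the objective changes.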

Let’s look at the different types of rules that we can set.

Different types of rules

The backbone of a rule is the indicator. Based on this, a rule can be set and will start checking how the metric is behaving. These rules are often guided by the principles of data quality discussed in Chapter...

Project – continuous validation of the data

Now that we have learned how to define the SLOs of our projects and how to transform these SLOs into rules, it is time to learn how to integrate these rules into a CI/CD process and how to implement an end-to-end data validation pipeline.
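As a rough preview of what such an integration can look like, the sketch below runs a few rules and converts the outcome into an exit code, which is the usual way for a CI step to pass or fail. The rule contents, the `customer_id` column, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def validate(rows: List[Row]) -> List[str]:
    """Run the rules and return a human-readable message per failure."""
    failures = []
    if not rows:
        failures.append("dataset is empty")
    missing = sum(1 for row in rows if row.get("customer_id") is None)
    if missing:
        failures.append(f"{missing} row(s) without customer_id")
    return failures


def main(rows: List[Row]) -> int:
    """Return 0 when every rule passes and 1 otherwise."""
    failures = validate(rows)
    for failure in failures:
        print(f"FAILED: {failure}")
    return 1 if failures else 0


sample = [{"customer_id": 1}, {"customer_id": None}]
exit_code = main(sample)  # prints "FAILED: 1 row(s) without customer_id"
print(exit_code)          # 1
# In a CI job, the script would end with: raise SystemExit(exit_code)
```

A non-zero exit code stops the CI job, so data that breaks a rule does not reach the next stage.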

Concepts of CI/CD

For several years now, software development has adopted a set of best practices called CI/CD, which aims to close the gap between development and operations activities. This is mainly achieved by having teams automate the build, test, and deployment phases of applications.

The acronym CI/CD stands for continuous integration (CI) and continuous delivery (CD), or in some cases, continuous deployment (CD). Before introducing the concept of continuous data validation, it is important to understand these concepts in detail.

In Figure 5.2, we can see a graphical representation of the main stages of these processes:

...

Summary

In this chapter, we learned how to define SLOs at different levels of abstraction, depending on their purpose, and we analyzed the methods you can use to define them at the data source and project levels.

Then, we learned how to turn our SLOs into actionable rules by defining and creating expectations that form the backbone of our rules.

By studying parts of the code, we have understood the different types of rules and their concrete implementation, as well as the concept of circuit-breakers.

In the last section of the chapter, we introduced the concept of continuous integration and continuous delivery to implement a positive and automated cycle of data validation.
