You're reading from  Driving Data Quality with Data Contracts

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781837635009
Edition: 1st Edition
Author: Andrew Jones

Andrew Jones is a principal engineer at GoCardless, one of Europe's leading fintechs. He has over 15 years of experience in the industry, with the first half primarily as a software engineer, before he moved into the data infrastructure and data engineering space. Joining GoCardless as its first data engineer, he led his team to build their data platform from scratch. After initially following a typical data architecture and getting frustrated with facing the same old challenges he'd faced for years, he started thinking there must be a better way, which led to him coining and defining the ideas around data contracts. Andrew is a regular speaker and writer, and he is passionate about helping organizations get maximum value from data.

Introducing Data Contracts

In the previous chapter, we looked at the problems we need to solve, and why they require a new kind of data architecture. In this chapter, we’ll introduce data contracts as our solution. We’ll provide a definition and explore exactly what a data contract is and how it solves those problems.

One of the best analogies for data contracts is that they act as APIs for your data. That sounds simple, but it’s a fundamental change in how we build our data architecture. As we’ll see later in this chapter, by thinking about providing an API for data, you’ll start defining expectations around that API and consider the ownership and responsibilities. People often refer to an API as a contract between the provider and consumer, and it’s that idea that eventually led to me calling them data contracts.

But an API is just one example of an interface, and really, it’s interfaces that are the key to designing and implementing an architecture...

What is a data contract?

We’ll start by defining what a data contract is and break down that definition to explore the key principles that make up a data contract. Having an agreed definition will then allow us to understand how data contracts solve the problems we described in Chapter 1, A Brief History of Data Platforms, and give us the foundations we need in the later chapters as we look at exactly how to build and deploy an architecture built on data contracts – one that ultimately changes our data culture and allows us to extract the most business value from our data.

So, let’s start with a definition. I define a data contract as follows:

A data contract is an agreed interface between the generators of data and its consumers. It sets the expectations around that data, defines how it should be governed, and facilitates the explicit generation of quality data that meets the business requirements.

Those four keywords highlighted are the four key principles...
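To make the definition more concrete, here is a minimal sketch of how a contract might be captured in code. It is an illustration only, not the format used in the book: the class names (FieldSpec, DataContract), the attribute names, and the example "payments" contract are all hypothetical. The field list stands in for the agreed interface, the freshness value for an expectation, and the owner and retention period for governance.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical mapping from contract type names to Python types.
PYTHON_TYPES = {"string": str, "integer": int, "float": float, "boolean": bool}


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type: str            # one of the keys in PYTHON_TYPES
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    name: str
    version: int
    owner: str                      # who generates and is responsible for the data
    fields: Tuple[FieldSpec, ...]   # the agreed interface
    freshness_slo_minutes: int      # an expectation consumers can rely on
    retention_days: int             # a governance rule

    def validate(self, record: Dict[str, object]) -> List[str]:
        """Return a list of violations; an empty list means the record conforms."""
        errors = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                if spec.required:
                    errors.append(f"missing required field: {spec.name}")
                continue
            if not isinstance(value, PYTHON_TYPES[spec.type]):
                errors.append(
                    f"{spec.name}: expected {spec.type}, got {type(value).__name__}"
                )
        return errors


# Hypothetical contract published by the team that generates the data.
payments = DataContract(
    name="payments",
    version=1,
    owner="payments-team",
    fields=(
        FieldSpec("id", "string"),
        FieldSpec("amount", "integer"),
        FieldSpec("reference", "string", required=False),
    ),
    freshness_slo_minutes=60,
    retention_days=365,
)

print(payments.validate({"id": "PM123", "amount": 1000}))
print(payments.validate({"amount": "1000"}))
```

The first call prints an empty list because the record conforms. The second reports the missing id and the wrongly typed amount. Running a check like this where the data is generated is one way to make that generation explicit rather than a side effect.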

When to use data contracts

Now that we have a good understanding of what data contracts are and how they solve the problems we saw in Chapter 1, A Brief History of Data Platforms, how do we know when it is a good time to adopt data contracts in an organization?

Firstly, it depends on how your organization is using or wants to use its data. As discussed in the previous chapter, many organizations are starting to use data in more business-critical processes or in products they build for their customers. The ability to build these products quickly and effectively depends on the accessibility of easy-to-use, quality data, and data contracts help with the production of that data.

Then, once these data-driven applications are released, data contracts help ensure they stay performant and dependable by tracking the SLOs of the data and managing the evolution of that data, preventing breaking changes that impact downstream consumers.
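As a rough sketch of what managing the evolution of data could look like, the function below compares two versions of a schema and lists the changes that would break a downstream consumer. The schema representation (a dictionary of field name to type and required flag) and the function name breaking_changes are assumptions made for this example, and the rules are deliberately simple: removing a field, changing its type, or relaxing a required field counts as breaking, while adding a field does not.

```python
from typing import Dict, List

# Hypothetical schema shape: field name -> {"type": str, "required": bool}
Schema = Dict[str, Dict[str, object]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List the changes from old to new that could break existing consumers."""
    problems = []
    for name, old_spec in old.items():
        if name not in new:
            problems.append(f"field removed: {name}")
            continue
        new_spec = new[name]
        if new_spec["type"] != old_spec["type"]:
            problems.append(
                f"type changed: {name} ({old_spec['type']} -> {new_spec['type']})"
            )
        # Consumers relying on a value always being present may now see nulls.
        if old_spec.get("required", False) and not new_spec.get("required", False):
            problems.append(f"no longer required: {name}")
    return problems


v1 = {
    "id": {"type": "string", "required": True},
    "amount": {"type": "integer", "required": True},
}
v2 = {
    "id": {"type": "string", "required": True},
    "amount": {"type": "float", "required": True},
    "currency": {"type": "string", "required": False},  # additive, not breaking
}

print(breaking_changes(v1, v2))
```

A check like this could run in the data generator's build pipeline, so that a breaking change is caught before release instead of being discovered by consumers afterwards.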

Even if your organization is not ready to use data for critical...

Data contracts and the data mesh

Data mesh was introduced by Zhamak Dehghani in 2019 (https://martinfowler.com/articles/data-monolith-to-mesh.html) and is a design pattern for building a domain-oriented, decentralized data platform. It focuses not just on the technology, but also on the social and cultural changes required to achieve this goal and solve many of the problems we discussed in Chapter 1, A Brief History of Data Platforms.

The pattern is described through four principles:

  • Domain ownership
  • Data as a product
  • Self-serve data platform
  • Federated computational governance

Let’s go through each principle in turn and discuss how they relate to data contracts.

Domain ownership

Data mesh proposes a domain-oriented approach to organizing the responsibility and ownership of the data, where this ownership is decentralized to the business domains closest to the data – ideally, the data generators. They are the ones who know most about the data...

Summary

In this chapter, we introduced the concept of data contracts as our solution to the problems identified in Chapter 1, A Brief History of Data Platforms. We’ve provided a definition and discussed how data contracts provide an agreed interface between the generators of data and its consumers. That interface also sets the expectations around that data and how it should be governed. We then have everything we need to facilitate the explicit generation of quality data.

These are the four principles of data contracts, which work together to drive a step change in building reliable, trusted, and effective data platforms, and help us achieve our aim of increasing the value our organizations can get from our data.

By applying these principles, we are shifting the responsibilities left, moving more upstream to the data generators. We’re addressing the data quality and dependability issues at source, by those who have the most knowledge of the data and the ability...

Further reading

For more information on the topics covered in this chapter, please see the following resources:
