You're reading from Driving Data Quality with Data Contracts

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781837635009
Edition: 1st Edition
Author (1)
Andrew Jones

Andrew Jones is a principal engineer at GoCardless, one of Europe's leading fintech companies. He has over 15 years of experience in the industry, with the first half primarily as a software engineer, before he moved into the data infrastructure and data engineering space. Joining GoCardless as its first data engineer, he led his team to build their data platform from scratch. After initially following a typical data architecture and getting frustrated with facing the same old challenges he'd faced for years, he started thinking there must be a better way, which led to him coining and defining the ideas around data contracts. Andrew is a regular speaker and writer, and he is passionate about helping organizations get maximum value from data.

A Contract-Driven Data Architecture

In the previous chapter, we saw exactly what makes up a data contract. In this chapter, we’re going to build on that by looking at how we can use the data contract to drive our data architecture. We’ll introduce the concept of a contract-driven data architecture and show how powerful this can be. We believe this is a step-change in how we build data platforms, and we’ll discuss the many benefits we get when adopting this architecture pattern.

As part of that discussion, we’ll introduce the three principles that unlock those benefits: automation, guidelines and guardrails, and consistency. You’ll learn how those principles benefit the data generators, the data consumers, and the organization. To promote autonomy, we need to provide tooling that can be self-served by the data generators. We’ll finish this chapter by looking at why that is important and show an example of how to achieve it.

By the end of this chapter...

A step-change in building data platforms

To start this section, we’ll explain exactly what we mean by a contract-driven data architecture. We’ll explore how it is powered by using data contracts as the place to capture the metadata that describes the data, and we’ll see just how powerful it can be to create a contract-driven data architecture. We’ll show why we believe it is a step-change in building data platforms.

We’ll finish by walking through a case study from GoCardless, where we implemented a solution we thought was promoting autonomy but wasn’t as successful as we expected! What we learned from that greatly influenced our implementation of data contracts, where we have been much more successful in promoting autonomy through a self-serve interface.

We’ll explore the following topics in turn:

  • Building generic data tooling
  • Introducing a data infrastructure team
  • A case study from GoCardless in promoting autonomy...

Introducing the principles of a contract-driven data architecture

Building a contract-driven data architecture provides many benefits to the data generators, the data consumers, and the wider organization. These benefits are achieved through three principles:

  • Automation
  • Guidelines and guardrails
  • Consistency

Let’s look at each of these in turn.

Automation

There are several common tasks that need to be carried out on the data and the resources we use to manage it, no matter what that data is and who owns it. These tasks are great candidates to automate, reducing the effort the data generators need to spend managing the data.

The resources required for our data will almost always include tables in the data warehouse. We can use the data contract to automate the creation and management of those tables, for example, by creating a table when the contract is created and keeping its schema in sync with the schema in the contract...
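
As a rough illustration of this kind of automation, the sketch below derives warehouse DDL from a contract. It is a minimal sketch under stated assumptions, not the implementation used at GoCardless: the contract layout (a dictionary with `name` and `fields`), the field types, and the helper names `create_table_sql` and `add_column_sql` are all hypothetical, and the generated SQL is generic rather than tied to a particular warehouse.

```python
# Hypothetical contract layout; a real data contract may be defined differently.
contract = {
    "name": "customer_payments",
    "fields": [
        {"name": "payment_id", "type": "STRING", "required": True},
        {"name": "amount", "type": "NUMERIC", "required": True},
        {"name": "note", "type": "STRING", "required": False},
    ],
}


def column_definition(field):
    """Render one contract field as a column definition."""
    nullability = " NOT NULL" if field["required"] else ""
    return f'{field["name"]} {field["type"]}{nullability}'


def create_table_sql(contract):
    """Build the statement that creates the table when the contract is created."""
    columns = ",\n  ".join(column_definition(f) for f in contract["fields"])
    return f'CREATE TABLE IF NOT EXISTS {contract["name"]} (\n  {columns}\n)'


def add_column_sql(contract, existing_columns):
    """Build statements that add contract fields missing from the table.

    New columns are added as nullable, because existing rows have no value for them.
    """
    return [
        f'ALTER TABLE {contract["name"]} ADD COLUMN {f["name"]} {f["type"]}'
        for f in contract["fields"]
        if f["name"] not in existing_columns
    ]


if __name__ == "__main__":
    print(create_table_sql(contract))
    # Pretend the table already exists without the "note" column.
    for statement in add_column_sql(contract, {"payment_id", "amount"}):
        print(statement)
```

Because the statements are derived entirely from the contract, the data generator only edits the contract, and tooling of this kind can keep the table in step with it. This sketch only handles added fields; removed or retyped fields would need their own policy.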

Providing self-served data infrastructure

Data generators must be able to create and manage their data products with agility and autonomy if we are going to improve the accessibility of quality data that leads to valuable business outcomes.

To enable that, the tooling implemented as part of our contract-driven architecture needs to be self-servable by those data generators. They should not have to wait for a central data or operations team to review their changes, as that slows the data generators down and turns the central team into a bottleneck.

We can be confident in allowing this because we have implemented the guidelines and guardrails that manage the risks, as we discussed in the previous section. That allows us to trust our data generators, and by showing we trust them we are promoting a sense of ownership of the data. That sense of ownership automatically translates into a feeling of responsibility and accountability for the data, and the data products they are providing.
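
One way to picture how guardrails replace a manual review is an automated check that runs whenever a contract changes. The sketch below is illustrative only: the specific rules (an owner must be named, and existing fields must not be removed or retyped), the contract layout, and the function name `guardrail_violations` are assumptions made for this example, not rules taken from the chapter.

```python
def guardrail_violations(new, previous=None):
    """Return a list of human-readable problems; an empty list means the change may proceed."""
    violations = []

    # Assumed rule: every contract names an owner.
    if not new.get("owner"):
        violations.append("contract must name an owner")

    # Assumed rule: fields in the previous version are not removed or retyped.
    if previous is not None:
        new_types = {f["name"]: f["type"] for f in new["fields"]}
        for field in previous["fields"]:
            name = field["name"]
            if name not in new_types:
                violations.append(f"field '{name}' was removed")
            elif new_types[name] != field["type"]:
                violations.append(
                    f"field '{name}' changed type from {field['type']} to {new_types[name]}"
                )
    return violations


# Hypothetical versions of the same contract.
v1 = {
    "owner": "payments-team",
    "fields": [
        {"name": "payment_id", "type": "STRING"},
        {"name": "amount", "type": "NUMERIC"},
    ],
}
v2 = {
    "owner": "payments-team",
    "fields": [{"name": "payment_id", "type": "INTEGER"}],
}

if __name__ == "__main__":
    for problem in guardrail_violations(v2, v1):
        print(problem)
```

A check like this could run in the data generator's own deployment pipeline, so a compliant change goes through without anyone waiting on another team, while a risky change is stopped with a clear explanation.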

As we’ve discussed throughout...

Summary

In this chapter, we introduced the concept of a contract-driven data architecture. This is an architecture driven by data contracts and the metadata we define within them. We showed how powerful this idea is, and why we believe it’s a step-change in how we build data platforms.

We use this pattern to build more generic data tooling. Instead of building similar pipelines as point solutions, we can build tooling that doesn’t mandate anything about the data or how it is structured, provided we have enough context about the data, defined as metadata in the data contract. When adopting this pattern, it’s recommended to build a data infrastructure team, whose remit is to build this tooling for adoption by all data generators, wherever they are in the organization.

To illustrate how this pattern is different from how we built platforms before, we walked through a case study of a previous service we implemented at GoCardless, the Data Platform Gateway...

Further reading

For more information on the topics covered in this chapter, please see the following resources:
