You're reading from Driving Data Quality with Data Contracts

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781837635009
Edition: 1st Edition
Concepts: Data Engineering
Author: Andrew Jones

A Contract-Driven Data Architecture

In the previous chapter, we saw exactly what makes up a data contract. In this chapter, we’re going to build on that by looking at how we can use the data contract to drive our data architecture. We’ll introduce the concept of a contract-driven data architecture and show how powerful this can be. We believe this is a step-change in how we build data platforms, and we’ll discuss the many benefits we get when adopting this architecture pattern.

As part of that discussion, we’ll introduce the three principles that unlock those benefits: automation, guidelines and guardrails, and consistency. You’ll learn how these principles benefit the data generators, the data consumers, and the wider organization. To promote autonomy, we also need to provide tooling that the data generators can self-serve, so we’ll finish this chapter by looking at why that is important and showing an example of how to achieve it.

By the end of this chapter...

A step-change in building data platforms

To start this section, we’ll explain exactly what we mean by a contract-driven data architecture. We’ll explore how it is powered by using data contracts as the place to capture the metadata that describes the data, and why we believe this approach is a step-change in building data platforms.

We’ll finish by walking through a case study from GoCardless, where we implemented a solution we thought was promoting autonomy but that wasn’t as successful as we expected! What we learned from it greatly influenced our implementation of data contracts, which has been much more successful in promoting autonomy through a self-serve interface.

We’ll explore the following topics in turn:

Building generic data tooling
Introducing a data infrastructure team
A case study from GoCardless in promoting autonomy...

Introducing the principles of a contract-driven data architecture

Building a contract-driven data architecture provides many benefits to the data generators, the data consumers, and the wider organization. These benefits are achieved through three principles:

Automation
Guidelines and guardrails
Consistency

Let’s look at each of these in turn.

Automation

There are several common tasks that need to be carried out on the data and the resources we use to manage it, no matter what that data is and who owns it. These tasks are great candidates to automate, reducing the effort the data generators need to spend managing the data.

The resources required for our data will almost always include the tables in the data warehouse. We can use the data contract to automate the creation and management of that table, for example, by creating the table when the contract is created and keeping the schema of the table in sync with the schema in the contract...
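To make the table automation concrete, here is a minimal sketch, assuming a hypothetical in-memory contract made of a table name and typed fields (the `Field` and `Contract` classes, the column type names, and both helper functions are illustrative, not the book’s or any tool’s API). It generates the `CREATE TABLE` statement when a contract is first created, and on later changes compares the contract with the table’s current columns, applying only additive changes and refusing anything that would break existing data.

```python
from dataclasses import dataclass

# Hypothetical, minimal contract representation. A real contract would carry
# much more metadata (owner, SLOs, PII tags) and would usually live in a file.
@dataclass(frozen=True)
class Field:
    name: str
    type: str              # warehouse column type, e.g. "STRING" or "INT64"
    nullable: bool = True

@dataclass(frozen=True)
class Contract:
    table: str
    fields: tuple[Field, ...]

def column_sql(field: Field) -> str:
    """Render a single column definition."""
    suffix = "" if field.nullable else " NOT NULL"
    return f"{field.name} {field.type}{suffix}"

def create_table_sql(contract: Contract) -> str:
    """Generate the DDL to run when the contract is first created."""
    cols = ",\n  ".join(column_sql(f) for f in contract.fields)
    return f"CREATE TABLE {contract.table} (\n  {cols}\n)"

def sync_table_sql(contract: Contract, existing: dict[str, str]) -> list[str]:
    """Compare the contract with the table's current columns (name -> type)
    and return ALTER statements that bring the table in line.

    Removing a column, changing its type, or adding a NOT NULL column to a
    table that already holds rows would break existing data or consumers, so
    these are rejected instead of being applied automatically.
    """
    wanted = {f.name: f for f in contract.fields}
    removed = sorted(existing.keys() - wanted.keys())
    retyped = sorted(
        n for n in existing.keys() & wanted.keys() if existing[n] != wanted[n].type
    )
    added = [f for f in contract.fields if f.name not in existing]
    required = [f.name for f in added if not f.nullable]
    if removed or retyped or required:
        raise ValueError(
            f"breaking change: removed={removed}, retyped={retyped}, "
            f"new required={required}"
        )
    return [f"ALTER TABLE {contract.table} ADD COLUMN {column_sql(f)}" for f in added]
```

For example, adding an optional `currency` field to a `payments` contract would yield a single `ALTER TABLE payments ADD COLUMN currency STRING` statement, while deleting a field would raise an error. Rejecting breaking changes here, rather than silently applying them, is one possible design choice; the exact SQL dialect and the definition of a breaking change would depend on your warehouse and your contract rules.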

Providing self-served data infrastructure

Data generators must be able to create and manage their data products with agility and autonomy if we are going to improve the accessibility of quality data that leads to valuable business outcomes.

To enable that, the tooling implemented as part of our contract-driven architecture needs to be self-servable by those data generators. There should be no waiting on a central data or operations team to review changes, which would slow the data generators down and create a bottleneck.

We can be confident in allowing this because we have implemented the guidelines and guardrails that manage the risks, as we discussed in the previous section. That allows us to trust our data generators, and by showing we trust them we promote a sense of ownership of the data. That sense of ownership naturally translates into a feeling of responsibility and accountability for the data and the data products they are providing.
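One way guardrails can replace a manual review is as an automated check that runs whenever a data generator proposes a new or changed contract, for example in CI. The sketch below assumes a hypothetical dictionary-based contract and three illustrative rules (a named owner, a description on every field, and a declared anonymization strategy for any field tagged as PII); the rules your organization needs will differ.

```python
from typing import Any

def check_guardrails(contract: dict[str, Any]) -> list[str]:
    """Return human-readable guardrail violations for a proposed contract.

    An empty list means the change can be merged and deployed by the data
    generator without waiting for a central review. The rules below are
    illustrative assumptions, not a standard set.
    """
    problems: list[str] = []
    if not contract.get("owner"):
        problems.append("contract must name an owning team")
    for field in contract.get("fields", []):
        name = field.get("name", "<unnamed>")
        if not field.get("description"):
            problems.append(f"field '{name}' needs a description")
        # Fields tagged as PII must say how they will be anonymized.
        if field.get("pii") and "anonymization" not in field:
            problems.append(f"PII field '{name}' must declare an anonymization strategy")
    return problems
```

A CI job could fail the build whenever this returns a non-empty list, giving the data generator immediate, specific feedback instead of a queued review.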

As we’ve discussed throughout...

Summary

In this chapter, we introduced the concept of a contract-driven data architecture. This is an architecture driven by data contracts and the metadata we define within them. We showed how powerful this idea is, and why we believe it’s a step-change in how we build data platforms.

We use this pattern to build more generic data tooling. Instead of building similar pipelines as point solutions, we can build tooling that doesn’t mandate anything about the data or how it is structured, provided we have enough context about the data, defined as metadata in the data contract. When adopting this pattern, we recommend building a data infrastructure team whose remit is to build this tooling for all data generators to adopt, wherever they are in the organization.
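As an illustration of tooling that knows nothing about a dataset’s structure in advance, the sketch below anonymizes any record using only the metadata in its contract. The contract layout and the `pii` and `anonymization` keys (with `"drop"` and `"hash"` values) are hypothetical assumptions; the point is that the same function works for every dataset because all dataset-specific knowledge lives in the contract.

```python
import hashlib
from typing import Any

def anonymize(record: dict[str, Any], contract: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` with PII fields handled as the contract says.

    Non-PII fields pass through unchanged. A PII field without a recognized
    strategy raises an error, so unknown cases fail closed rather than leak.
    """
    pii_rules = {
        f["name"]: f.get("anonymization")
        for f in contract["fields"]
        if f.get("pii")
    }
    out: dict[str, Any] = {}
    for key, value in record.items():
        if key not in pii_rules:
            out[key] = value                     # not PII: keep as-is
        elif pii_rules[key] == "drop":
            continue                             # omit the field entirely
        elif pii_rules[key] == "hash":
            out[key] = None if value is None else hashlib.sha256(
                str(value).encode("utf-8")
            ).hexdigest()
        else:
            raise ValueError(f"no known anonymization for PII field {key!r}")
    return out
```

Note that an unsalted SHA-256 hash is pseudonymization rather than true anonymization, since common values such as email addresses can be recovered by hashing guesses; a real implementation would likely use a keyed hash or tokenization.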

To illustrate how this pattern is different from how we built platforms before, we walked through a case study of a previous service we implemented at GoCardless, the Data Platform Gateway...

Further reading

Implementing Data Contracts at GoCardless: https://medium.com/gocardless-tech/implementing-data-contracts-at-gocardless-3b5c49074d13
3 Things Our Software Engineers Love About Data Contracts: https://medium.com/gocardless-tech/3-things-our-software-engineers-love-about-data-contracts-3106e1f1602d
The Data Engineer is dead, long live the (Data) Platform Engineer by Robert Sahlin: https://robertsahlin.substack.com/p/the-data-engineer-is-dead-long-live
Data-First Stack as an Enabler for Data Products by Animesh Kumar: https://moderndata101.substack.com/p/data-first-stack-as-an-enabler-for
Building Great Cloud Security Guardrails by Rich Mogull: https://devops.com/building-great-cloud-security-guardrails/
How We Use Golden Paths to Solve Fragmentation in Our Software Ecosystem by Gary Niemen: https://engineering.atspotify.com/2020/08/how-we-use-golden...

Andrew Jones is a principal engineer at GoCardless, one of Europe's leading fintechs. He has over 15 years' experience in the industry, the first half primarily as a software engineer, before he moved into the data infrastructure and data engineering space. Joining GoCardless as its first data engineer, he led his team to build their data platform from scratch. After initially following a typical data architecture and getting frustrated at facing the same old challenges he'd faced for years, he started thinking there must be a better way, which led to him coining and defining the ideas around data contracts. Andrew is a regular speaker and writer, and he is passionate about helping organizations get maximum value from data.
