
You're reading from Driving Data Quality with Data Contracts

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781837635009
Edition: 1st Edition
Author: Andrew Jones

Andrew Jones is a principal engineer at GoCardless, one of Europe's leading fintechs. He has over 15 years of experience in the industry, spending the first half primarily as a software engineer before moving into the data infrastructure and data engineering space. Joining GoCardless as its first data engineer, he led his team in building its data platform from scratch. After initially following a typical data architecture and growing frustrated with the same challenges he'd faced for years, he started thinking there must be a better way, which led him to coin and define the ideas around data contracts. Andrew is a regular speaker and writer, and he is passionate about helping organizations get maximum value from data.


Bringing Data Consumers and Generators Closer Together

In this chapter, we’ll look at the importance of bringing data consumers and generators closer together and why that is one of the key objectives when adopting data contracts. Only by being clear on their roles and responsibilities can these two groups of people work effectively and efficiently toward our goal of extracting the most business value from our data. Therefore, we’ll start by defining these roles and what each expects from the other, before being explicit about the responsibilities and accountabilities of each role.

Next, we’ll discuss a data consumer that is often overlooked: the product engineering teams. They are perhaps the most important consumers in your organization, and yet, like other consumers, they are often unable to rely on the data that is generated. This leads to unreliable services or the inability to use the valuable data generated in other parts...

Who is a consumer, and who is a generator?

We’ve spoken a lot about the consumers and generators of data in this book, but what exactly do people in these roles do? What do they care about? What are their requirements, and what are their expectations?

In the following subsections, we’ll look at both roles in more detail, starting with the data consumers.

Data consumers

A data consumer is a person, a team, or a service that consumes data to inform and/or take some action. Typically, we think of data consumers as data practitioners, such as data engineers, BI analysts, or data scientists. Their primary tasks require them to consume and work with data, and as such, they are highly reliant on the quality and dependability of that data.

However, they are not the only data consumers in your organization. There are an increasing number of people who are not data practitioners but are data literate. They are comfortable using a data analysis tool...

Assigning responsibility and accountability

Now that we have defined the roles, we need to specify the responsibilities and accountabilities of each role. This ensures that everyone knows what is expected of them and allows them to work most effectively together.

We’ll start with the data generators. As we discussed in the previous section, many of them didn’t realize they were data generators. Therefore, their responsibilities were taken on by the data engineering team, who built the pipelines that extracted the raw data from upstream services.

This data engineering team became accountable for the reliability of the data, even though they were not involved in how it was generated or how its structure evolved. The team was very reactive to upstream changes and did their best to limit the impact of those changes. However, there is only so much they can do, and there’s no quick fix you can deploy if the generator upstream suddenly stops writing...

Feeding data back to the product teams

As mentioned earlier in this chapter in the Who is a consumer, and who is a generator? section, although we often think of a data consumer as a data practitioner (for example, a BI analyst or a data scientist), they’re not the only ones who consume data. In fact, product teams, and the services they create, are perhaps the largest and most important data consumers in your organization.

These services do not exist in isolation. They all take some data as input, perform some process or take some action, and return new data as output. And when it comes to input data, they have the same expectations as any other data consumer. They need to understand what data is available, how it is structured, and any other context around the data. They need to know how dependable, correct, and available that data is. They need to know where that data comes from, who owns it, and the support levels being provided.
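The questions above (what data is available, how it is structured, who owns it, where it comes from, and how available and supported it is) are the kind of context a data contract can make explicit. As a rough illustration only, the sketch below records those answers in a Python dataclass and checks an incoming record against the declared structure. The `DataContract` class, its field names, the `conforms` helper, and the example `payments` contract are all hypothetical, chosen for this sketch rather than taken from the book or from any data contract tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical summary of what a consuming service needs to know about its input data."""
    name: str                # what data is available
    owner: str               # who owns it and is accountable for it
    source: str              # where the data comes from
    schema: dict[str, type]  # how it is structured: field name -> expected Python type
    availability_slo: float  # how available it is expected to be, e.g. 0.999
    support_level: str       # the support the owner provides


def conforms(contract: DataContract, record: dict) -> list[str]:
    """Return the problems found in `record`; an empty list means it matches the schema."""
    problems = []
    for field_name, expected in contract.schema.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected):
            problems.append(
                f"{field_name}: expected {expected.__name__}, "
                f"got {type(record[field_name]).__name__}"
            )
    return problems


# Hypothetical example contract for illustration only.
payments = DataContract(
    name="payments",
    owner="payments-team",
    source="payments-service",
    schema={"payment_id": str, "amount_minor": int, "currency": str},
    availability_slo=0.999,
    support_level="business hours",
)

print(conforms(payments, {"payment_id": "PM1", "amount_minor": "100", "currency": "GBP"}))
# ['amount_minor: expected int, got str']
```

In practice a contract would more likely live in a schema language or a registry than in application code; the dataclass is only meant to show which questions a consuming service needs answered before it can depend on the data.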

Often, these services make data available...

Managing the evolution of data

Data evolves over time, just as your organization does, and we’ll need to manage that appropriately in order to minimize the impact of that evolution on downstream users – particularly the most critical use cases. However, just like your organization, your core models and data products will also be stable over many years.

You can see this reflected in public APIs, for organizations that have them, and in how little those APIs change over time. There’s little reason why our internal data products should change much more frequently, provided we build them with the same discipline and product mindset.

Given that, it’s fine for there to be some friction when it comes to evolving our data contracts. In fact, this friction is desirable. By having some friction here, we’re signifying the importance of the data contract and the commitment we make to its maintenance and stability over the long term.
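One way to introduce this deliberate friction, sketched here as an assumption rather than as the mechanism this chapter prescribes, is to flag breaking schema changes automatically and reject them unless the contract’s major version is bumped. The sketch assumes schemas are plain mappings of field names to type names, that adding a field is non-breaking (consumers ignore fields they don’t know about), and that removing or retyping a field is breaking. The `breaking_changes` and `check_evolution` functions and the example schemas are hypothetical.

```python
# Hypothetical sketch: a schema is a mapping of field name -> type name.
Schema = dict[str, str]


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List changes in `new` that would break consumers of `old`.

    Assumes added fields are safe and removed or retyped fields are breaking.
    """
    changes = []
    for name, old_type in old.items():
        if name not in new:
            changes.append(f"removed field: {name}")
        elif new[name] != old_type:
            changes.append(f"changed type of {name}: {old_type} -> {new[name]}")
    return changes


def check_evolution(old_version: tuple[int, int], new_version: tuple[int, int],
                    old: Schema, new: Schema) -> None:
    """Raise ValueError if a breaking change is proposed without a major version bump."""
    changes = breaking_changes(old, new)
    if changes and new_version[0] <= old_version[0]:
        raise ValueError(
            "breaking change requires a new major version: " + "; ".join(changes)
        )


v1 = {"payment_id": "string", "amount_minor": "int64", "currency": "string"}

# Additive change: accepted as a minor version bump.
v1_1 = {**v1, "status": "string"}
check_evolution((1, 0), (1, 1), v1, v1_1)

# Replacing amount_minor with amount breaks existing consumers.
v2 = {"payment_id": "string", "amount": "decimal", "currency": "string", "status": "string"}
try:
    check_evolution((1, 1), (1, 2), v1_1, v2)
except ValueError as err:
    print(err)  # breaking change requires a new major version: removed field: amount_minor

# The same change is accepted once the major version is bumped.
check_evolution((1, 1), (2, 0), v1_1, v2)
```

The check doesn’t forbid change; in this sketch it makes a breaking change an explicit decision the generator has to take, rather than something consumers discover after the fact.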

How much friction there...

Summary

In this chapter, we clearly defined the different roles of the data consumer and the data generator, as well as what each expects from the other. We also went into detail on the responsibilities and accountabilities of each role. It’s by defining these roles and responsibilities that we enable these groups of people to work together closely and effectively, with the knowledge of what is expected of them.

We use data contracts to provide a clear understanding of responsibility and ownership for each of those roles. And it’s by bringing these roles closer together that we improve the accessibility and quality of our data, along with the business value we can generate from it.

Data generators need to feel a sense of ownership over those outcomes if they are to be incentivized to provide data that consumers can build on with confidence. They get that from data consumers, who can share what they need and why they need it.

These consumers include the product...

Further reading

For more information on the topics covered in this chapter, please see the following resources:
