You're reading from Practical Data Quality

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781804610787
Edition: 1st Edition
Concepts: Data Processing
Author: Robert Hawker

Data Quality Rules

The chapters so far have been about understanding how to shape your data quality initiative – who should be consulted, how to win their support, and how to ensure you focus on the right areas.

Having used data discovery techniques in the previous chapter to identify critical data and its flaws, we can now define data quality rules. This moves the work into a critical phase, because the rules produce a data quality score against which people will ultimately judge an organization’s data.

This chapter will help you write a clearly understandable business definition of a rule, which can then be converted into a programmatic check of data in a data quality tool. We will explore the different features of a rule, such as rule thresholds, the assignment of rules to data quality dimensions, the monetary value of a rule failure, and the weighting of important rules over others.

In this chapter, we will cover the following topics:

...

An introduction to data quality rules

A data quality rule is logic applied to each row of a dataset to determine whether that row is correct or incorrect. Correct data is deemed to have passed the rule, and incorrect data is deemed to have failed it – hence the term failed data, which is used heavily in Chapter 7.

Data quality rules always give a Boolean output – in other words, a row of data always passes or fails.
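To make the pass/fail idea concrete, here is a minimal Python sketch of a completeness rule on a supplier VAT number field: a row passes if the field holds any character and fails if it is null or blank. The field names, sample rows, and function names are hypothetical, chosen for illustration; a real data quality tool would express the same logic in its own rule syntax.

```python
from typing import Optional, TypedDict


class SupplierRow(TypedDict):
    # Hypothetical supplier record holding only the fields the rule needs.
    supplier_id: str
    vat_number: Optional[str]


def vat_number_is_complete(row: SupplierRow) -> bool:
    """Completeness rule: True (pass) if the VAT number holds any character."""
    value = row["vat_number"]
    # None models a null value and "" models a blank value; both fail.
    return value is not None and value != ""


def apply_rule(rows: list[SupplierRow]) -> tuple[int, int]:
    """Apply the rule to every row and return (passed, failed) counts."""
    passed = sum(1 for row in rows if vat_number_is_complete(row))
    return passed, len(rows) - passed


suppliers: list[SupplierRow] = [
    {"supplier_id": "S001", "vat_number": "GB123456789"},
    {"supplier_id": "S002", "vat_number": None},
    {"supplier_id": "S003", "vat_number": ""},
]

print(apply_rule(suppliers))  # (1, 2)
```

Because the rule function returns a plain `bool` for each row, counting passed and failed rows is a simple aggregation over the dataset.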

The following table provides a few (purposefully very simple) examples:

Business logic	Passed row example	Failed row example
The VAT number must be complete for all suppliers.	Any row with any character in this field would pass.	Any row that is “null” or “blank” would fail.
The VAT number...

The key features of data quality rules

Now that data quality rules have been introduced, we will focus on the key features to consider when developing rules for the first time. The following diagram summarizes each of these features:

Figure 6.1 – A reference diagram for the key features of data quality rules

Each of these concepts needs to be considered when designing a data quality rule. It is important to understand them well before starting the design process to avoid having to revisit and retrofit every rule later on.

The remainder of this section explains these concepts in depth and provides examples.

Rule weightings

Rule weightings assign greater or lesser importance to individual rules, with greater weighting placed on critical rules. A data quality tool uses the provided weightings when calculating an overall data quality score, such as the following:

...
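The calculation varies between data quality tools. One common approach, assumed here for illustration rather than taken from the chapter, is a weighted average of each rule's pass rate, so that a critical rule with a higher weight moves the overall score more than a minor one. The rule names, dimensions, weights, and row counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    # Hypothetical summary of one rule's run against a dataset.
    name: str
    dimension: str  # data quality dimension the rule is assigned to
    weight: float   # higher weight = more important rule
    passed: int     # rows that passed the rule
    total: int      # rows the rule was applied to

    def __post_init__(self) -> None:
        if self.total <= 0:
            raise ValueError(f"rule {self.name!r} was applied to no rows")

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total


def overall_score(results: list[RuleResult]) -> float:
    """Weighted average of rule pass rates, expressed as a percentage."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one rule needs a positive weight")
    weighted = sum(r.weight * r.pass_rate for r in results)
    return 100 * weighted / total_weight


results = [
    RuleResult("VAT number complete", "Completeness", weight=3, passed=90, total=100),
    RuleResult("Supplier name not blank", "Completeness", weight=1, passed=50, total=100),
]
print(round(overall_score(results), 1))  # 80.0
```

An unweighted average of the same two rules would give 70.0; the weighting pulls the score toward the result of the rule marked as more important.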

Implementing data quality rules

The remainder of this chapter describes the end-to-end process of implementing data quality rules. This process is similar to any other IT implementation in that it has a design stage, a build stage, and a testing stage.

What is unique to data quality implementation work is the need to be ready to iterate. You can document a design with full confidence that it is correct, only to find in the build and test phases that the data requires subtleties in the rules that you had not anticipated.
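As a hypothetical illustration of such a subtlety, consider the VAT number completeness rule described earlier in this chapter, which passes any row with any character in the field. Suppose testing against real supplier data reveals values made only of spaces, or placeholders such as "n/a", which pass that check but are not real VAT numbers. The sketch below refines the rule and keeps the test cases so that every iteration can be re-run; the placeholder list, function names, and test values are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical placeholder values discovered during testing; a real list
# would come from profiling the organization's own data.
PLACEHOLDERS = {"N/A", "NA", "NONE", "UNKNOWN", "-"}


def vat_complete_v1(value: Optional[str]) -> bool:
    """Original design: fail only null or blank values."""
    return value is not None and value != ""


def vat_complete_v2(value: Optional[str]) -> bool:
    """Refined design: also fail whitespace-only values and known placeholders."""
    if value is None:
        return False
    cleaned = value.strip()
    return cleaned != "" and cleaned.upper() not in PLACEHOLDERS


# (input value, expected result under the refined rule)
TEST_CASES = [
    ("GB123456789", True),
    (None, False),
    ("", False),
    ("   ", False),
    ("n/a", False),
]

for value, expected in TEST_CASES:
    v1 = vat_complete_v1(value)
    v2 = vat_complete_v2(value)
    marker = "" if v1 == expected else "  <- v1 got this wrong"
    print(f"{value!r:>15}: v1={v1}, v2={v2}{marker}")
    assert v2 == expected
```

Keeping the test cases alongside the rule makes each design iteration cheap to verify: a newly discovered subtlety becomes one more row in `TEST_CASES`.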

We will describe the implementation work in the following three sections.

Designing rules

The process of designing a data quality rule starts with the data discovery process outlined in Chapter 5. By the end of Chapter 5, we understood the business strategy and successfully linked it to the data that mattered. We profiled that data and learned about its values and patterns. The rule design phase...
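Profiling the values and patterns of a field is what lets a designer turn a vague expectation into precise rule logic. A common profiling technique, shown here as a sketch rather than as the book's own tooling, reduces each value to a pattern (letters become `A`, digits become `9`, other characters are kept) and counts how often each pattern occurs. Dominant patterns suggest the expected format, and rare ones point at candidate rule failures. The sample VAT values and function names are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def to_pattern(value: Optional[str]) -> str:
    """Map letters to 'A' and digits to '9', keeping other characters."""
    if value is None:
        return "<null>"
    if value == "":
        return "<blank>"
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value
    )


def pattern_profile(values: Iterable[Optional[str]]) -> Counter:
    """Count how often each pattern appears in a column."""
    return Counter(to_pattern(v) for v in values)


vat_numbers = ["GB123456789", "GB987654321", "GB12345678", None, "", "n/a"]
for pattern, count in pattern_profile(vat_numbers).most_common():
    print(f"{pattern:<14}{count}")
```

In this invented sample, the profile shows that most values share one letter-and-digit pattern, while a shorter value, a null, a blank, and a placeholder stand out as cases the rule design must decide how to treat.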

Summary

In this chapter, we covered how to define useful data quality rules with a tightly defined scope to avoid false positives. We also outlined the key features of data quality rules and explained what information must be captured to document a useful data quality rule design.

We now understand the process that is required to design, develop, and test data quality rules and how good leadership can make a real difference in these technical phases of our work.

Now that we understand the end-to-end process of creating data quality rules, it is time to move on to how the results that these rules produce are presented to stakeholders.


About the author

Robert Hawker started his career as a chartered accountant before making the leap into data in 2007. He led data teams within two global implementations of SAP, looking after master data management, data ownership and stewardship, metadata management, and, of course, data quality over a 14-year period. He moved into analytics in 2017 and now specializes in Microsoft Power BI training, implementation, administration, and governance work. He lives in the UK and shares his experiences through conferences and blogs.