
You're reading from Practical Data Quality

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781804610787
Edition: 1st Edition
Author: Robert Hawker

Robert Hawker started his career as a chartered accountant before making the leap into data in 2007. He led data teams within two global implementations of SAP, looking after master data management, data ownership and stewardship, metadata management, and, of course, data quality over a 14-year period. He moved into analytics in 2017 and now specializes in Microsoft Power BI training, implementation, administration, and governance work. He lives in the UK and shares his experiences through conferences and blogs.

Data Quality Rules

The chapters so far have been about understanding how to shape your data quality initiative – who should be consulted, how to win their support, and how to ensure you focus on the right areas.

Having used data discovery techniques in the previous chapter to identify critical data and its flaws, we can now define data quality rules. This moves the work into a critical phase, because the rules produce a data quality score against which people will ultimately judge an organization's data.

This chapter will help you write a clear business definition of a rule, which can then be converted into a programmatic data check in a data quality tool. We will explore the key features of a rule, such as rule thresholds, the assignment of rules to data quality dimensions, the monetary value of a rule failure, and the weighting of important rules over others.

In this chapter, we will cover the following topics:

...

An introduction to data quality rules

A data quality rule is logic that is applied to each row of a dataset to determine whether that row is correct or incorrect. Correct data is deemed to have passed the rule, and incorrect data is deemed to have failed the rule – hence the term failed data, which is used heavily in Chapter 7.

Data quality rules always give a Boolean output – in other words, a row of data always passes or fails.
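To make the row-by-row, pass-or-fail idea concrete, here is a minimal sketch of the rule "The VAT number must be complete for all suppliers" written as a Python function that returns a Boolean for each row. The column name `vat_number`, the function name, and the sample rows are hypothetical; the pass/fail logic follows the rule's own examples (any character passes, null or blank fails).

```python
from typing import Any, Mapping


def vat_number_complete(row: Mapping[str, Any]) -> bool:
    """Return True (pass) if the VAT number holds at least one character.

    Null (None) and blank ("") values fail. The column name
    "vat_number" is a hypothetical example.
    """
    value = row.get("vat_number")
    return value is not None and str(value) != ""


# Hypothetical supplier rows; each row is evaluated independently.
suppliers = [
    {"supplier_id": "S001", "vat_number": "GB123456789"},
    {"supplier_id": "S002", "vat_number": None},
    {"supplier_id": "S003", "vat_number": ""},
]

results = {row["supplier_id"]: vat_number_complete(row) for row in suppliers}
print(results)  # {'S001': True, 'S002': False, 'S003': False}
```

Because the function can only return `True` or `False`, every row lands in exactly one of the two groups, which is what allows failed rows to be counted and reported later.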

The following table provides a few (purposefully very simple) examples:

The key features of data quality rules

Now that data quality rules have been introduced, we will focus on the key features that must be taken into consideration when developing rules for the first time. The following diagram summarizes each of these features:

Figure 6.1 – A reference diagram for the key features of data quality rules

Each of these concepts needs to be considered when designing a data quality rule. It is important to understand the concepts well before starting the design process, to avoid having to revisit and retrofit every rule later on.

The remainder of this section will explain these concepts in depth and provide examples.
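As a rough illustration of how the rule features named at the start of this chapter (scope, a Boolean check, a data quality dimension, a threshold, a weighting, and a monetary value per failure) might be captured together in one design record, here is a minimal Python sketch. The `RuleDesign` class, its field names, and all of the example values are hypothetical and are not the author's template or any tool's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]


@dataclass(frozen=True)
class RuleDesign:
    """Hypothetical record of the features captured for one rule."""
    name: str
    business_definition: str
    dimension: str                   # e.g. "Completeness"
    in_scope: Callable[[Row], bool]  # which rows the rule applies to
    check: Callable[[Row], bool]     # Boolean pass/fail for one row
    threshold: float                 # minimum acceptable pass rate, 0 to 1
    weighting: float                 # relative importance in an overall score
    cost_per_failure: float          # assumed monetary impact of one failed row

    def evaluate(self, rows: Iterable[Row]) -> dict:
        # Only rows inside the rule's scope are checked, so out-of-scope
        # rows cannot show up as false positives.
        scoped = [r for r in rows if self.in_scope(r)]
        failed = sum(1 for r in scoped if not self.check(r))
        # Design choice (assumption): an empty scope counts as a 100% pass.
        pass_rate = (len(scoped) - failed) / len(scoped) if scoped else 1.0
        return {
            "rows_checked": len(scoped),
            "rows_failed": failed,
            "pass_rate": pass_rate,
            "meets_threshold": pass_rate >= self.threshold,
            "estimated_cost": failed * self.cost_per_failure,
        }


vat_rule = RuleDesign(
    name="VAT number complete",
    business_definition="The VAT number must be complete for all suppliers.",
    dimension="Completeness",
    in_scope=lambda r: r.get("partner_type") == "supplier",
    check=lambda r: r.get("vat_number") not in (None, ""),
    threshold=0.95,
    weighting=3.0,
    cost_per_failure=50.0,
)

rows = [
    {"partner_type": "supplier", "vat_number": "GB123456789"},
    {"partner_type": "supplier", "vat_number": None},
    {"partner_type": "customer", "vat_number": None},  # out of scope
]
print(vat_rule.evaluate(rows))
# {'rows_checked': 2, 'rows_failed': 1, 'pass_rate': 0.5, 'meets_threshold': False, 'estimated_cost': 50.0}
```

Keeping every feature in a single record means a missing threshold or weighting is visible at design time rather than discovered after the rule is built.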

Rule weightings

Rule weightings are used to assign greater or lesser importance to certain rules. Greater weighting will be placed on critical rules. A data quality tool will use the provided weightings when calculating an overall data quality score, such as the following:

Business logic | Passed row example | Failed row example
The VAT number must be complete for all suppliers. | Any row with any character in this field would pass. | Any row which is “null” or “blank” would fail.
The VAT number...

...
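One common way for a data quality tool to combine rule weightings into an overall score, assumed here for illustration, is a weighted average of each rule's pass rate: overall score = Σ(wᵢ × pᵢ) / Σwᵢ, where pᵢ is the pass rate of rule i and wᵢ is its weighting. The function name and the sample results below are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_dq_score(results: Sequence[Tuple[float, float]]) -> float:
    """Weighted average of rule pass rates.

    `results` holds (pass_rate, weighting) pairs, one per rule, where
    pass_rate is between 0 and 1 and weighting is a non-negative number.
    """
    total_weight = sum(weight for _, weight in results)
    if total_weight <= 0:
        raise ValueError("at least one rule needs a positive weighting")
    return sum(rate * weight for rate, weight in results) / total_weight


# Hypothetical results: one critical rule (weighting 3) and two minor rules (weighting 1).
rule_results = [(0.60, 3.0), (0.99, 1.0), (0.99, 1.0)]
equal_weights = [(rate, 1.0) for rate, _ in rule_results]

print(round(weighted_dq_score(rule_results), 3))   # 0.756
print(round(weighted_dq_score(equal_weights), 3))  # 0.86
```

With equal weights, the poor result on the critical rule is diluted by the two near-perfect minor rules; weighting it more heavily pulls the overall score down to reflect its importance.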

Implementing data quality rules

The remainder of this chapter describes the end-to-end process of implementing data quality rules. This process is similar to any other IT implementation in that it has a design stage, a build stage, and a testing stage.

What is unique to data quality implementation work is the need to be ready to iterate. When a design is documented, you may feel fully confident that it is correct, only to find in the build and test phases that the data calls for additional, previously unanticipated subtleties in the rules.
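The iteration described above can be sketched as a small test harness: a first version of a VAT completeness rule is run against agreed test cases, testing exposes values that the original design did not anticipate, and a revised version is checked against the same cases. The whitespace-only and "N/A" values, the function names, and the test cases are all hypothetical examples of such subtleties.

```python
from typing import Any, Callable, List, Mapping, Tuple

Row = Mapping[str, Any]


def vat_complete_v1(row: Row) -> bool:
    # First design: any character passes; null or blank fails.
    value = row.get("vat_number")
    return value is not None and value != ""


def vat_complete_v2(row: Row) -> bool:
    # Revised after testing (hypothetical finding): whitespace-only values
    # and placeholder text such as "N/A" are also treated as missing.
    value = row.get("vat_number")
    if value is None:
        return False
    cleaned = str(value).strip()
    return cleaned != "" and cleaned.upper() not in {"N/A", "NA", "NONE"}


# Test cases agreed with the business: (row, expected pass/fail result).
test_cases: List[Tuple[Row, bool]] = [
    ({"vat_number": "GB123456789"}, True),
    ({"vat_number": None}, False),
    ({"vat_number": ""}, False),
    ({"vat_number": "   "}, False),  # found during testing
    ({"vat_number": "N/A"}, False),  # found during testing
]


def failing_cases(rule: Callable[[Row], bool], cases: List[Tuple[Row, bool]]) -> List[Row]:
    """Return the test rows where the rule disagrees with the expected result."""
    return [row for row, expected in cases if rule(row) != expected]


print(len(failing_cases(vat_complete_v1, test_cases)))  # 2
print(len(failing_cases(vat_complete_v2, test_cases)))  # 0
```

Keeping the test cases alongside the rule means each new subtlety becomes a permanent check, so later revisions cannot quietly reintroduce an earlier gap.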

We will describe the implementation work in the following three sections.

Designing rules

The process of designing a data quality rule starts with the data discovery process outlined in Chapter 5. By the end of Chapter 5, we understood the business strategy and successfully linked it to the data that mattered. We profiled that data and learned about its values and patterns. The rule design phase...

Summary

In this chapter, we covered how to define useful data quality rules, with a tightly defined scope to avoid false positives. We also outlined the key features of data quality rules, explaining what information must be captured to document a useful rule design.

We now understand the process that is required to design, develop, and test data quality rules and how good leadership can make a real difference in these technical phases of our work.

Now that we understand the end-to-end process of creating data quality rules, it is time to move on to how the results that these rules produce are presented to stakeholders.
