Monitoring Data Against Rules

After all the hard work prioritizing and collecting data quality rules, monitoring starts to provide the desired payoff.

Monitoring is about organizing the rules you have developed into a set of reports and dashboards that help an organization take action.

Until this point, you have probably only seen your data quality rules in action against test data. This is the point where you finally judge your real data against the rules you have established and see, for the first time, where the gaps are.

This can bring up mixed feelings. As a data quality professional, you are hoping that the rules will identify gaps in the data. If you do all this hard work and then find there are few or only inconsequential gaps, you will have some explaining to do (this has never happened in all my experience!). However, if you are invested in the organization, seeing these gaps can be quite worrying. What is important to remember...

Introduction to data quality reporting

Data quality reporting should provide an entire hierarchy of reporting, from a high-level summary down to the individual rows of failed data. These different levels of reporting are aimed at stakeholders of varied seniority.

This is to cover the diverse requirements of different users of the reporting. For example, a list of failed records is very useful for an operational person who has been asked to make corrections, but would not serve a Chief Data Officer very well. A level of aggregation is required for a senior stakeholder so that they can see an overall picture of the data in the area(s) that they are responsible for.
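
To make the idea of aggregation concrete, here is a minimal sketch of how record-level rule results might be rolled up into the kind of summary score a senior stakeholder would see, alongside the failed-record list an operational user needs. The tooling, column names, and the simple pass-rate definition are illustrative assumptions, not the book's prescribed approach.

```python
import pandas as pd

# Illustrative record-level rule results: one row per record per rule check.
# Column names and the pass-rate definition are assumptions, not the book's schema.
rule_results = pd.DataFrame({
    "data_object": ["Customer", "Customer", "Supplier", "Supplier", "Supplier"],
    "rule_id":     ["CUST-001", "CUST-002", "SUPP-001", "SUPP-001", "SUPP-002"],
    "record_key":  ["C100", "C101", "S200", "S201", "S202"],
    "passed":      [True, False, True, False, False],
})

# Operational view: the actionable list of failed records.
failed_records = rule_results[~rule_results["passed"]]

# Senior-stakeholder view: one aggregated quality score per data object,
# defined here simply as the percentage of checks that passed.
summary = (
    rule_results.groupby("data_object")["passed"]
    .mean()
    .mul(100)
    .round(1)
    .rename("quality_score_pct")
    .reset_index()
)

print(failed_records)
print(summary)
```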

This section will outline the types of reporting required, who they are aimed at, and how they might look.

Different levels of reporting

In my experience, there are three main levels of reporting required in a data quality initiative. These are mentioned in the following table:

...

Designing a high-level data quality dashboard

Every data quality initiative is different, and senior stakeholders at different organizations will have different needs. The example developed for this book is an amalgamation of various concepts successfully applied at different organizations. The figures in this section can be used as a starting point for discussions, but it is critical to get stakeholders involved in the design process.

This section explains the typical design of the various Data Quality Dashboards and reports. It should be possible to adapt this typical approach to your organization's specific circumstances.

Dimensions and filters

The high-level Data Quality Dashboard for the senior stakeholder is typically a simple data visualization showing a data quality summary for the following dimensions (a sketch of how such a summary might be computed follows the list):

  • Each process area
  • Each data object
  • Each business unit
  • Each region
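
As a minimal sketch, assuming a simple flat results table whose column names mirror the dimensions above (they are invented for illustration, not a prescribed schema), the dashboard's scores could be computed like this:

```python
import pandas as pd

# Illustrative rule results feed; the field names mirror the dashboard
# dimensions above and are assumptions, not a prescribed schema.
# passed: 1 = check passed, 0 = check failed.
results = pd.DataFrame({
    "process_area":  ["Order to Cash", "Order to Cash", "Purchase to Pay", "Purchase to Pay"],
    "data_object":   ["Customer", "Customer", "Supplier", "Supplier"],
    "business_unit": ["Retail", "Retail", "Manufacturing", "Manufacturing"],
    "region":        ["EMEA", "APAC", "EMEA", "Americas"],
    "passed":        [1, 0, 1, 0],
})

# Dashboard-style matrix: percentage of checks passed for each
# business unit and data object combination.
score_matrix = results.pivot_table(
    index="business_unit",
    columns="data_object",
    values="passed",
    aggfunc="mean",
).mul(100).round(1)

# The same idea applies to the other dimensions, for example region.
score_by_region = results.groupby("region")["passed"].mean().mul(100).round(1)

print(score_matrix)
print(score_by_region)
```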

The report is typically kept simple...

Designing a Rule Results Report

The Rule Results Report helps an organization move from a high-level aggregated trend to a more specific list of data quality issues, allowing conclusions to be reached. For example, if there is a large backlog of supplier payments in a particular country, and all the bank-related data quality rules for that country show low scores, the issues can easily be explained and an action plan put in place to resolve them.
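
To illustrate how such a conclusion might fall out of the numbers, here is a minimal sketch of a rule-level view filtered to bank-related rules so that a low-scoring country stands out. The rule names, countries, and figures are invented for illustration and are not taken from the book.

```python
import pandas as pd

# Illustrative rule-level scores; rule names, countries, and figures are invented.
rule_scores = pd.DataFrame({
    "rule_name":       ["Bank account present", "IBAN format valid",
                        "Bank account present", "IBAN format valid"],
    "rule_group":      ["Bank data", "Bank data", "Bank data", "Bank data"],
    "country":         ["DE", "DE", "BR", "BR"],
    "records_checked": [1200, 1200, 800, 800],
    "records_failed":  [24, 36, 310, 290],
})

# Score per rule and country: percentage of records passing the rule.
rule_scores["score_pct"] = (
    (1 - rule_scores["records_failed"] / rule_scores["records_checked"]) * 100
).round(1)

# Bank-related rules sorted so the lowest-scoring country/rule combinations
# surface first - the pattern that would explain a payment backlog.
bank_view = (
    rule_scores[rule_scores["rule_group"] == "Bank data"]
    .sort_values("score_pct")
)
print(bank_view)
```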

As with the data quality summary, it is important to take examples of report layouts to the data stewards and ask for their input. Your particular organization may have different needs, and to ensure good engagement, these must be taken into account.

This section provides a view of the features that have been commonly required for the Rule Results Report at a range of different organizations across different industries.

Typical features of the Rule Results Report

The overall objective for this report is to...

Designing Failed Data Reports

The Failed Data Report is the final level of detail for the data quality reporting. It is intended to be a completely actionable list of records that need correction. This report does not tend to vary much between organizations – it is a simple list of record-level details.

The following sections will provide details of the report, including its features, elements, and the benefits that they provide.

Typical features of the Failed Data Reports

The Failed Data Reports should provide enough detail to do the following (a sketch of one such record follows the list):

  • Quickly and easily identify the record
  • Clearly show the data issue for each record
  • Provide as much assistance as possible to the user in correcting the issue
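
A minimal sketch of what one row of such a report might carry follows. The field names are assumptions chosen to cover the three needs above, not the book's prescribed layout.

```python
from dataclasses import dataclass

# Illustrative structure for one row of a Failed Data Report. The field
# names are assumptions chosen to cover the three needs listed above.
@dataclass
class FailedDataRow:
    record_key: str          # unique key so the record is quickly identified
    record_description: str  # human-readable label, for example the supplier name
    rule_id: str             # which rule the record failed
    failed_field: str        # the field holding the offending value
    failed_value: str        # the value that triggered the failure
    expected_format: str     # guidance to help the user correct the issue
    responsible_team: str    # who is expected to make the correction

row = FailedDataRow(
    record_key="S200",
    record_description="Acme Industrial Supplies",
    rule_id="SUPP-001",
    failed_field="iban",
    failed_value="DE44500105",                  # too short, fails the length check
    expected_format="22-character German IBAN (DE + 20 digits)",
    responsible_team="EMEA vendor master data",
)
print(row)
```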

As already mentioned, the report is typically accessed by drilling through from the Rule Results Report. The user will select a row of the Rule Results Report and click the number of failed records to see the Failed Data Report. It does not have to...

Managing inactive and duplicate data

One key aspect of data quality not mentioned in this chapter so far is the management of inactive and duplicate records. The best organizations from a data governance perspective have a clear policy to identify and remove records that are no longer actively being used for transactions in the organization or are potentially duplicated.

However, in reality, these organizations represent just the top few percent. Most organizations are not good at this, or are only good at it where they see the greatest risk. For example, a business in a heavily regulated industry might archive production records as soon as regulations permit, so that future inspections cannot identify flaws originating before the regulated retention period.

Managing duplicate and inactive data is a critical part of data quality management. I will explain how managing this properly can reduce the workload of remediation and avoid focusing on old, unused data.
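
As a minimal sketch of the kind of checks involved (the 24-month threshold, column names, and name-normalization logic are illustrative assumptions, not the book's method), inactive records can be flagged by the age of their last transaction, and simple normalization can surface candidate duplicates for review:

```python
import pandas as pd

# Illustrative supplier records; names, dates, and thresholds are invented.
suppliers = pd.DataFrame({
    "supplier_id": ["S1", "S2", "S3", "S4"],
    "name": ["Acme Ltd", "ACME LIMITED", "Bolt & Co", "Nuts R Us"],
    "last_transaction": pd.to_datetime(
        ["2023-05-10", "2020-01-15", "2019-11-02", "2023-07-01"]
    ),
})

# Flag records with no transactions in the last 24 months as inactive.
cutoff = pd.Timestamp("2023-09-01") - pd.DateOffset(months=24)
suppliers["inactive"] = suppliers["last_transaction"] < cutoff

# Very simple duplicate candidates: identical names after lowercasing and
# stripping common legal-form suffixes. Real matching would be fuzzier.
normalized = (
    suppliers["name"]
    .str.lower()
    .str.replace(r"\b(ltd|limited|co|inc)\b", "", regex=True)
    .str.replace(r"[^a-z0-9]", "", regex=True)
)
suppliers["duplicate_candidate"] = normalized.duplicated(keep=False)

print(suppliers)
```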

Managing inactive...

Presenting findings to stakeholders

Having worked hard to produce these data quality reports, it is critical to ensure that they are launched successfully and embedded into day-to-day business practices.

Launching data quality reporting successfully

The most important feature of a successful data quality reporting launch is that the rules must be accurate and well tested. If the reporting goes live and is immediately shown to include false data quality failures, it is very difficult to keep people engaged.

Usually, through a data quality initiative, you will identify data stewards and data producers who are highly engaged and become almost part of the central team. It is good practice to release the reporting to these users first, asking them to monitor the outputs regularly for 1-2 weeks until confidence in the results is higher.

These users can often then act as champions of the tool when it is released more widely to the business.

It is also highly recommended to...

Summary

This chapter has outlined an example suite of monitoring reports for data quality. This may need to be adapted to your organization but should at least provide an accelerator for the design discussions.

The reports allow anyone in the organization who is given access to them to see an overall picture of data quality and how it is trending over time. They allow people at all levels to drill right down to record level if they wish, and they provide a clear, measurable picture of where data quality really stands in the organization.

Of course, they are only as good as the rules that are input into them. Assuming the rules are of appropriate quality, the Failed Data Reports provide an actionable to-do list of data that must be remediated.

The next chapter is all about taking this actionable list and cleaning up the data. This activity will start to reap the benefits that you worked hard to identify when implementing the approach from Chapter 3.
