You're reading from  Practical Data Quality

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781804610787
Edition: 1st Edition
Author (1)
Robert Hawker

Robert Hawker started his career as a chartered accountant before making the leap into data in 2007. He led data teams within two global implementations of SAP, looking after master data management, data ownership and stewardship, metadata management, and, of course, data quality over a 14-year period. He moved into analytics in 2017 and now specializes in Microsoft Power BI training, implementation, administration, and governance work. He lives in the UK and shares his experiences through conferences and blogs.

Data Quality Remediation

In the previous chapter, we described how to set up data quality reporting, which allows you to easily identify bad data. This chapter moves on to correcting the data. As explained back in Chapter 1, this does not mean that the organization should aim for perfect data. The aim should be to get the data to the level where it no longer causes significant impediments to the organization achieving its goals.

Remediation is often seen as the most challenging part of a data quality initiative. There is typically a major resource investment and a long lead time to make progress.

In spite of these challenges, this phase is also an exciting one. This is where the organization starts to see the tangible benefits that we attempted to estimate back in Chapter 3. As the bad data is replaced with correct data, the issues experienced prior to the initiative finally start to reduce in severity and impact.

Processes become more efficient, resource challenges driven by poor...

Overall remediation process

The overall process of remediation is typically cyclical in nature. It is usually not possible to work on all the issues uncovered by data quality reporting at the same time. The remediation is usually handled in tranches.

Our process for remediation has the following steps:

Figure 8.1 – End-to-end process of remediation

The following table provides a more detailed description of each step:

Prioritizing remediation activities

When you first run your data quality Rule Results Report (or your equivalent), it may be a little overwhelming. There will be failed records for every rule and sometimes the failed records may add up to many thousands. It is not uncommon in larger businesses for 250,000 or more records to fail a rule. For example, if a fast-moving consumer goods organization has a reward card scheme, it can easily have millions of customers. One of the largest of these schemes in the UK has 18 million customers. It would only take a single missing validation on an online enrollment form to generate large quantities of failed data as customers make mistakes when entering data. One organization we worked with required the date of birth of the customer, but did not validate what was entered. Around 1% of customers entered the correct day and month of birth but accidentally entered the current year instead of their birth year. The form was missing a simple validation...
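As an illustration of the kind of check an enrollment form could apply to a date of birth, here is a minimal sketch in Python. It is not the validation used by the organization in the example. The function names, the minimum age of 16, and the maximum age of 120 are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def age_on(dob: date, on_date: date) -> int:
    """Completed years between a date of birth and a reference date."""
    years = on_date.year - dob.year
    # Subtract one year if the birthday has not yet occurred in the reference year
    if (on_date.month, on_date.day) < (dob.month, dob.day):
        years -= 1
    return years


def check_date_of_birth(
    dob: date,
    enrolled_on: date,
    min_age: int = 16,   # assumed threshold, not from the text
    max_age: int = 120,  # assumed threshold, not from the text
) -> Optional[str]:
    """Return a failure reason, or None if the date of birth looks plausible."""
    if dob > enrolled_on:
        return "date of birth is in the future"
    if dob.year == enrolled_on.year:
        # The mistake described above: right day and month, current year
        return "birth year is the enrollment year"
    age = age_on(dob, enrolled_on)
    if age < min_age:
        return "customer is younger than the minimum age"
    if age > max_age:
        return "customer is older than the maximum plausible age"
    return None
```

Run at the point of entry, a check like this would reject the record before it reaches the customer master data. The same logic could also be run against existing records as a data quality rule.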

Identifying the approach to remediation

Now that the priorities are understood, it is time to work on the approach to remediating the bad data. There are a number of different approaches that can be applied and the effort involved varies hugely.

Typically, each prioritized rule can be categorized into a particular approach. Most often, only one approach will apply to each issue. Sometimes there might be the possibility to apply two or more approaches to a particular issue.

For example, if the supplier email addresses needed to send remittance advice details are missing from the ERP system, three approaches might apply:

  1. The data might be in another system (for example, a contract management system) for 40% of the vendors who are missing the data. For these, the data would be migrated across to the ERP system in a batch.
  2. The data might be available on previous supplier invoices for a further 40% of the vendors and could be collected and keyed in.
  3. The data might have to be collected...
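The split of one issue across several approaches can be sketched in code. The following Python example is a simplified illustration only. It assumes each source is available as a dictionary keyed by a shared supplier ID, and the function and bucket names are hypothetical. Suppliers not covered by the first two sources are simply left in a remainder bucket for whichever collection approach applies.

```python
from typing import Dict, Optional


def plan_email_remediation(
    erp_emails: Dict[str, Optional[str]],
    contract_system_emails: Dict[str, str],
    invoice_emails: Dict[str, str],
) -> dict:
    """Assign each supplier with a missing email to the first source that has one."""
    plan = {
        "migrate_in_batch": {},      # found in the contract management system
        "key_in_from_invoices": {},  # found on previous supplier invoices
        "still_to_collect": [],      # not found in either source
    }
    for supplier_id, email in erp_emails.items():
        if email:
            continue  # email already present in the ERP system, nothing to do
        if supplier_id in contract_system_emails:
            plan["migrate_in_batch"][supplier_id] = contract_system_emails[supplier_id]
        elif supplier_id in invoice_emails:
            plan["key_in_from_invoices"][supplier_id] = invoice_emails[supplier_id]
        else:
            plan["still_to_collect"].append(supplier_id)
    return plan
```

The order of the checks reflects a preference for the cheapest approach first: a batch migration is tried before any manual keying. In practice, matching suppliers between systems is rarely as simple as a shared ID.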

Moving remediation to business as usual

In cases where an automated or mass correction approach is applied, often it does not correct all of the data. There may be a difficult 20% of bad data that cannot be automatically matched and where a second approach has to be implemented. Often, difficult decisions need to be made on how far to go in correcting the data. For example, that last 20% might use a manual remediation approach such as 6 or 7. That might be so time-consuming that the cost of implementing it exceeds the benefit. In these situations, it may be most appropriate to apply the approach that gives 80% value and accept (temporarily at least!) the remaining data quality challenge. A “business as usual” remediation method could be applied for the remaining 20%.

To make this a bit clearer, here are further details on the real example in Table 8.5 where supplier bank details were missing:

  • An organization’s ERP system found 65% of its suppliers were...

Understanding the effort and cost

Once the approach to each prioritized data quality issue has been identified, an approximate effort and cost estimate should be prepared, along with a timescale and plan for each issue.

Sometimes it may be necessary to revisit the prioritization at this point. If any of the issues will be exceptionally difficult to resolve, then it might be better to prioritize a different issue with a simpler resolution. This typically happens in the following situations:

  • The approach selected is very manual and will consume more resources than are feasibly available
  • An approach involving a third party (that is, paying for correct data) is more expensive than initially anticipated

Momentum is important in data quality initiatives. If an issue is problematic, even where the priority is high, it can be better to move on to an issue that can be progressed efficiently.
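A rough effort and cost estimate for a manual approach can be reduced to simple arithmetic. The Python sketch below is one possible way to frame it, not a prescribed method. The class name, its fields, and the figures in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ManualRemediationEstimate:
    """Rough estimate for correcting failed records one by one (illustrative only)."""
    failed_records: int
    minutes_per_record: float
    hourly_cost: float

    @property
    def effort_hours(self) -> float:
        # Total effort = number of records x time per record, converted to hours
        return self.failed_records * self.minutes_per_record / 60

    @property
    def cost(self) -> float:
        return self.effort_hours * self.hourly_cost


def worth_remediating(estimate: ManualRemediationEstimate, expected_benefit: float) -> bool:
    """True when the expected benefit exceeds the estimated cost."""
    return expected_benefit > estimate.cost


# Hypothetical figures: 12,000 failed records, 5 minutes each, 40 per hour
estimate = ManualRemediationEstimate(failed_records=12_000, minutes_per_record=5, hourly_cost=40)
print(estimate.effort_hours, estimate.cost)
```

An estimate like this is deliberately crude. Its purpose is to reveal early whether an issue is likely to consume more resources than are available, so that the prioritization can be revisited.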

In order to properly understand the effort and costs involved in remediating...

Governing remediation activities

Once the prioritization is complete, the approach has been identified, and the effort involved has been understood, the remediation activities begin. Just as with any other project-style activity, remediation must be governed.

Governance in this instance means the following:

  • Tracking the remediation activities against the expected effort/elapsed time
  • Reporting to senior leaders on the progress of the activity
  • Understanding risks, issues, and “blockers” that need to be managed or mitigated
  • Ensuring that when the project activity is done, ongoing work is transitioned into a business-as-usual process
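The first of these points, tracking against expected effort and elapsed time, can be illustrated with a small Python sketch. It assumes that progress is planned to be linear over the remediation period, which is a simplification. The function name, the status labels, and the 10% tolerance are hypothetical.

```python
def remediation_status(
    records_fixed: int,
    records_total: int,
    days_elapsed: int,
    days_planned: int,
    tolerance: float = 0.1,  # assumed allowance before flagging a delay
) -> str:
    """Compare the share of records fixed with the share of planned time used."""
    progress = records_fixed / records_total
    expected = min(days_elapsed / days_planned, 1.0)  # linear plan assumed
    if progress >= 1.0:
        return "complete"
    if progress + tolerance < expected:
        return "behind plan"
    return "on track"
```

A status like this could feed the progress reporting to senior leaders, with "behind plan" items prompting a discussion of risks, issues, and blockers.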

When organizations start to remediate data quality issues for the first time, the work has to be managed quite formally. This is simply because the organization has no established processes, best practices, or institutional knowledge in this area. Some organizations that I have worked with have decided to simply assign...

Tracking benefits

Remediation activities are very time-consuming and challenging. It is very common for data quality initiatives to be so focused on this activity that they do not manage stakeholders properly at this stage. The initiative has promised benefits in the business case stage (even if just qualitative benefits). The benefits may have been used to persuade leaders to take resources away from other work to be dedicated to remediation.

It is therefore important to show that the promised benefits have actually been delivered by the remediation. Where this is done well, you will see the following:

  • Leaders encouraging you to continue on to the next process area or data domain
  • Previously reluctant stakeholders asking for their area to be added to the roadmap
  • Increased investment in related data activities – such as analytics – because the level of confidence in data increases
  • Additional areas appointing data stewards/data...

Summary

Remediation has always been one of the most challenging parts of a data quality initiative. It can be incredibly difficult to get sufficient resources to make meaningful changes to data quality scores in a reasonably short period of time. This chapter has outlined how to ensure that the resources allocated are working on what the organization believes are the key priorities and that the approach taken is the most effective possible.

This chapter has outlined how to show stakeholders the progress being made and how to tie progress to the benefits that were agreed upon in the business case chapter.

With successful remediation, you will be asked to continue your data quality initiative into previously unsupportive business areas and deliver even more benefits.

To sustain business benefits, it is critical to make permanent changes to the way the organization manages its data and to embed further data quality improvement into the fabric of its business processes. The next...


Step name: Prioritize
Description: Identification of the most important data quality failures so that these can be targeted early.

Step name: Identify the approach
Description: There are a number of different ways to remediate data. Here are some examples:

  • Manual record-by-record corrections
  • Collection/upload of data from a third party
  • Automatic...