Practical Data Quality

Product type Book
Published in Sep 2023
Publisher Packt
ISBN-13 9781804610787
Pages 318 pages
Edition 1st Edition
Author: Robert Hawker

Table of Contents

  • Preface
  • Part 1 – Getting Started
    • Chapter 1: The Impact of Data Quality on Organizations
    • Chapter 2: The Principles of Data Quality
    • Chapter 3: The Business Case for Data Quality
    • Chapter 4: Getting Started with a Data Quality Initiative
  • Part 2 – Understanding and Monitoring the Data That Matters
    • Chapter 5: Data Discovery
    • Chapter 6: Data Quality Rules
    • Chapter 7: Monitoring Data Against Rules
  • Part 3 – Improving Data Quality for the Long Term
    • Chapter 8: Data Quality Remediation
    • Chapter 9: Embedding Data Quality in Organizations
    • Chapter 10: Best Practices and Common Mistakes
  • Index
  • Other Books You May Enjoy

The value of this book

I realize it is never easy to find the time to read a book like this one. There are so many business books you could read to improve your performance and that of your organization. Most people have started to read a number of similar business books and never made it all the way through.

So, why invest your valuable time in this one? I hope that I will help you understand which of your data is bad, which of that data matters, how to get that data quickly from bad to good enough, and how to keep it there. This is the meaning of Practical Data Quality.

The approach outlined in this book helped take an organization whose data was so poor that it was literally struggling to keep the lights on in its premises to a point where data quality was considered a strength. (The organization could not get payments to its utility providers and very nearly had its power supply suspended.)

Progress was rapid. In just weeks, data quality improvements were made for the highest priority issues. Within six months, an automated data quality tool was in place to identify data that did not meet business needs, and processes were in place to correct the data. After two years, data quality was fully embedded in organizational processes, with new employees given training on the topic and data quality scores close to 100% of the targets. If you follow the approach in this book and you have the right support from your organization, you should be able to achieve similar results.

Importance of executive support

I firmly believe that the approach in this book is the right one. However, even the right approach can fail without the right support from executives.

In the example organization, the support that was required was relatively easy to obtain. The situation had been so bad that the leadership team could see that data quality was a major issue that was affecting revenue, costs, and compliance matters – the three topic areas that typically capture the interest of executive boards.

The data quality team was asked to report on data quality to the board every month, and whenever a concern or blocker was raised, actions were immediately defined to move it out of the way.

In most organizations, data quality issues are not so severe that their impact is plain to see right up to the executive level. The issues are well known to those on the front line of the business, but people work hard to smooth the rough edges of the data before it reaches the executives. Processes and compliance activities are impacted, but not severely enough to cause a complete breakdown that executives will become aware of. Business and IT executives often have different priorities and use different language when talking about data, and data teams must often bridge these divides.

The following chapters will outline an approach that will help you surface these issues in a way that will influence executives to support data quality initiatives.

The remainder of this chapter will cover the following main topics:

  • What is bad data?
  • The impacts of bad data quality
  • The typical causes of bad data quality

What is bad data?

The first topic is about defining what is meant by bad data. It rarely makes sense to aim for what people might consider perfect data (every record is complete, accurate, and up to date). The investment required is usually prohibitive, and the gains made for the last 1% of data quality improvement effort become far too marginal.

Detailed definition of bad data

What do I mean by bad data?

In summary, this is the point where the data no longer supports the objectives of the business. To drill into this in more detail, it is where the following occurs:

  • Data issues prevent business processes from being completed in the following ways:
    • On time (for example, within service-level agreements (SLAs))
    • Within budget (for example, the headcount budget has to be exceeded to keep up with agreed time constraints)
    • With appropriate outcomes (for example, products delivered on time)
  • Data issues mean key information is not available to support business decisions at the time it is required. This can be because of the following challenges:
    • Missing or delayed information (for example, selecting products to discontinue based on profit margins, but no margins are available for key products in reporting)
    • Incorrect information (for example, competitor margin is presumed to be X% but is 5% lower than this presumption in reality, due to an error in data aggregation)
  • Data issues cause a compliance risk – this can be where the following occurs:
    • Data that must be provided to a regulator is unavailable, incomplete, incorrect, or delayed beyond a regulatory deadline
    • Data is not retained as per privacy laws – such as the General Data Protection Regulation (GDPR) in the EU
  • Data does not allow the business to differentiate itself from its competitors where data is sold as a product (for example, a database of customer data) or as part of a differentiated customer experience.

Data that contributes to any of these types of issues to the point that business objectives cannot be met would be considered bad by this definition.

The level of data quality is rarely consistent across business units and locations within a company. There are usually pockets of excellence and areas where data has become a major problem. Often, the overall progress of a business toward its objectives can be seriously impacted by significant failures within just one business unit or location.

One organization I worked with had a strongly differentiated product, achieved through great R&D and thoughtful acquisition activity. The R&D team carefully managed their data and kept the quality high enough to achieve their business objectives. The Operations team was less mature in their management of data, but their data quality issues were not severe enough to prevent them from meeting their main objectives. They still managed to produce enough of their differentiated product for the organization to predict extremely high sales growth. However, the Commercial team had inherited low-quality customer master data from an acquisition (primarily heavily duplicated records and incorrect or missing shipping details), and some of the possible sales growth was not achieved. As part of a customer experience review, a major customer commented, “you can have the best products in the marketplace, but if it becomes hard to do business with you, it doesn’t matter.”

Bad data versus perfect data

We already mentioned that the investment to get to perfect data rarely makes economic sense. Having bad data does not make economic sense either. So, how should organizations decide on what standard of data is fit for purpose?

The answer is complex and will be covered in more depth in Chapter 6, in the Key features of data quality rules section, but in summary, you must define a threshold at which you deem the data to be fit for purpose. This is the point where the data allows you to achieve your business objectives.

The trick is to make sure that the thresholds you define are highly specific. For example, most people would consider a tax ID for a supplier to be a mandatory element of data. It is tempting to target a data quality score of 100% (in other words, every row of data is perfect) for data like this, but in reality, thinking must be much more nuanced.

In many countries, small organizations will not have a tax ID. In the UK, for example, it is optional to register for VAT until company revenue reaches £85,000 (as of 2022). This means that the field in a system that contains this data cannot be made mandatory when collecting the data. A data quality threshold has to be set at which data will be considered fit for purpose.

Note

To manage this truly effectively, you would segregate the vendors into large enterprises and smaller organizations. You would set a high threshold (for example, 95%) for large enterprises, and a much lower threshold (for example, 60%) for smaller organizations.
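As a rough illustration of segment-specific thresholds, the following Python sketch scores tax ID completeness per supplier segment and compares each score with its own threshold. The `Supplier` class, the segment labels, and the function names are hypothetical and chosen only for this example; the 95% and 60% thresholds are the example values from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Supplier:
    name: str
    segment: str            # hypothetical labels: "large" or "small"
    tax_id: Optional[str]   # None or blank means the tax ID is missing


# Example thresholds from the text: 95% for large enterprises, 60% for smaller ones.
THRESHOLDS = {"large": 0.95, "small": 0.60}


def completeness_by_segment(suppliers):
    """Return {segment: share of suppliers with a non-blank tax ID}."""
    totals, filled = {}, {}
    for s in suppliers:
        totals[s.segment] = totals.get(s.segment, 0) + 1
        if s.tax_id and s.tax_id.strip():
            filled[s.segment] = filled.get(s.segment, 0) + 1
    return {seg: filled.get(seg, 0) / count for seg, count in totals.items()}


def fit_for_purpose(suppliers, thresholds=THRESHOLDS):
    """Return {segment: True if the score meets that segment's threshold}."""
    scores = completeness_by_segment(suppliers)
    return {seg: score >= thresholds[seg] for seg, score in scores.items()}


# Made-up sample: 9 of 10 large and 7 of 10 small suppliers have a tax ID.
suppliers = (
    [Supplier(f"Large {i}", "large", "GB123" if i < 9 else None) for i in range(10)]
    + [Supplier(f"Small {i}", "small", "GB456" if i < 7 else None) for i in range(10)]
)
print(completeness_by_segment(suppliers))  # {'large': 0.9, 'small': 0.7}
print(fit_for_purpose(suppliers))          # {'large': False, 'small': True}
```

In this made-up sample, 90% completeness is not good enough for large enterprises, while 70% is acceptable for smaller organizations. A single blanket threshold would misjudge at least one of the two groups.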

To get this rule perfect, you might even try to capture (or import from a source such as the Dun and Bradstreet database) the average annual revenue for the past 3 years for a supplier when adding them to your system. You would then specify a high threshold for those who had revenue over the tax registration level. This would be a time-consuming rule to create and manage because you would need to capture a lot of additional data and the thresholds would change over time. This is where judgment comes in when defining data quality rules – is the benefit you will gain from making the rule specific worth the effort to obtain/maintain the information you need?
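A minimal sketch of such a revenue-aware rule might look like the following. It assumes each supplier record carries three years of annual revenue, and it uses the UK figure of £85,000 quoted above purely as an example registration level. The function names and record layout are hypothetical, and real registration rules are more detailed than a simple revenue average.

```python
from statistics import mean
from typing import Iterable, Optional, Sequence, Tuple

# Example registration level: the UK figure quoted in the text (as of 2022).
REGISTRATION_LEVEL_GBP = 85_000

# Hypothetical record layout: (supplier name, last 3 years of revenue, tax ID).
SupplierRecord = Tuple[str, Sequence[float], Optional[str]]


def tax_id_expected(revenue_last_3_years: Sequence[float],
                    registration_level: float = REGISTRATION_LEVEL_GBP) -> bool:
    """Treat a tax ID as expected when average annual revenue exceeds the level."""
    return mean(revenue_last_3_years) > registration_level


def suppliers_to_chase(records: Iterable[SupplierRecord]) -> list:
    """Names of suppliers that should have a tax ID but have none recorded."""
    return [
        name
        for name, revenues, tax_id in records
        if tax_id_expected(revenues) and not (tax_id and tax_id.strip())
    ]


# Made-up sample data for illustration only.
records = [
    ("Acme Ltd", [120_000, 150_000, 90_000], None),       # above level, ID missing
    ("Corner Shop", [40_000, 50_000, 45_000], None),      # below level, ID not expected
    ("Big Co", [500_000, 600_000, 550_000], "GB999"),     # above level, ID present
]
print(suppliers_to_chase(records))  # ['Acme Ltd']
```

Only the supplier that is above the registration level and has no tax ID is flagged, so the small supplier is never chased. The cost is the extra revenue data that must be captured and kept current, which is the trade-off described above.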

If you are not specific enough with your targets, data may be flagged as bad inappropriately. When tasked with correcting it, your colleagues will notice these false negatives (records that fail a rule even though the data is actually acceptable) and lose faith in the data quality reporting you provide. In this example, a colleague would chase a supplier for a tax ID, only to find that the supplier does not have one. These false negatives are damaging because the people involved in your data quality initiative start to feel they can ignore the data failures – it is the classic “boy who cried wolf” tale in a data quality context.

Now that we have introduced the basics of bad data, let's understand how this bad data can impact an organization.
