You're reading from Practical Data Quality

Product type: Book
Published in: Sep 2023
Publisher: Packt
ISBN-13: 9781804610787
Edition: 1st Edition
Author: Robert Hawker

Robert Hawker started his career as a chartered accountant before making the leap into data in 2007. He led data teams within two global implementations of SAP, looking after master data management, data ownership and stewardship, metadata management, and, of course, data quality over a 14-year period. He moved into analytics in 2017 and now specializes in Microsoft Power BI training, implementation, administration, and governance work. He lives in the UK and shares his experiences through conferences and blogs.

Data Discovery

Throughout my data quality career, customers and stakeholders have regularly told me that they know their data “inside out”. In my experience, however, data profiling surprises even these stakeholders. For example, at one organization, the procure-to-pay process owner assured me that no suppliers were on “pay immediately” terms (meaning that invoices would be paid as soon as they were issued). Data profiling revealed that 40 suppliers were in fact set to these terms, with a total spend of several million dollars being paid immediately instead of accruing interest for the organization.

Data profiling helps to identify the data quality rules that organizations would like their data to comply with by pointing out the “extremities” of the data. Often, these extremities are examples of something that has gone wrong with the data and needs to be corrected.

To detect these extremities, a tool typically evaluates...

An overview of the data discovery process

Data discovery is the process by which an organization builds an understanding of which data matters most and identifies the challenges with that data. Its outcome is a clear scope for the data quality initiative and a sound basis for defining data quality rules.

Data discovery starts with understanding the strategy of your organization, the objectives of key stakeholders, and, crucially, what is getting in the way of fulfilling them. It is important to ask stakeholders to talk about this holistically and not to filter their answers based on what they think might be data quality related. It is very common for issues to appear to have little to do with data, but when an investigation takes place, a link to data quality is uncovered. Clearly, not every problem will have a data quality root cause, but it is important to have the chance to form your own expert opinion.

Once the strategy and objectives are well understood, it is time...

Understanding business strategy, objectives, and challenges

The biggest mistake that can be made in a data quality initiative is focusing on the wrong data. If you fix data that does not impact a critical business process or drive important decisions, your initiative simply will not make the difference that you want it to. It could lead to the end of your work before it has had the chance to mature. Senior stakeholders have a lot of proposals competing for budget, and it is common for initiatives that do not make the right impact to lose their funding.

Focusing on the wrong data often happens when the person instigating the data quality initiative or sponsoring it has a particular background. One organization I worked with had a new data quality manager with a purchasing background. They came from a large organization with a manufacturing element, where the efficient purchasing of raw materials was make or break in terms of margin. Suppliers were managed really effectively, and...

The hierarchy of strategy, objectives, processes, analytics, and data

If you have followed the process outlined in the Understanding business strategy, objectives, and challenges section, you should now have a list of data quality-related challenges, as well as an idea of the systems and data involved in each challenge. It is likely that you have identified more potential challenges than you can prioritize at this time.

The next step in the process is to review these holistically and select where to put your focus.

Prioritizing using strategy

Having gathered data quality challenges that impact various pillars of a strategy, it is time to take stock. This may involve going back to the strategy team to present your findings.

The following diagram illustrates the typical outcomes of the early discovery meetings.

Figure 5.2 – Strategic pillars by the number of rules and complexity

There is often a particular strategic pillar where...

Basics of data profiling

Data profiling assesses a set of data and provides information on the values, the length of strings, the level of completeness, and the distribution patterns of each column. For example, for both values and string lengths, the minimum, maximum, mean, and median are provided to help identify outliers.
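The statistics described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the output of any particular profiling tool. The function names (profile_column, summarize), the sample data, and the choice to treat None and empty strings as missing are all assumptions made for this example.

```python
from statistics import mean, median


def summarize(numbers):
    """Return the minimum, maximum, mean, and median of a list of numbers."""
    return {
        "min": min(numbers),
        "max": max(numbers),
        "mean": mean(numbers),
        "median": median(numbers),
    }


def profile_column(values):
    """Profile one column: completeness plus value and string-length statistics.

    Treating None and empty strings as missing is an assumption of this sketch.
    """
    present = [v for v in values if v is not None and v != ""]
    profile = {
        "row_count": len(values),
        "completeness": len(present) / len(values) if values else 0.0,
    }
    if not present:
        return profile

    # String-length statistics apply to any column once values are rendered as text.
    profile["length"] = summarize([len(str(v)) for v in present])

    # Value statistics are only reported when every populated value is numeric.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        profile["value"] = summarize(present)
    return profile


# Hypothetical sample data, invented purely for illustration.
payment_days = [30, 30, 60, 0, None, 30, 45]
supplier_names = ["Acme Ltd", "Globex", "", "Initech", "X"]

days_profile = profile_column(payment_days)
names_profile = profile_column(supplier_names)

print(days_profile)
print(names_profile)
```

In this invented sample, the minimum payment term of 0 days and the one-character supplier name are the kind of outliers that would prompt a closer look and, potentially, a data quality rule.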

Most of you will have some experience in data profiling – even if you have not heard the term before. The first task that many people perform when looking at an unfamiliar set of data is to open it in a spreadsheet tool and apply a filter (the autofilter feature in Microsoft Excel, for example) to all the columns. They will check all values in each column, looking to see whether the column contains a couple of values that all the rows are associated with, or whether there are many. People look to see whether the data is a number, a date, text, and so on. It’s quite common to look for the smallest and largest values. Even this basic action is an...
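The spreadsheet filter check described above (how many distinct values a column holds, and whether they look like numbers, dates, or text) can be approximated as follows. This is a hedged sketch using only the Python standard library. The helper names (infer_type, distinct_summary) and the sample values are hypothetical, and recognizing only ISO-formatted dates (YYYY-MM-DD) is a simplifying assumption.

```python
from collections import Counter
from datetime import datetime


def infer_type(text):
    """Classify a raw string as 'number', 'date', or 'text'.

    Assumptions of this sketch: only ISO dates (YYYY-MM-DD) count as dates,
    and anything float() accepts (including 'inf' and 'nan') counts as a number.
    """
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def distinct_summary(values, top=3):
    """Mimic a spreadsheet filter: distinct count, most common values, types seen."""
    counts = Counter(values)
    return {
        "distinct": len(counts),
        "most_common": counts.most_common(top),
        "types": Counter(infer_type(v) for v in values),
    }


# Hypothetical sample columns, invented purely for illustration.
payment_terms = ["NET30", "NET30", "NET60", "NET30", "IMMEDIATE", "NET30"]
mixed = ["2023-09-01", "42", "n/a", "2023-13-01"]

terms_summary = distinct_summary(payment_terms)
mixed_summary = distinct_summary(mixed)

print(terms_summary)
print(mixed_summary)
```

A column dominated by a handful of values, such as the payment terms here, makes rare entries easy to spot, while a column that mixes inferred types (including an invalid date such as month 13) suggests that the field is being used inconsistently.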

Summary

The early part of this chapter outlined how to properly understand the business strategy of an organization. If this element goes well and has the right support, the organization will feel that the data quality initiative truly understands its priorities. This breeds confidence that the work done on data quality will be focused on the right aspects.

The chapter also outlined how to use the information from this discovery phase to properly research the root cause of challenges that impact the strategy. It also outlined how to link these challenges to processes, analytics, and data.

All of this has informed which data to profile. The chapter covered the main outputs that profiling provides and the potential data quality rules it can generate. This is likely to have revealed some surprises about data, even to those who use it every day. The maturity of the data conversation has now reached the point where data quality rules can be fully developed.

Following...
