
You're reading from  Engineering Data Mesh in Azure Cloud

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781805120780
Edition: 1st Edition
Author (1)
Aniruddha Deswandikar

Aniruddha Deswandikar holds a Bachelor's degree in Computer Engineering and is a seasoned Solutions Architect with over 30 years of industry experience as a developer, architect and technology strategist. His experience spans from start-ups to dotcoms to large enterprises. He has spent 18 years at Microsoft helping Microsoft customers build their next generation Applications and Data Analytics platforms. His experience across Application, Data and AI has helped him provide holistic guidance to companies large and small. Currently he is helping global enterprises set up their Enterprise-scale Analytical system using the Data Mesh Architecture. He is a Subject Matter Expert on Data Mesh in Microsoft and is currently helping multiple Microsoft Global Customers implement the Data Mesh architecture.

Master Data Management

Every company, big or small, is made up of some common entities that are referenced across all departments. Some of the most common examples of this are customers, products, suppliers, and addresses. This data should be fixed but, unfortunately, it changes over time as it travels through the different departments and regional offices of a company.

This data is called master data.

As a data mesh grows and you incorporate more data products into it, different versions of the master data start flowing into the mesh. This can affect the analytical output: inaccuracies, errors, and redundancies start showing up in reports, leading to hours of manual debugging and lost time and money for the company.

In this chapter, we will learn about master data management (MDM) and its importance to the data mesh.

In this chapter, we will cover the following:

  • Single source of truth
  • What causes discrepancies in master data?
  • MDM design...

Single source of truth

What is master data? The data of the core non-transactional entities of the business that are used across an enterprise or a company are called master data. They provide context to transactions, such as sales and purchase orders. In order to complete a sales transaction or a purchase order, we need information about entities, such as customers, products, suppliers, and address locations. These entities are the master data. In a report that shows sales by customer, sales are the transaction data, and customer is the master data.
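The split between transaction data and master data in a "sales by customer" report can be sketched in a few lines of Python. This is only an illustration: the customer IDs, names, and amounts are made up, and `sales_by_customer` is a hypothetical helper, not something from the book.

```python
from collections import defaultdict

# Master data: one record per customer (hypothetical IDs and names).
customers = {
    "C001": {"name": "Contoso Ltd", "country": "US"},
    "C002": {"name": "Fabrikam Inc", "country": "DE"},
}

# Transaction data: each sale references a customer by ID.
sales = [
    {"order_id": 1, "customer_id": "C001", "amount": 120.0},
    {"order_id": 2, "customer_id": "C002", "amount": 75.5},
    {"order_id": 3, "customer_id": "C001", "amount": 30.0},
]


def sales_by_customer(sales, customers):
    """Sum sale amounts (facts) grouped by customer name (dimension)."""
    totals = defaultdict(float)
    for sale in sales:
        name = customers[sale["customer_id"]]["name"]
        totals[name] += sale["amount"]
    return dict(totals)


report = sales_by_customer(sales, customers)
```

If the same customer appeared under two different IDs or spellings in the master data, the report would split that customer's sales across two rows, which is the kind of error the rest of this chapter is concerned with.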

Master data entities will vary depending on the vertical market of the enterprise. For healthcare, they will be patients, providers, medicines, and treatments. For a bank, they are account numbers, SWIFT codes, and know-your-customer (KYC) data.

As you can imagine, master data is critical, because enterprises run on this data. All analytics will use the master data as dimensions to the facts. Sales reports by...

What causes discrepancies in master data?

Every company starts with a simple database to manage its customers, products, and all other important business entities. As the company grows, this data starts migrating to the different departments and regions of the company. Over time, copies of the data exist in many areas of the company, and these copies are modified. The changes are sometimes planned and sometimes accidental. As a result, the data starts drifting from its original value, and you end up with different versions of the same data at different locations. Here are a few reasons why master data drifts from its original values:

  • Inconsistent naming conventions: Some countries capture people's names as last name, first name, and middle name. Some countries skip middle names. These inconsistent naming conventions can lead to duplicate records for the same person.
  • Lack of standards: Information such as product names and account numbers...
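As a rough illustration of the naming-convention problem, the sketch below reduces differently formatted person names to a common match key and groups records that share it. The functions `name_key` and `find_duplicates`, the record IDs, and the matching rule (first and last token only) are assumptions made for this example. Real MDM matching is considerably more sophisticated, and this crude rule will mishandle multi-word surnames.

```python
from collections import defaultdict


def name_key(raw: str) -> tuple:
    """Return a (first, last) key that ignores case, middle names,
    trailing dots, and 'Last, First' ordering."""
    raw = raw.strip()
    if "," in raw:
        # "Smith, John A." -> tokens ["John", "A.", "Smith"]
        last, rest = raw.split(",", 1)
        parts = rest.split() + [last.strip()]
    else:
        parts = raw.split()
    parts = [p.strip(".").lower() for p in parts if p.strip(".")]
    if not parts:
        raise ValueError("empty name")
    return (parts[0], parts[-1])


def find_duplicates(records):
    """Group record IDs by match key and keep keys seen more than once."""
    groups = defaultdict(list)
    for rec in records:
        groups[name_key(rec["name"])].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical copies of the same person held by three regional systems.
records = [
    {"id": "US-1", "name": "John A. Smith"},
    {"id": "DE-7", "name": "Smith, John"},
    {"id": "IN-3", "name": "JOHN ADAM SMITH"},
    {"id": "US-2", "name": "Jane Doe"},
]

duplicates = find_duplicates(records)
```

Here the three John Smith variants collapse to one key, while Jane Doe remains a single record.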

MDM design patterns

MDM has been a requirement for many decades. Enterprise systems such as enterprise resource planning (ERP), customer relationship management (CRM), supply chain management (SCM), human resources management (HRM), and others have been around just as long. Invariably, these systems are set up by different software vendors, and some of them can be homegrown, so the master data has to be managed between them. As a result, a few industry patterns have emerged for how master data is managed. Multiple articles and papers discuss different patterns; here, I am summing them up into three high-level methods of managing your master data:

  • Registry-based system: This is the traditional method where the actual master data are maintained inside the individual systems, but the data are registered with a reference in a central database. The records across the systems are matched, and duplicates are recognized and...
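The registry idea can be sketched as a small in-memory structure that stores only references to the records held in each source system, never the records themselves. Everything named here is hypothetical: the `Registry` class, the `G0001`-style global IDs, the system names, and the choice of a normalized email address as the match key are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    # Normalized match key -> global ID.
    _by_key: dict = field(default_factory=dict)
    # Global ID -> list of (system, local_id) references.
    references: dict = field(default_factory=dict)

    def register(self, system: str, local_id: str, match_key: str) -> str:
        """Register a source record and return the global ID it maps to."""
        key = match_key.strip().lower()
        global_id = self._by_key.get(key)
        if global_id is None:
            # First time this entity is seen: mint a new global ID.
            global_id = f"G{len(self.references) + 1:04d}"
            self._by_key[key] = global_id
            self.references[global_id] = []
        # The master record stays in the source system; only a pointer is kept.
        self.references[global_id].append((system, local_id))
        return global_id


registry = Registry()
a = registry.register("CRM", "cust-17", "Jane.Doe@example.com")
b = registry.register("ERP", "000452", "jane.doe@example.com ")
c = registry.register("CRM", "cust-18", "sam@example.com")
```

The CRM and ERP records for Jane resolve to the same global ID, so the registry recognizes them as one entity while each system keeps its own copy.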

MDM architecture for a data mesh

Given the distributed nature of a data mesh, each data product needs to ensure that the master data it uses is consistent and accurate. This calls for a reference master dataset that is centrally maintained and referenced by all the data products and pipelines that need to ensure consistency. Multiple architectures are available for managing this central reference dataset. To understand which design works best for you, you should first examine your master data. Not all master data is used by every domain in the company. Customer master data might not overlap product master data, but the sales domain might use both.

Hence, two strategies emerge for managing master data: domain-oriented MDM and domain-level MDM:

Figure 10.2 – Domain-oriented MDM


The domain-oriented technique has a single MDM domain that is referenced by all the data products across all the data landing zones, as shown in Figure...
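To make the notion of a centrally maintained reference dataset more concrete, here is a minimal sketch of a data product pipeline step that checks its records against such a reference. The `conform` function, the field names, and the sample data are hypothetical. In practice the reference would live in a shared store rather than in a Python dictionary.

```python
def conform(records, reference, key="customer_id"):
    """Split records into those found in the reference master data
    (enriched with the master name) and those that are not."""
    matched, unmatched = [], []
    for rec in records:
        master = reference.get(rec[key])
        if master is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, "customer_name": master["name"]})
    return matched, unmatched


# Hypothetical central reference dataset shared by all data products.
customer_reference = {
    "C001": {"name": "Contoso Ltd"},
    "C002": {"name": "Fabrikam Inc"},
}

# Records arriving in one data product's pipeline.
incoming = [
    {"customer_id": "C001", "amount": 10.0},
    {"customer_id": "C999", "amount": 5.0},
]

matched, unmatched = conform(incoming, customer_reference)
```

Records that land in `unmatched` can be routed to a review queue instead of flowing silently into downstream reports.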

Build versus buy

The build versus buy discussion for MDM is the same as the one for data quality. It all boils down to cost versus customization. Building your own MDM can be expensive, but it will be flexible enough to fit your exact needs. Buying a system might be cheaper, but you will need to adjust to its limitations.

Summary

In this chapter, we looked at what master data is and the importance of clean master data for generating accurate analytics. We looked at some of the main reasons why master data drifts from its original and correct values. We then looked at various patterns for maintaining master data and how master data can be accessed in a data mesh scenario with distributed analytics. MDM systems can be complex to build and maintain. We saw the pros and cons of building versus buying an MDM system.

In the next chapter, we will look at monitoring and data observability.
