
You're reading from  Engineering Data Mesh in Azure Cloud

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781805120780
Edition: 1st Edition
Author (1)
Aniruddha Deswandikar

Aniruddha Deswandikar holds a Bachelor's degree in Computer Engineering and is a seasoned Solutions Architect with over 30 years of industry experience as a developer, architect and technology strategist. His experience spans from start-ups to dotcoms to large enterprises. He has spent 18 years at Microsoft helping Microsoft customers build their next generation Applications and Data Analytics platforms. His experience across Application, Data and AI has helped him provide holistic guidance to companies large and small. Currently he is helping global enterprises set up their Enterprise-scale Analytical system using the Data Mesh Architecture. He is a Subject Matter Expert on Data Mesh in Microsoft and is currently helping multiple Microsoft Global Customers implement the Data Mesh architecture.

Understanding Data-Sharing Topologies in a Data Mesh

In my experience of helping Microsoft customers implement a data mesh, the part that concerns them most is data sharing. The concept of in-place sharing is completely new to most customers, and they wonder how it will actually work. In-place sharing is a way of sharing data with others without moving the data from its source system. While in-place data sharing is key to avoiding data movement through the mesh, it is not mandatory. So, when do we use in-place sharing, and when do we move data? Are there other ways of sharing data? And which sharing technique should we use in which situation?

This chapter will answer all these questions. It will lay out clear topologies of data-sharing techniques and scenarios that need each of these sharing topologies.

In this chapter, we will cover the following topics:

  • What is in-place sharing?
  • Understanding data-sharing challenges...

What is in-place sharing?

In-place data sharing is a method of sharing data where the data remains in its original location. It is not physically copied or moved to a different location when another data product wants to access it. Instead, access is granted to the data where it resides. This approach contrasts with traditional data-sharing methods such as data copy pipelines, which involve copying the data to a new location.
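The contrast between the two approaches can be sketched with a small local-filesystem analogy. This is only an illustration under stated assumptions: the directory names, the helper functions `share_by_copy` and `share_in_place`, and the sample file contents are all hypothetical, and a local temporary folder stands in for the producer's and consumer's storage.

```python
import shutil
import tempfile
from pathlib import Path


def share_by_copy(source: Path, consumer_dir: Path) -> Path:
    """Copy-pipeline style: the consumer receives its own physical copy."""
    target = consumer_dir / source.name
    shutil.copy(source, target)
    return target


def share_in_place(source: Path) -> Path:
    """In-place style: the consumer only receives a reference to the source."""
    return source


with tempfile.TemporaryDirectory() as root:
    producer_dir = Path(root) / "producer"
    consumer_dir = Path(root) / "consumer"
    producer_dir.mkdir()
    consumer_dir.mkdir()

    # Hypothetical data owned by the producing data product.
    source = producer_dir / "data.csv"
    source.write_text("id,amount\n1,100\n")

    # In-place sharing creates no additional file.
    in_place_ref = share_in_place(source)
    copies_after_in_place = len(list(Path(root).rglob("data.csv")))

    # A copy pipeline leaves a second physical file in the consumer's space.
    copied = share_by_copy(source, consumer_dir)
    copies_after_copy = len(list(Path(root).rglob("data.csv")))

    # The producer updates the data after it has been shared.
    source.write_text("id,amount\n1,250\n")
    in_place_view = in_place_ref.read_text()  # reflects the update
    copied_view = copied.read_text()          # still holds the old values
```

In this sketch, the in-place consumer sees the producer's update immediately, while the copied file stays as it was until the pipeline runs again.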

The way the data is accessed depends on the way the data is stored. If the data is stored in a data lake, you can provide a connection string to the data lake that the consuming data product can use to directly read the data and load it into its processing space (a Python DataFrame or SQL query result). Security can be managed using temporary keys or authenticated service principals.
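As a rough sketch of what the producing data product might hand over instead of a copy, the following models a time-limited access grant for a file in a data lake. The class name `InPlaceGrant`, the account, container, and path values, and the one-hour lifetime are all assumptions made for illustration; only the `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI layout is the standard Azure Data Lake Storage Gen2 form.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class InPlaceGrant:
    """Hypothetical record a producer shares with a consumer in place of a data copy."""

    account: str          # storage account name (assumed value below)
    container: str        # container / file system name (assumed value below)
    path: str             # path of the shared file inside the container
    expires_at: datetime  # moment the temporary key stops being valid

    def uri(self) -> str:
        # Build the ADLS Gen2 URI that the consumer reads from directly.
        clean_path = self.path.lstrip("/")
        return f"abfss://{self.container}@{self.account}.dfs.core.windows.net/{clean_path}"

    def is_valid(self, now: datetime) -> bool:
        # A temporary key is only usable before its expiry time.
        return now < self.expires_at


issued = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
grant = InPlaceGrant(
    account="financeeu",       # hypothetical account
    container="curated",       # hypothetical container
    path="/sales/data.csv",    # hypothetical path
    expires_at=issued + timedelta(hours=1),
)

print(grant.uri())
print(grant.is_valid(issued))
print(grant.is_valid(issued + timedelta(hours=2)))
```

With pandas and the adlfs package installed, a consumer could then load the file into a DataFrame with a call along the lines of `pd.read_csv(grant.uri(), storage_options={"sas_token": "<temporary key>"})`. Treat that call as one possible approach rather than the only one; a service principal could be supplied through `storage_options` instead of a temporary key.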

Azure Databricks has a feature called Delta Sharing. Delta Sharing is an open protocol, and Azure Databricks supports it through its Unity Catalog service. The shared metadata is stored in Unity Catalog...

Understanding data-sharing challenges in a data mesh

A data mesh is an architecture that decentralizes data ownership while centralizing governance of the resulting data products or landing zones. While it provides agility and independence to the products and the product teams, it also raises new challenges. In previous chapters, we discussed several of them: deploying and managing multiple landing zones, managing data access across different data products, and discovering decentralized data, among others. We found solutions to make them easier and more manageable.

When we propose a data mesh architecture to companies, their very first question is: how will data be shared across this mesh? Will each data product pull data from across the mesh? Will that not create duplicate copies of data across the mesh? What about security across different access points (APs)?

To answer these questions and many more, let us...

Exploring different methods available for sharing data

One of the challenges most companies face is settling on a best practice that standardizes a single technique for sharing data. However, while discussing the challenges of data sharing in a data mesh, we realized that there are multiple ways of sharing data. Let us spend some time organizing all these data-sharing techniques and the different components and layers involved in their implementation. This will help in aligning the different methods to different data-sharing scenarios.

In-place access

As discussed in the What is in-place sharing? section, in-place sharing provides direct access to the data from the code that needs to process that data.

As an example, let’s assume a file called data.csv is stored in a data lake of a European finance data product in a company, as shown in Figure 13.5.

Figure 13.5 – data.csv file location

An executive dashboard team in...

Picking the right data-sharing topologies

To understand the scenarios under which each of these topologies should be used, let us look at the pros and cons of each data-sharing method.

In-place sharing

Here are the pros and cons of in-place sharing:

| Pros | Cons |
| --- | --- |
| Real-time data sharing | Prone to network latency and heavy payload-related performance issues |
| No data duplication | Direct access to source data needs higher security monitoring and complex policies |

Table 13.2 – Pros and cons of in-place sharing

The following are ideal scenarios for using in-place sharing:

  • Real-time access that needs the most recent version of the data
  • Small or medium-sized reference data (exploratory data analysis or EDA; real-time APIs)
  • The source and target are in the same or neighboring...

Summary

We started by examining the challenges of data sharing in a data mesh and by defining in-place sharing, which the data mesh architecture treats as the best way to share data because it reduces data movement. There are many ways of sharing data across the data mesh and beyond it. We saw four of the most popular topologies: in-place, data pipelines, data APIs, and data sharing. We looked at the pros and cons of each, along with their ideal applications. One important takeaway from this chapter is that there is no single preferred way to share data. You need to understand the pros and cons of each method and then form a best practice across the data mesh that helps data product teams pick the right method for their requirements.

This concludes the core topics of designing and implementing a data mesh. The next four chapters will cover some common data analytics workloads and the architecture required to implement these analytical solutions on Microsoft Azure. The first scenario...
