Building a Self-Service Portal for Common Data Mesh Operations

In Chapter 6, we briefly looked at a self-service portal for requesting landing zones, highlighting the need to automate the deployment and maintenance of landing zones. However, a central data mesh portal can be much more than a landing zone request form. In fact, in some situations, a landing zone request feature may not even be required. The portal can serve many other purposes: hosting pipeline templates, data catalog search, access requests, and monitoring dashboards. It can become the face of your data mesh and a natural place for teams to collaborate.

In this chapter, we will learn how to build a data mesh portal, the typical functionality it must implement, and the components and architecture required to build it.

In this chapter, we will cover the following topics:

  • Why do we need a self-service portal?
  • Gathering requirements for the self-service portal...

Why do we need a self-service portal?

Having a self-service portal to manage a data mesh is not mandatory. That’s right! You don’t have to have a self-service portal. There can be multiple reasons for not having one, such as the following:

  • Small and medium-sized enterprises that can easily define their landing zones upfront and won’t need additional landing zones or new project resource groups don’t need a self-service portal
  • Small companies that can simply create a landing zone based on an email sent by one of their departments don’t need a self-service portal
  • Even large or medium-sized companies that have a well-defined business and don’t need the agility of creating landing zones or pipelines on the fly don’t need a self-service portal

However, having one has clear benefits even when it isn't strictly necessary. So, if you have the resources to build one, you should.

A portal for your data...

Gathering requirements for the self-service portal

A data mesh portal should be treated like a product. It's the entry point into the mesh. The first part of building any product is gathering requirements, and it is the most critical part of product development.

Requirement gathering calls for a lot of brainstorming. Depending on the size of your company and the complexity of your data platform, this might mean a few day-long meetings with all the stakeholders, or weeks of meetings and interviews with stakeholders, data scientists, engineers, and business analysts.

Here are a few criteria that will drive the development of the portal:

  • The size of your company
  • The complexity of the data estate
  • The nature of the data and analytics projects
  • The size of the data mesh operations team
  • The available full-stack resources to build the portal

Now, let us look at some of the most common requirements that are a bit obvious and also...

Requesting landing zones or data products

As discussed in the previous section, this functionality may or may not be included in the portal, depending on the data mesh design you have adopted and the needs of the company. If new projects are rare and experimentation is not a requirement, most of the landing zone creation work can be done upfront using pipelines and ARM templates. But if your industry is competitive, with a constant need to innovate on analytics to stay ahead of the curve, you will need to create landing zones on the fly. In that case, requesting landing zones becomes an important feature of the portal.

To build a frictionless way to request landing zones, it's critical that your backend infrastructure and the automation that creates landing zones are in order. Assuming that a new data product request means a new landing zone, the following needs to be ready before you can start building this...
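To make the idea concrete, here is a minimal sketch of what a portal backend might do once a landing zone request is approved: create a resource group and deploy a standard landing-zone ARM template into it. It assumes the azure-identity and azure-mgmt-resource packages; the subscription ID, template file, and naming convention are hypothetical placeholders, not the book's prescribed implementation.

```python
# Minimal sketch: deploy a landing-zone ARM template when a portal request is approved.
# The names, template file, and parameters are placeholders for your own automation.
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "<landing-zone-subscription-id>"  # hypothetical


def deploy_landing_zone(request: dict) -> None:
    """Create a resource group and deploy the landing-zone template into it."""
    client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    rg_name = f"rg-dataproduct-{request['product_name']}"
    client.resource_groups.create_or_update(rg_name, {"location": request["region"]})

    # The ARM template captures the standard landing-zone shape (storage, key vault, and so on).
    with open("landing_zone_template.json") as f:  # hypothetical template file
        template = json.load(f)

    poller = client.deployments.begin_create_or_update(
        rg_name,
        f"lz-{request['product_name']}",
        {
            "properties": {
                "mode": "Incremental",
                "template": template,
                "parameters": {"productName": {"value": request["product_name"]}},
            }
        },
    )
    poller.result()  # block until the deployment completes (or poll asynchronously)


# Example request payload the portal backend might receive:
# deploy_landing_zone({"product_name": "customer360", "region": "westeurope"})
```

In practice, the portal would queue such requests and run the deployment from a pipeline or function after an approval step, rather than synchronously from the web tier.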

Data catalog

Ideally, if you have chosen Microsoft Purview as your data catalog tool, its search portal should be your data catalog; in that case, simply link to the Microsoft Purview Catalog Search landing page from the portal.

However, sometimes companies want to abstract the data catalog to a higher level. This is primarily done for two reasons:

  • Microsoft Purview Catalog Search is very advanced and some users might want a simpler search tool
  • Microsoft Purview may not be the only catalog tool being used

To build a catalog of your own that consolidates other catalogs, you need to do the following (a short sketch of the schema and indexing steps follows the list):

  • Extract metadata from all your other third-party catalogs
  • Design a metadata schema that encapsulates information from other catalogs
  • Select the right storage for your custom metadata
  • Select the right search technology to index and search the metadata
  • Design a UI that meets the requirements of the data mesh...
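As a hedged illustration of the schema and search-technology steps, the sketch below defines a simplified, catalog-agnostic metadata record and pushes records into an Azure Cognitive Search index that the portal UI could query. It assumes the azure-search-documents package and an existing index; the schema fields, endpoint, index name, and key are hypothetical, and extracting metadata from the source catalogs is out of scope here.

```python
# Minimal sketch: push consolidated catalog entries into an Azure Cognitive Search index.
# Schema fields, endpoint, index name, and key are hypothetical placeholders.
from dataclasses import asdict, dataclass

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient


@dataclass
class CatalogEntry:
    """A simplified, catalog-agnostic metadata schema for the portal search."""
    id: str                # unique key across all source catalogs
    name: str              # dataset or data product name
    description: str
    owner_domain: str      # owning business domain in the mesh
    source_catalog: str    # e.g. "purview" or a third-party tool
    location: str          # storage path or connection hint


search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",  # hypothetical
    index_name="datamesh-catalog",                            # hypothetical
    credential=AzureKeyCredential("<admin-or-query-key>"),
)


def index_entries(entries: list[CatalogEntry]) -> None:
    # Upload the consolidated entries so the portal UI can query a single index.
    search_client.upload_documents(documents=[asdict(e) for e in entries])


def search_entries(text: str) -> list:
    # The portal's simplified search box can call this instead of Purview's full UI.
    return list(search_client.search(search_text=text))
```

The storage and search technology you choose may differ; the point is that the consolidated schema, not any single catalog's native model, is what the portal exposes.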

Hosting common data pipeline templates

After exploring the data mesh and finding the right data for their project, the next step for data product teams is to access that data directly or move it to their data product landing zone. Small or medium-sized datasets kept in databases or data lakes can sometimes be read directly into a data frame in a Python notebook using a connection string. But for large datasets, and for data coming from on-premises legacy systems or enterprise resource planning (ERP) and customer relationship management (CRM) systems hosted outside the data mesh, you need pipelines.
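The direct-access case can be as small as the following sketch: reading a modest table from an Azure SQL database into a pandas DataFrame inside a notebook. It assumes pandas and pyodbc are installed; the server, database, and table names are hypothetical.

```python
# Minimal sketch of the "direct access" case: read a small table straight into a DataFrame.
# Server, database, and table names are hypothetical; adjust authentication to your setup.
import pandas as pd
import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<server-name>.database.windows.net,1433;"
    "Database=<database-name>;"
    "Authentication=ActiveDirectoryInteractive;"  # or SQL authentication, per your setup
    "Encrypt=yes;"
)

with pyodbc.connect(conn_str) as conn:
    df = pd.read_sql("SELECT TOP 1000 * FROM dbo.Customers", conn)  # hypothetical table

print(df.head())
```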

In Azure, these pipelines are typically built using Azure Data Factory. The sources for these pipelines are common across data products, and the destination storage is also quite standard: typically either a data lake or a SQL database. If each data product team starts building pipelines to...
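One way the portal could publish a shared pipeline template is sketched below: the portal backend pushes a simple copy pipeline into a team's Data Factory so each data product team does not rebuild it from scratch. This assumes the azure-identity and azure-mgmt-datafactory packages and follows the pattern of the Data Factory Python quickstart; the factory, dataset, and resource names are hypothetical, and the referenced datasets are assumed to exist already in the target factory.

```python
# Minimal sketch: publish a shared "copy to landing zone" pipeline into a team's factory.
# Factory, dataset, and resource names are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

SUBSCRIPTION_ID = "<team-subscription-id>"  # hypothetical

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# One copy step from a source dataset to the landing-zone dataset; both datasets are
# assumed to already exist in the target factory (for example, created by the landing-zone template).
copy_step = CopyActivity(
    name="CopySourceToLandingZone",
    inputs=[DatasetReference(reference_name="SourceDataset")],
    outputs=[DatasetReference(reference_name="LandingZoneDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

adf_client.pipelines.create_or_update(
    "rg-dataproduct-customer360",      # hypothetical resource group
    "adf-customer360",                 # hypothetical factory name
    "template-copy-to-landing-zone",   # pipeline name as published by the portal
    PipelineResource(activities=[copy_step]),
)
```

Equally valid is storing the pipeline definitions as ARM templates in a shared repository and letting teams deploy them through the same mechanism used for landing zones.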

Other common features of a self-service portal

We discussed some of the most common functionalities that companies build on their data mesh portal. Depending on the requirements of the users and how the company is structured, additional features can be added to the portal. Here are some examples of these features:

  • Data mesh explorer: A UI that allows users to explore the data mesh. It allows them to see how the mesh is structured, which landing zones are connected to each other and which are not, and what projects run under each landing zone.
  • Data mesh health dashboard: A high-level, color-coded dashboard showing the health of all the data landing zones. Users can check the health of their landing zones here the moment something isn't working, before raising a support request with the operations team (a small sketch follows this list).

    There may be other features you want to add depending on feedback from users.

  • Data contracts management: As we will see...
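As a hedged illustration of the health dashboard idea above, the sketch below uses Azure Resource Graph to pull the provisioning state of resources across landing-zone subscriptions, which a portal could render as a simple red/green view. Provisioning state is only a coarse health signal; it assumes the azure-mgmt-resourcegraph package, and the subscription IDs and query are illustrative.

```python
# Minimal sketch: summarize resource provisioning state per resource group across
# landing-zone subscriptions, as raw input for a red/green health dashboard.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resourcegraph.models import QueryRequest

LANDING_ZONE_SUBSCRIPTIONS = ["<lz-subscription-1>", "<lz-subscription-2>"]  # hypothetical

client = ResourceGraphClient(DefaultAzureCredential())

query = QueryRequest(
    subscriptions=LANDING_ZONE_SUBSCRIPTIONS,
    query=(
        "Resources "
        "| extend state = tostring(properties.provisioningState) "
        "| summarize total = count(), failed = countif(state != 'Succeeded') "
        "  by resourceGroup"
    ),
)

result = client.resources(query)
print(result.total_records, "resource groups summarized")
for row in result.data:  # with the objectArray result format, each row is a dict
    print(row)
```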

Architecting the self-service portal

Architecting the data mesh portal is like architecting any web portal. You need to treat your employees as its customers and build a responsive, user-friendly experience that becomes an important tool for making their work easier.

From a high-level architecture perspective, a simple N-tier architecture will be sufficient to take care of the scale required for the data mesh portal.

Let’s walk through some of the core components of the N-tier architecture and the options for building them on the Azure cloud.

Active Directory and Domain Name System (DNS)

As a standard practice, all access to the portal should be integrated with Active Directory, with DNS resolving the URL to a specific application gateway IP address. You also need to decide on the URL for the service. Depending on how the company intranet is designed, you can host the data mesh portal on a subdomain (https://learn.microsoft.com/en-us/azure/dns/delegate-subdomain), for...

Summary

This concludes our chapter on building a data mesh portal. We looked at the requirements of a portal and how to gather them, the various important and optional features of a data mesh portal, and finally, how to architect the portal.

This is also the end of Part 1 of the book. In this part, we discussed all aspects of designing and deploying a data mesh. In Part 2, we will discuss some practical challenges faced while deploying a data mesh and how to overcome or avoid them.
