You're reading from Engineering Data Mesh in Azure Cloud

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781805120780
Edition: 1st Edition
Author: Aniruddha Deswandikar

Aniruddha Deswandikar holds a Bachelor's degree in Computer Engineering and is a seasoned Solutions Architect with over 30 years of industry experience as a developer, architect and technology strategist. His experience spans from start-ups to dotcoms to large enterprises. He has spent 18 years at Microsoft helping Microsoft customers build their next generation Applications and Data Analytics platforms. His experience across Application, Data and AI has helped him provide holistic guidance to companies large and small. Currently he is helping global enterprises set up their Enterprise-scale Analytical system using the Data Mesh Architecture. He is a Subject Matter Expert on Data Mesh in Microsoft and is currently helping multiple Microsoft Global Customers implement the Data Mesh architecture.

Deploying a Data Mesh Using the Azure Cloud-Scale Analytics Framework

In the previous chapter, we discussed how to build a data mesh strategy. Once a strategy is ready, it’s time to implement it. Microsoft has captured the standardized methodology and best practices that almost all of its customers use to create and manage their cloud infrastructure in the Cloud Adoption Framework (CAF). Based on the CAF, the Microsoft Azure team has created the Cloud-Scale Analytics (CSA) framework, which provides the guidance and the templates required to roll out a data mesh in the Azure cloud.

In this chapter, we will dig deeper into the CAF and the CSA framework to understand how they map to a data mesh architecture. These frameworks are guidelines, not rules set in stone, and you can choose to build a data mesh your own way, but they can make the journey easier and also...

Introduction to Azure CSA

As enterprises large and small started moving their on-premises infrastructure to the cloud, Microsoft realized that the way cloud infrastructure is set up needed some standardization. To streamline the process of migrating workloads to the cloud, or even building new greenfield infrastructure there, Microsoft created the CAF. The CAF provides the best practices, guidelines, documentation, and tools to help cloud technology experts quickly adopt the cloud. It helps you set up your basic foundational infrastructure in the cloud so that you can land your actual workload resources smoothly and ensure that they follow all the required security, network, and other best practices.

An in-depth coverage of the CAF is beyond the scope of this book. You can find more details about the CAF in the Microsoft documentation at https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/.

Yet another framework created by Microsoft...

Understanding landing zones

The best way to describe the concept of a landing zone is with a real-life example. Let’s assume that you have a product idea and want to build a business around it. Your product needs a manufacturing plant, a warehouse, and an office. You decide on a city to start your business and settle on a piece of land for your plant, warehouse, and office. Now, all you need to do is complete the purchase formalities and start building your business. You don’t have to worry about electricity and water supply because the city already provides them. The city police and the legal system ensure that your business is safe. So, you can just run your business without having to worry about any of these basic requirements.

Similarly, an Azure landing zone provides the basic infrastructure for your analytical product. It ensures that all the networking, security, policies, and other basic infrastructure are set up based on best practice guidance provided by Microsoft so...

Organizing resources

Microsoft Azure provides management levels to manage your cloud resources. These logical structures allow you to organize your resources in a hierarchy that matches your business organization and further provides an inheritance of the policies that you apply to them. Figure 3.2 shows the hierarchy of management groups, subscriptions, resource groups, and, finally, the resources:

Figure 3.2 – Azure management levels

Let’s understand the purpose of each of these entities in the hierarchy:

  • Management groups: These are the highest level of management under the Azure tenant. They help you manage the access, policy, and compliance for all the subscriptions grouped under one management group.
  • Subscriptions: A subscription groups a set of resources and associates a set of users with them. Subscriptions also have quotas and limits to prevent overuse and manage costs.
  • Resource groups: Resource groups are logical...
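The inheritance described above can be illustrated with a small model. This is only a sketch in plain Python, not an Azure API: the `Scope` class, the scope names, and the policy names are all hypothetical, and real Azure Policy evaluation has more rules (exemptions, effects, and so on) than shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scope:
    """One node in the hierarchy: management group, subscription,
    resource group, or resource (hypothetical model, not an Azure type)."""
    name: str
    kind: str
    policies: List[str] = field(default_factory=list)
    parent: Optional["Scope"] = None

    def child(self, name: str, kind: str,
              policies: Optional[List[str]] = None) -> "Scope":
        """Create a scope nested directly under this one."""
        return Scope(name, kind, list(policies or []), parent=self)

    def effective_policies(self) -> List[str]:
        """Policies assigned here plus everything inherited from
        ancestors, listed from the outermost scope inward."""
        inherited = self.parent.effective_policies() if self.parent else []
        return inherited + [p for p in self.policies if p not in inherited]


# Hypothetical names and policies, for illustration only.
root = Scope("contoso-analytics", "management group", ["allowed-regions"])
sub = root.child("sales-data-lz", "subscription", ["require-cost-center-tag"])
rg = sub.child("rg-sales-forecast", "resource group")
lake = rg.child("salesforecastlake", "resource", ["deny-public-network-access"])

print(lake.effective_policies())
```

A policy assigned once at the management group reaches every resource below it, which is why the choice of hierarchy matters before any workload is deployed.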

Designing a cloud management structure

As mentioned in the previous section, the Azure management structure should align with how your business divisions are organized internally and how you wish to pass policies down to each of them. For example, almost all companies have a central information technology (IT) team that manages the global infrastructure of the company. This includes networks, firewalls, and identity. Most companies also have customer relationship management and enterprise resource planning systems, such as SAP, that need teams and management of their own. Other corporate systems are managed by different business teams.

Each of these divisions or teams needs different policies that typically do not overlap. For example, you won’t want an employee from the sales division to manage the networks. Employees from the IT team have no reason to query the sales data. The life cycles of these infrastructure pieces and the corporate...

Diving deeper into landing zones in CSA

CSA translates the data mesh concept to Microsoft Azure constructs. You must follow the design principles of these landing zones to ensure that there is no mismatch between what you wish to achieve through the data mesh and the actual implementation.

CSA proposes two types of landing zones:

  • Data management landing zone
  • Data landing zone

The data management landing zone is central to CSA and the data mesh architecture. It is responsible for the governance of the analytical platform. A data landing zone hosts one or more data products and contains all the Azure services required to implement them. Every data landing zone is connected to the data management landing zone. A data landing zone can also optionally be connected to other data landing zones if it shares data with them or receives data from them.
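The connection rules in the preceding paragraph can be sketched as a small function. This is an illustrative model only: the zone names, the `MANAGEMENT_ZONE` constant, and the `build_connections` helper are hypothetical and do not correspond to any CSA template or Azure API.

```python
from typing import Iterable, Set, Tuple

# Hypothetical name for the single data management landing zone.
MANAGEMENT_ZONE = "data-management-lz"

Link = Tuple[str, str]


def build_connections(data_zones: Iterable[str],
                      sharing: Iterable[Tuple[str, str]]) -> Set[Link]:
    """Return the undirected links implied by the rules:
    every data landing zone connects to the management zone, and
    two data landing zones connect only if they exchange data."""
    zones = set(data_zones)
    links: Set[Link] = set()

    # Mandatory link: each data landing zone to the management zone.
    for zone in zones:
        links.add(tuple(sorted((MANAGEMENT_ZONE, zone))))

    # Optional links: only between zones that share or receive data.
    for a, b in sharing:
        if a not in zones or b not in zones:
            raise ValueError(f"unknown data landing zone in pair ({a}, {b})")
        if a == b:
            raise ValueError(f"a zone cannot be linked to itself: {a}")
        links.add(tuple(sorted((a, b))))

    return links


links = build_connections(
    ["sales", "marketing", "finance"],
    sharing=[("sales", "marketing")],
)
for link in sorted(links):
    print(link)
```

With three data landing zones and one sharing relationship, the model yields four links: three to the management zone and one between the two zones that exchange data.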

Depending on the complexity of your analytical requirements and how analytical...

Automating landing zone deployment

Decentralizing your data and analytics will mean multiple data landing zones. In large enterprises and global companies, this can mean many landing zones. Multiple groups and organizations will request a separate landing zone for their analytics projects. So, while data and analytics are being democratized, management of these disparate landing zones is becoming complex and challenging. Hence, it’s critical to the success of a data mesh to automate the creation and management of these landing zones as much as possible.

This need to automate the deployment and management of infrastructure is common to any large cloud setup. Hence, Azure has many tools to help customers automate their cloud deployments. The collective term for this set of technologies is Infrastructure as Code (IaC).
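To make the idea concrete, here is a minimal sketch that generates a declarative template from code instead of clicking through a portal. It assumes the subscription-level ARM template format; the schema URL and `apiVersion` shown are written from memory and should be checked against the current Microsoft documentation. The function name, resource group names, and tags are hypothetical.

```python
import json
from typing import Dict, List


def resource_group_template(names: List[str], location: str,
                            tags: Dict[str, str]) -> dict:
    """Build a subscription-scope ARM template (as a dict) that
    declares one resource group per name. Assumed schema/apiVersion."""
    return {
        "$schema": ("https://schema.management.azure.com/schemas/"
                    "2018-05-01/subscriptionDeploymentTemplate.json#"),
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Resources/resourceGroups",
                "apiVersion": "2021-04-01",  # assumption: verify before use
                "name": name,
                "location": location,
                "tags": dict(tags),
            }
            for name in names
        ],
    }


# Hypothetical landing zone layout, for illustration only.
template = resource_group_template(
    ["rg-sales-shared", "rg-sales-forecast"],
    location="westeurope",
    tags={"domain": "sales"},
)
print(json.dumps(template, indent=2))
```

Because the template is generated from data, adding another landing zone means adding an entry to a list and redeploying, and the resulting JSON file can be versioned and reviewed like any other code.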

IaC

The easiest and fastest way to create and manage infrastructure on the Azure cloud is through the Azure portal. It’s a quick, user...

Organizing resources in a landing zone

A landing zone is typically a subscription that needs to be governed by a common policy. However, creating one landing zone for every project in the company could lead to too many landing zones to manage. Hence, companies typically choose to make landing zones based on a business domain such as sales, marketing, finance, or HR or based on business zones such as North America, South America, Europe, and Asia. Then, within each landing zone, they allow multiple teams to create and manage their analytical projects.

Hence, inside a landing zone, the company needs to manage resources so that each team has access only to its own resources. Some shared resources might need separate management and common access for all teams.

This project-level and shared-resource management inside a subscription can be achieved through resource groups, as shown in Figure 3.10:

Figure 3.10 – Data landing zone resource groups
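The access pattern behind this layout can be sketched in a few lines. This is a simplified in-memory model, not Azure role-based access control: the resource group names, team names, and the `"*"` wildcard convention are hypothetical.

```python
from typing import Dict, Set

# Hypothetical mapping of resource groups to the teams allowed in them.
# "*" marks a shared resource group that every team may use.
ACCESS: Dict[str, Set[str]] = {
    "rg-shared-services": {"*"},
    "rg-forecasting": {"team-forecasting"},
    "rg-churn-analysis": {"team-churn"},
}


def can_access(team: str, resource_group: str) -> bool:
    """True if the team may use the resource group.
    Unknown resource groups are denied by default."""
    allowed = ACCESS.get(resource_group, set())
    return "*" in allowed or team in allowed


print(can_access("team-churn", "rg-churn-analysis"))   # own project
print(can_access("team-churn", "rg-forecasting"))      # another team's project
print(can_access("team-churn", "rg-shared-services"))  # shared resources
```

Denying unknown resource groups by default mirrors the intent of the text: a team sees its own project and the shared resources, and nothing else.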


Networking topologies

In a decentralized data environment, data can only be shared by linking the decentralized zones together. These connections are made using Azure Virtual Network peering (https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview). The advantage of peering is that the latency of exchanging data between peered networks is the same as the latency of exchanging data within a single virtual network. So, users in two different landing zones whose virtual networks are peered see no difference in the speed of data transfer, so long as the two networks are in the same region. Traffic between the peered networks travels over the high-bandwidth Azure backbone network.
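Since the latency guarantee above holds only when both networks are in the same region, it can be useful to check a planned topology for peerings that cross regions. The following is a minimal sketch over plain data; the zone names, region assignments, and the `cross_region_peerings` helper are hypothetical, and nothing here calls Azure.

```python
from typing import Dict, List, Tuple


def cross_region_peerings(zone_regions: Dict[str, str],
                          peerings: List[Tuple[str, str]]
                          ) -> List[Tuple[str, str]]:
    """Return the peerings whose two landing zones sit in different
    regions, in the order they were given."""
    flagged = []
    for a, b in peerings:
        # A KeyError here means a peering refers to an undeclared zone.
        if zone_regions[a] != zone_regions[b]:
            flagged.append((a, b))
    return flagged


# Hypothetical plan, for illustration only.
zone_regions = {
    "sales": "westeurope",
    "marketing": "westeurope",
    "finance": "eastus",
}
peerings = [("sales", "marketing"), ("sales", "finance")]

print(cross_region_peerings(zone_regions, peerings))
```

Flagged pairs are not errors, but they are the links where the same-region latency assumption in the text no longer applies.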

The traffic between these peered networks is managed using forwarding rules that are configured when peering is enabled. These rules define how traffic is allowed to move between the two peered networks. For details...

Security and access control

Once the various landing zones are all connected in a mesh, the next most critical element of a data mesh architecture is managing access and security.

There are two main layers to security and access control:

  • Authentication
  • Authorization

Authentication verifies the identity of a user (for example, with a username and password) to provide or deny access to a given Azure service or resource. Authorization determines what the authenticated user can do with the resource, that is, which operations they are allowed to perform.
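The difference between the two layers is easiest to see when they are separate functions. The sketch below is a toy in-memory model using only the Python standard library; in a real data mesh both steps are delegated to an identity provider, and the user, resource, and status strings here are hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, 100_000)


# Hypothetical user store and permission table, for illustration only.
_salt = os.urandom(16)
USERS = {"asha": (_salt, hash_password("correct horse", _salt))}
PERMISSIONS = {"asha": {"sales-lake": {"read"}}}


def authenticate(username: str, password: str) -> bool:
    """Layer 1: is this user who they claim to be?"""
    record = USERS.get(username)
    if record is None:
        return False
    salt, expected = record
    # Constant-time comparison avoids leaking timing information.
    return hmac.compare_digest(hash_password(password, salt), expected)


def authorize(username: str, resource: str, operation: str) -> bool:
    """Layer 2: may this (already authenticated) user do this?"""
    return operation in PERMISSIONS.get(username, {}).get(resource, set())


def handle_request(username: str, password: str,
                   resource: str, operation: str) -> str:
    """Run both layers in order and report which one decided."""
    if not authenticate(username, password):
        return "unauthenticated"
    if not authorize(username, resource, operation):
        return "forbidden"
    return "ok"


print(handle_request("asha", "wrong", "sales-lake", "read"))
print(handle_request("asha", "correct horse", "sales-lake", "write"))
print(handle_request("asha", "correct horse", "sales-lake", "read"))
```

A wrong password fails at the first layer regardless of permissions, while a valid user asking for an operation they were never granted fails at the second.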

To implement standardized authentication and authorization across your data mesh, you need an identity provider. While multiple solutions or tools might be available to implement identity management, the best practice is to employ a single identity provider. The most popular identity provider on Azure is Azure Active Directory. In 2023, Microsoft renamed Azure Active Directory to Microsoft Entra ID (https://learn...

Streamlining deployment through DevOps

Creating individual landing zones, networking them together, applying policies and security, and repeating this process for every landing zone can be a tedious and time-consuming process. Once the data mesh has been deployed, maintaining it by applying changes and fixes and tracking those changes can add another level of complexity.

DevOps is a set of tools and processes that help automate IT operations. If implemented early, they can help streamline the deployment and maintenance of a data mesh.

Let’s define some key components of DevOps and understand how they are used to automate the data mesh deployment and maintenance:

  • Repo: A repo (or repository) is a store, managed by a version control system, that holds and tracks the artifacts of a process. These include ARM templates, JSON files, and code scripts. Azure Repos supports two types of version control: Git and Team Foundation Version Control (TFVC). Git is the default...

Summary

In this chapter, we learned about the various components of the Azure cloud that map to data mesh concepts, broadly called landing zones. We learned about their functionality and how to organize them. Finally, we looked at automating the deployment of these landing zones so that a data mesh becomes easier to manage through automation and reuse.

In the next chapter, we will learn how to govern and manage a data mesh using Microsoft services in combination with building a governance framework.
