
You're reading from  Engineering Data Mesh in Azure Cloud

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781805120780
Edition: 1st Edition
Author: Aniruddha Deswandikar

Aniruddha Deswandikar holds a Bachelor's degree in Computer Engineering and is a seasoned Solutions Architect with over 30 years of industry experience as a developer, architect and technology strategist. His experience spans from start-ups to dotcoms to large enterprises. He has spent 18 years at Microsoft helping Microsoft customers build their next generation Applications and Data Analytics platforms. His experience across Application, Data and AI has helped him provide holistic guidance to companies large and small. Currently he is helping global enterprises set up their Enterprise-scale Analytical system using the Data Mesh Architecture. He is a Subject Matter Expert on Data Mesh in Microsoft and is currently helping multiple Microsoft Global Customers implement the Data Mesh architecture.

Monitoring and Data Observability

Monitoring and data observability are two sides of the same coin and are both critical for a well-managed data mesh.

Monitoring is about tracking the health of the landing zones. As the data mesh grows and more data products join it, it becomes critical to ensure that all the data products and their landing zones are up and running. This requires proactive and reactive monitoring of the entire data mesh from a central location.

Data observability follows from the data mesh being an inherently collaborative framework. Data moves between landing zones, and it is important to observe this movement and any changes made to the data along the way.

In this chapter, we will see how to think about, design, and build a monitoring system from the ground up that will make the management of the data mesh easier.

We will cover the following topics:

  • The importance of data mesh...

Piecing it all together – the importance of data mesh monitoring and data observability

In my experience of working on data mesh projects with multiple customers, I have observed two high-level patterns in how centralized data analytics gets funded and monetized:

  • The central cost center model: Some companies treat centralized analytics as a requirement for the company and fund it as a central initiative. In such scenarios, an annual budget is allocated to analytics, and a committee of key stakeholders decides which analytical projects will be supported and which will be put on the back burner until the next year. Central analytics is a cost center, and the return on investment (ROI) is evaluated by the overall impact of analytics on the top and bottom lines of the business. The data mesh implementation is also part of this central budget.
  • The distributed cost center model: Many large global companies treat each zone or major product division...

How data mesh monitoring differs

In a centralized analytical system, you have a few fixed subscriptions that manage resources across the end-to-end analytical cycle. The pipelines, storage system, data warehouse, and all the analytical frameworks are common to the entire system. The monitoring system, too, focuses only on this set of resources. Any change to the analytical system’s architecture is introduced systematically, with corresponding changes made to the monitoring system. These changes are few and far between.

A data mesh environment, depending on your data mesh strategy, can be very dynamic. New landing zones and data products can get added to the mesh more frequently than to the central system. Each data product will have a different set of tools and services for ingestion, analytics, and serving. Monitoring all the landing zones means ingesting diagnostics and logs from all the landing zones into the central data management zone and then analyzing all of them. The monitoring data...

Baking diagnostic logging into the landing zone templates

Azure services share a consistent method of collecting diagnostic logs. If you open the landing page of any Azure service in the Azure portal, you will see a pane on the left side of the page, called the Resource Menu. Under the Monitoring section, you will see a menu option called Diagnostic Settings, as shown in Figure 11.3. The working pane of this menu option allows users to set up diagnostics for the service.

There are different categories of diagnostics and logs in Azure. Operations on Azure are divided between the control plane (also called the management plane) and the data plane. The role of the control plane is to create and manage the resource (create, update, delete). The data plane is used to interact with the resource (read, write, execute). For example, if I am creating a storage account, the creation process is done via the control plane using the Azure Resource Manager to create the storage...
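To make the idea of baking diagnostics into a landing zone template more concrete, here is a minimal Python sketch that generates a diagnostic setting definition for one resource. It is an illustration under assumptions, not the template used in this book: the function name, the setting name, and the sample resource IDs are hypothetical, and the property names follow the Microsoft.Insights/diagnosticSettings resource type as we understand it, so verify them against the current ARM schema before use.

```python
import json


def diagnostic_setting(resource_id: str, workspace_id: str,
                       name: str = "send-to-central-workspace") -> dict:
    """Build an ARM-style diagnostic setting that targets one resource.

    Assumption: logs and metrics are routed to a central Log Analytics
    workspace in the data management zone. The name is hypothetical.
    """
    return {
        "type": "Microsoft.Insights/diagnosticSettings",
        "apiVersion": "2021-05-01-preview",
        "scope": resource_id,  # the resource being monitored
        "name": name,
        "properties": {
            "workspaceId": workspace_id,  # central workspace resource ID
            # Assumes the target resource supports category groups.
            "logs": [{"categoryGroup": "allLogs", "enabled": True}],
            "metrics": [{"category": "AllMetrics", "enabled": True}],
        },
    }


if __name__ == "__main__":
    # Hypothetical resource IDs, for illustration only.
    storage_id = ("/subscriptions/000/resourceGroups/rg-sales"
                  "/providers/Microsoft.Storage/storageAccounts/salesdata")
    workspace = ("/subscriptions/111/resourceGroups/rg-mgmt"
                 "/providers/Microsoft.OperationalInsights/workspaces/central")
    print(json.dumps(diagnostic_setting(storage_id, workspace), indent=2))
```

Generating the same definition for every resource in a landing zone template means each new data product ships its diagnostics to the central workspace from day one, rather than relying on someone to configure the setting in the portal afterward.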

Designing a data mesh operations center

People from the world of networking infrastructure will be familiar with the Network Operations Center (NOC). It is usually a physical room with screens of dashboards like the one depicted in Figure 11.5:

Figure 11.5 – A network operations center

A NOC is a room where the health of the entire network and infrastructure of an organization is visible. These centers typically run 24/7 for organizations that need 24/7 uptime. As networking and monitoring technologies have improved, these centers have shrunk considerably.

A similar operations center is required for the data mesh. Let’s call it a data mesh operations center (DMOC), though this is not an official industry term.

The data mesh can become a complex network very quickly. The complexity of the data mesh will depend on the strategy and design decisions made at the beginning, before implementing the data mesh. Some large companies have a strategy to...

Tooling for the DMOC

There are multiple ways of building monitoring dashboards in Azure. In this section, we will discuss the most popular combinations. Let us begin by summarizing the list of available tools.

Azure Monitor

Azure Monitor is a core Azure service that collects and helps analyze metrics and logs from cloud and on-premises services and resources. It’s the core building block for any infrastructure monitoring need. It also lets you set alerts that fire when metrics exceed certain levels. Alerts can send messages and emails. They can also trigger runbooks that automate the response, such as autoscaling or maintenance jobs.

Log Analytics

As we saw in the Baking diagnostic logging into the landing zone templates section, Log Analytics is part of the Azure Monitor service. It is a tool that allows you to run queries against the logs collected by Azure Monitor. It uses Kusto Query Language (KQL) to query the logs...
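As an illustration of what such a query can look like, the following Python sketch builds a small KQL query that counts recent log records per resource and category. It is a sketch under assumptions: the function name is hypothetical, and it assumes the logs land in the AzureDiagnostics table, whereas many services write to resource-specific tables instead.

```python
def log_volume_query(hours: int = 1) -> str:
    """Return a KQL query counting log records per resource and category.

    Assumption: diagnostics are collected in the AzureDiagnostics table.
    """
    if hours < 1:
        raise ValueError("hours must be at least 1")
    return (
        "AzureDiagnostics\n"
        f"| where TimeGenerated > ago({hours}h)\n"
        "| summarize LogCount = count() by ResourceId, Category\n"
        "| order by LogCount desc"
    )


if __name__ == "__main__":
    # Print the query for the last 24 hours.
    print(log_volume_query(hours=24))
```

The resulting text can be pasted into the Log Analytics query editor, or reused wherever a dashboard tile needs the same query with a different time window.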

Data observability

The health of your network defines the health of your infrastructure. A large virtual machine with great specifications to crunch numbers at blazing speed is rendered useless if the network is not working. Similarly, data pipelines are the backbone of a healthy data mesh. Without timely and accurate data, the best analytical system is not useful to the business.

Data quality, data contracts, and master data management are some of the key tenets of the quality and accuracy of data. Along with quality and accuracy, we also need to monitor the movement of data through data lineage in Microsoft Purview as well as data pipeline health through data mesh monitoring.

Throughout multiple chapters in this book, we have looked at all of these aspects of data. As part of data mesh monitoring, you need to bring all of these different data observability metrics together onto a single pane of glass.

Setting up alerts

Being proactive about monitoring the data mesh is very important. Your stakeholders finding out about issues with data or services before you do is never a good scenario.

To ensure that the IT ops team stays ahead of infrastructure failures, performance bottlenecks, and data quality issues, alerts must trigger whenever a monitored metric crosses a threshold (amber or red). An alert has four parts:

  • Resources: The resources on which the alerts need to be set.
  • Alert rules: The rules that define when an alert should trigger.
  • Alert: The alert itself.
  • Actions: The actions that the alert should take once it triggers. This could be sending an email or SMS, or triggering a runbook.
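The relationship between these four parts can be sketched in a few lines of Python. This is a conceptual model only, not the Azure Monitor API: the class names, the metric name, and the threshold values are hypothetical, and the actions are plain callables standing in for an email, an SMS, or a runbook.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AlertRule:
    """Rule: when should an alert trigger for a given metric?"""
    metric: str
    amber: float  # warning threshold
    red: float    # critical threshold

    def severity(self, value: float) -> Optional[str]:
        if value >= self.red:
            return "red"
        if value >= self.amber:
            return "amber"
        return None  # below both thresholds, no alert


@dataclass
class Alert:
    """The alert itself, raised for one resource."""
    resource: str
    metric: str
    value: float
    severity: str


def evaluate(resource: str, rule: AlertRule, value: float,
             actions: List[Callable[[Alert], None]]) -> Optional[Alert]:
    """Check one reading against the rule and run every action on a hit."""
    level = rule.severity(value)
    if level is None:
        return None
    alert = Alert(resource, rule.metric, value, level)
    for action in actions:
        action(alert)
    return alert


if __name__ == "__main__":
    rule = AlertRule(metric="used_capacity_pct", amber=80.0, red=95.0)
    # print stands in for a notification or runbook trigger.
    evaluate("lz-sales/storage", rule, 85.0, [print])
```

Keeping the rule separate from the actions mirrors the list above: the same rule can be attached to many resources, and the same action can serve many rules.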

For more details on how to set alerts in Azure, refer to the following link: https://learn.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-overview

Microsoft Purview also has an alerts feature for the data compliance...

Piecing it all together

When a pipeline goes down or a resource is reaching its limit and needs to be configured to scale further, you need to act quickly. To do so, you need to know where to look for the diagnostics and investigate the situation. Hence, it’s very important that the monitoring system is centralized into a single pane of glass that everyone can look at. This single pane of glass should be the data mesh portal. All of the dashboards and the alert management system should be brought into the portal, with granular access rights defined for each level. The higher-level dashboard should only be visible to the data mesh ops team. Individual landing zone dashboards should also be accessible to the concerned landing zone team but not to any other landing zone team. These access levels are also easier to set when everything is in one place.
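The access rule described above is simple enough to express as a single check. The following Python sketch is an illustration only: the team names and the function are hypothetical, and in practice the portal would rely on its identity provider's role assignments rather than string comparison.

```python
from typing import Optional

# Hypothetical team identifier for the data mesh ops team.
OPS_TEAM = "data-mesh-ops"


def can_view(team: str, dashboard_zone: Optional[str]) -> bool:
    """Decide whether a team may open a dashboard.

    dashboard_zone is None for the higher-level, mesh-wide dashboard;
    otherwise it names the landing zone the dashboard belongs to.
    """
    if team == OPS_TEAM:
        return True  # ops team sees every dashboard
    # A landing zone team sees only its own zone's dashboard.
    return dashboard_zone is not None and team == dashboard_zone


if __name__ == "__main__":
    print(can_view("lz-sales", "lz-sales"))
    print(can_view("lz-sales", "lz-finance"))
```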

Summary

This concludes the chapter on monitoring and data observability. We looked at all the complexities of monitoring a data mesh and how to integrate monitoring as part of the data mesh by adding it to the landing zone templates. We then looked at the steps for building a DMOC and briefly talked about bringing all the data monitoring aspects covered so far in this book together to build data observability.

In the next chapter, we will look at monitoring the cost of a data mesh and building an effective cross-charging system for scenarios where the central data mesh is not an independent cost center and needs to fund itself by charging for its services.
