
You're reading from Engineering Data Mesh in Azure Cloud

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781805120780
Edition: 1st Edition
Author: Aniruddha Deswandikar

Aniruddha Deswandikar holds a Bachelor's degree in Computer Engineering and is a seasoned Solutions Architect with over 30 years of industry experience as a developer, architect, and technology strategist. His experience spans start-ups, dotcoms, and large enterprises. He has spent 18 years at Microsoft helping customers build their next-generation application and data analytics platforms. His experience across applications, data, and AI has helped him provide holistic guidance to companies large and small. He currently helps global enterprises set up enterprise-scale analytical systems using the data mesh architecture. He is a subject matter expert on data mesh at Microsoft and is helping multiple Microsoft global customers implement the data mesh architecture.

Big Data Analytics Using Azure Synapse Analytics

Traditional analytics on structured, relational data works well for analyzing transactional data. This approach sufficed until the dotcom revolution, which brought an influx of large volumes of semi-structured data such as shopping carts, customer profiles, and ad clicks. Processing data at this volume required a new type of technology, so data processing methods such as MapReduce became popular (to learn more about MapReduce, please refer to https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-introduction). This led to technologies such as Hadoop and, later, Apache Spark becoming the new big data processing engines.
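To make the MapReduce model concrete, here is a minimal single-process sketch in Python of its three phases (map, shuffle, reduce), counting page tokens in clickstream-like records. It only illustrates the programming model; engines such as Hadoop run each phase in parallel across many machines. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str) -> list[tuple[str, int]]:
    """Map: emit a (key, 1) pair for every token in one input record."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical clickstream: each record lists the pages one session touched.
clicks = ["shoes cart shoes", "cart checkout", "shoes"]
print(map_reduce(clicks))  # {'shoes': 3, 'cart': 2, 'checkout': 1}
```

Because map and reduce are independent per record and per key, a distributed engine can split them across nodes; only the shuffle requires moving data between machines.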

In this chapter, we will look at Azure services that can help you build a data mesh landing zone template for big data processing. We will cover one possible architecture for handling and analyzing big data by covering these topics:

  • Requirements
  • Architecture...

Requirements

To understand the requirements of a big data processing architecture, let's consider an example. Suppose a consumer goods company wants to understand its customers' preferences and behavior so that it can optimize product placement, inventory management, and targeted marketing. To achieve this, the company must collect data from the following sources:

  • Sales transactions: Purchases made either in physical stores or on the company website.
  • Online behavior: Tracking which products are frequently viewed and searched as customers browse the company website.
  • Customer feedback: Customers are often invited to provide feedback through surveys, reviews, and feedback forms. This data needs to be collected and processed to improve business performance.
  • Social media interactions: Consumers react to company products and their experiences by adding comments and posts on social...

Architecture

Let's look at the architecture for implementing the preceding requirements. This architecture is divided into four stages: ingest, storage, processing, and serve. It's depicted in Figure 15.1:

Figure 15.1 – Big data processing using Azure Synapse Analytics

Take a closer look at this architecture; in the next section, we’ll learn about the components that are used and their functionality.
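As a compact summary of the four stages, the following Python sketch maps each stage to the services this chapter assigns to it. The dictionary and the `stage_of` helper are hypothetical illustrations, not part of any Azure SDK.

```python
# Hypothetical summary of Figure 15.1: each stage mapped to the services
# named for it in this chapter. Dict insertion order preserves stage order.
ARCHITECTURE: dict[str, list[str]] = {
    "ingest": ["Azure Synapse pipelines"],
    "storage": ["Data lake (bronze, silver, and gold layers)"],
    "processing": ["Synapse Spark pools", "Synapse SQL pools"],
    "serve": ["Azure Cosmos DB", "Azure AI Search", "Power BI", "Azure Data Share"],
}


def stage_of(component: str) -> str:
    """Return the stage a component belongs to; raise KeyError if it is not listed."""
    for stage, components in ARCHITECTURE.items():
        if component in components:
            return stage
    raise KeyError(component)


for stage, components in ARCHITECTURE.items():
    print(f"{stage:>10}: {', '.join(components)}")
```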

Components

Working from left to right in Figure 15.1, let's look at each component and what it does.

Source data

Source data can be semi-structured, such as web logs in JSON or comma-separated files, or structured, such as data from sales, marketing, and inventory databases.
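The practical difference between the two kinds of source data shows up when parsing them. In this minimal Python sketch, assuming hypothetical field names, a JSON web-log line carries its own keys (which may vary between records, so optional fields are read with defaults), while a CSV extract standing in for rows exported from a sales database follows one fixed schema declared in its header.

```python
import csv
import io
import json

# Hypothetical sample inputs: one JSON web-log line and a small sales extract.
WEB_LOG_LINE = '{"ts": "2024-03-01T10:15:00Z", "user": "u42", "event": "view", "product": "p7"}'
SALES_EXTRACT = "order_id,user,product,amount\n1001,u42,p7,19.99\n"


def parse_web_log(line: str) -> dict:
    """Semi-structured: keys travel with each record and may be missing,
    so optional fields are read with defaults instead of assumed to exist."""
    record = json.loads(line)
    return {
        "source": "web",
        "user": record.get("user"),
        "product": record.get("product"),
        "event": record.get("event", "unknown"),
    }


def parse_sales(text: str) -> list[dict]:
    """Structured: every row follows the fixed schema declared in the header."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"source": "sales", "user": row["user"], "product": row["product"],
         "amount": float(row["amount"])}
        for row in reader
    ]


print(parse_web_log(WEB_LOG_LINE))
print(parse_sales(SALES_EXTRACT))
```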

Azure Synapse pipelines

Azure Synapse pipelines provide the same pipeline capabilities as Azure Data Factory, except that they are integrated into Synapse Studio. This allows data engineers and data scientists to share the same workspace for preprocessing and analyzing the data. Azure Synapse pipelines ingest the semi-structured logs and the structured data from company databases into the data lake, using largely the same set of connectors as Azure Data Factory to reach different data sources. For more information on Azure Synapse, please refer to the following links:
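In Synapse, the landing step itself would be configured as a pipeline (for example, a copy activity) rather than written by hand. The Python sketch below only mimics its outcome on a local folder standing in for the data lake: raw bytes are written unchanged into a date-partitioned bronze path. The `bronze/<source>/<dataset>/yyyy/mm/dd/` layout and both helper functions are hypothetical conventions, not a Synapse requirement.

```python
import tempfile
from datetime import date
from pathlib import Path


def bronze_path(lake_root: Path, source: str, dataset: str,
                load_date: date, file_name: str) -> Path:
    """Hypothetical layout: bronze/<source>/<dataset>/yyyy/mm/dd/<file>."""
    return (lake_root / "bronze" / source / dataset
            / f"{load_date:%Y}" / f"{load_date:%m}" / f"{load_date:%d}" / file_name)


def land_raw(lake_root: Path, source: str, dataset: str,
             load_date: date, file_name: str, payload: bytes) -> Path:
    """Write the payload as-is: the bronze layer keeps data exactly as received."""
    target = bronze_path(lake_root, source, dataset, load_date, file_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


# Usage: a temporary directory stands in for the data lake.
with tempfile.TemporaryDirectory() as tmp:
    written = land_raw(Path(tmp), "website", "clickstream", date(2024, 3, 1),
                       "logs.json", b'{"event": "view"}\n')
    print(written.relative_to(tmp))  # bronze/website/clickstream/2024/03/01/logs.json
```

Partitioning by load date keeps each ingestion run separate, so a bad load can be reprocessed or replaced without touching earlier data.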

Data flow

  1. Data from semi-structured and structured sources is read using Azure Synapse pipelines and written into the bronze layer of the data lake.
  2. Data is then moved between the bronze, silver, and gold layers using additional pipelines.
  3. Synapse Spark pools or Synapse SQL pools then run further analytics on the data stored in the medallion layers.
  4. Analytical data from Synapse is pushed to Azure Cosmos DB and Azure Data Share, and is read by Power BI, exposing it to consumers such as applications, dashboards, and other teams.
  5. Azure AI Search indexes the data in Cosmos DB so that it can be searched from the mobile app or website.
  6. Power BI surfaces the analytics in the form of dashboards.
  7. Azure Data Share shares the data with external parties that need the data for their processing purposes.
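The bronze, silver, and gold hops in steps 1 and 2 can be sketched in plain Python; in Synapse they would typically run as pipelines, data flows, or Spark notebooks over files in the lake. The record fields and the cleaning rules here (deduplicate, cast types, drop malformed rows, aggregate revenue per product) are assumptions chosen to illustrate what each layer usually holds.

```python
from collections import defaultdict

# Bronze: hypothetical raw records exactly as ingested, duplicates and bad rows included.
bronze = [
    {"order_id": "1001", "product": "p7", "amount": "19.99"},
    {"order_id": "1001", "product": "p7", "amount": "19.99"},  # duplicate delivery
    {"order_id": "1002", "product": "p3", "amount": "n/a"},    # malformed amount
    {"order_id": "1003", "product": "p7", "amount": "5.01"},
]


def to_silver(rows: list[dict]) -> list[dict]:
    """Silver: deduplicate by order_id, cast types, and drop rows that fail validation."""
    seen: set[str] = set()
    clean: list[dict] = []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline might route this row to a quarantine area
        seen.add(row["order_id"])
        clean.append({"order_id": int(row["order_id"]),
                      "product": row["product"], "amount": amount})
    return clean


def to_gold(rows: list[dict]) -> dict[str, float]:
    """Gold: business-level aggregate (revenue per product) ready for serving."""
    revenue: dict[str, float] = defaultdict(float)
    for row in rows:
        revenue[row["product"]] += row["amount"]
    return {product: round(total, 2) for product, total in revenue.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(silver), gold)  # 2 {'p7': 25.0}
```

Keeping bronze untouched means silver and gold can always be rebuilt if a cleaning rule changes.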

Now, let’s look at some scenarios where this architecture can be applicable.

Scenarios

  • BI and strategy: Analyzing market trends, consumer behavior, and pricing strategy
  • Healthcare: Predicting epidemics, personalizing treatment plans, and managing healthcare resources
  • Energy and utilities: Predictive maintenance and optimizing energy distribution

Many other sectors, such as agriculture, retail, sports, government, and telecommunications, can use big data analytics on structured and semi-structured data to optimize their business and operations.

Summary

In this chapter, we looked at a possible architecture for big data analytics. We discussed the different data dimensions (the four Vs) and how to ingest data arriving at different speeds. We also looked at various engines for processing real-time and batch time series data before surfacing the processed data and analytics to applications and dashboards and sharing it with other teams. Note that this is just one possible architecture; you can build a similar one using Azure Databricks or Azure HDInsight. What we have presented here, however, is a popular architecture used by many companies.

In the next chapter, we will look at event-driven analytics using Azure Event Hubs, Azure Stream Analytics, and Azure Machine Learning.
