You're reading from Engineering Data Mesh in Azure Cloud

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781805120780
Edition: 1st Edition
Concepts: Data Science
Author: Aniruddha Deswandikar

Big Data Analytics Using Azure Synapse Analytics

Traditional analytics on structured, relational data works well for analyzing transactional data. That was sufficient until the dotcom revolution, which brought an influx of large volumes of semi-structured data such as shopping carts, customer profiles, and ad clicks. Processing data at this volume required a new type of technology, and data processing methods such as MapReduce became popular (to learn more about MapReduce, please refer to https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-introduction). This led to technologies such as Hadoop and, later, Apache Spark becoming the new big data processing engines.

In this chapter, we will look at Azure services that can help you build a data mesh landing zone template for big data processing. We will walk through one possible architecture for handling and analyzing big data, covering these topics:

Requirements
Architecture...

Requirements

To understand the requirements of a big data processing architecture, let’s consider an example. Suppose a consumer goods company wants to understand its customers’ preferences and behavior to optimize its product placement, inventory management, and targeted marketing. To achieve this, the company will have to collect data from the following sources:

Sales transactions: Purchases made either at a physical store or through the company website.
Online behavior: Tracking which products are frequently viewed and searched as customers browse the company website.
Customer feedback: Customers are often invited to provide feedback through surveys, reviews, and feedback forms. This data needs to be collected and processed to improve business performance.
Social media interactions: Consumers react to company products and their experiences by adding comments and posts on social...

Architecture

Let’s look at the architecture for implementing the preceding requirements. This architecture is divided into four stages: ingest, store, process, and serve. It’s depicted in Figure 15.1:

Figure 15.1 – Big data processing using Azure Synapse Analytics

Take a closer look at this architecture; in the next section, we’ll learn about the components that are used and their functionality.

Components

In Figure 15.1, moving from left to right, let’s look at each component and understand its functionality and attributes.

Source data

Source data can be semi-structured, such as web logs in JSON or comma-separated files, or structured, such as data from sales, marketing, and inventory databases.
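To make the difference between the two kinds of source data concrete, here is a minimal Python sketch that reads a few web log records in JSON Lines form and a few sales rows in CSV form. The field names (ts, customer_id, event, order_id, amount) and the sample values are assumptions made up for illustration; they do not come from any real schema. Only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical sample inputs; all field names and values are assumptions.
WEB_LOG_LINES = """\
{"ts": "2024-03-01T10:00:00Z", "customer_id": "c1", "event": "view", "product": "p42"}
{"ts": "2024-03-01T10:00:05Z", "customer_id": "c2", "event": "search", "query": "soap"}
"""

SALES_CSV = """\
order_id,customer_id,product,amount
1001,c1,p42,19.99
1002,c3,p7,5.50
"""


def read_web_logs(text):
    """Parse JSON Lines. Records may carry different keys (semi-structured)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_sales(text):
    """Parse CSV with a fixed header (structured) and convert amounts to floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows


logs = read_web_logs(WEB_LOG_LINES)
sales = read_sales(SALES_CSV)

# The two log records do not share the same set of keys,
# whereas every sales row has exactly the columns in the header.
print(sorted(logs[0].keys()))
print(sorted(logs[1].keys()))
print(sales[0])
```

Notice that the log records differ in shape (one has a product, the other a query), which is what makes them semi-structured, while the CSV rows all follow the same header.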

Azure Synapse pipelines

Azure Synapse pipelines work the same way as Azure Data Factory pipelines, except that they are integrated into Synapse Studio. This allows data engineers and data scientists to share the same workspace for preprocessing and analyzing the data. Azure Synapse pipelines will ingest the semi-structured logs and structured data from company databases into the data lake. They have the same number of connectors as Azure Data Factory to connect to different data sources. For more information on Azure Synapse pipelines, please refer to the following link:

Azure Synapse documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/get-started-pipelines

Data flow

Data from semi-structured and structured sources is read using Azure Synapse pipelines and written into the bronze layer of the data lake.
Data is then moved between the bronze, silver, and gold layers using more pipelines.
Data from the Medallion storage system is used by Synapse Spark clusters or Synapse SQL pools for further analytics.
Analytical data from Synapse is pushed to Cosmos DB and Azure Data Share and read by Power BI to expose the data to various consumers, such as applications, dashboards, and other teams.
The data in Cosmos DB can be searched using Azure AI Search through the mobile app or website.
Power BI surfaces the analytics in the form of dashboards.
Azure Data Share shares the data with external parties that need the data for their processing purposes.
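The movement of data through the bronze, silver, and gold layers can be sketched in a few lines of plain Python. This is only an in-memory illustration of the idea, not Synapse code: in the architecture described here, these steps would run as Synapse pipelines and Spark or SQL jobs against the data lake. The function names, the record fields, and the cleaning rules (drop records with missing fields, remove exact duplicates, count views per product) are all assumptions chosen for the example.

```python
from collections import Counter


def to_bronze(raw_events):
    """Bronze: land records as received, without any cleaning."""
    return list(raw_events)


def to_silver(bronze):
    """Silver: drop records missing required fields and remove exact duplicates."""
    seen = set()
    silver = []
    for rec in bronze:
        if not rec.get("customer_id") or not rec.get("product"):
            continue  # incomplete record, keep it in bronze only
        key = (rec["customer_id"], rec["product"], rec.get("ts"))
        if key in seen:
            continue  # duplicate of a record already promoted
        seen.add(key)
        silver.append(rec)
    return silver


def to_gold(silver):
    """Gold: aggregate into a consumer-ready shape (view count per product)."""
    return dict(Counter(rec["product"] for rec in silver))


# Hypothetical raw events; the values are made up for illustration.
raw = [
    {"ts": "t1", "customer_id": "c1", "product": "p42"},
    {"ts": "t1", "customer_id": "c1", "product": "p42"},  # duplicate
    {"ts": "t2", "customer_id": "c2", "product": "p42"},
    {"ts": "t3", "customer_id": None, "product": "p7"},   # missing customer
    {"ts": "t4", "customer_id": "c3", "product": "p7"},
]

bronze = to_bronze(raw)
silver = to_silver(bronze)
gold = to_gold(silver)

print(len(bronze), len(silver), gold)
```

The point of the sketch is the separation of concerns: bronze keeps everything exactly as it arrived, silver holds validated and deduplicated records, and gold holds the aggregates that the serving components (Cosmos DB, Power BI, and Azure Data Share in this architecture) would consume.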

Now, let’s look at some scenarios where this architecture can be applicable.

Scenarios

BI and strategy: Analyzing market trends, consumer behavior, and pricing strategy
Healthcare: Predicting epidemics, personalizing treatment plans, and managing healthcare resources
Energy and utility: Performing predictive maintenance and optimizing energy distribution

Many other sectors, such as agriculture, retail, sports, government, and telecommunications, can use big data analytics on structured and semi-structured data to optimize their business and operations.

Summary

In this chapter, we looked at a possible architecture for big data analytics. We discussed all the different data dimensions (the four Vs) and how to ingest data coming at different speeds. We also looked at various processing engines for real-time and batch time series data, and at how to surface the processed data and analytics to applications and dashboards and share them with other teams. It is important to note that this is just one of the possible architectures. You can build a similar architecture using Azure Databricks or Azure HDInsight. However, what we have presented here is a popular architecture that’s used by many companies.

In the next chapter, we will look at event-driven analytics using Azure Event Hubs, Azure Stream Analytics, and Azure Machine Learning.

Author (1)

Aniruddha Deswandikar

Aniruddha Deswandikar holds a Bachelor's degree in Computer Engineering and is a seasoned Solutions Architect with over 30 years of industry experience as a developer, architect and technology strategist. His experience spans from start-ups to dotcoms to large enterprises. He has spent 18 years at Microsoft helping Microsoft customers build their next generation Applications and Data Analytics platforms. His experience across Application, Data and AI has helped him provide holistic guidance to companies large and small. Currently he is helping global enterprises set up their Enterprise-scale Analytical system using the Data Mesh Architecture. He is a Subject Matter Expert on Data Mesh in Microsoft and is currently helping multiple Microsoft Global Customers implement the Data Mesh architecture.