
You're reading from  Engineering Data Mesh in Azure Cloud

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781805120780
Edition: 1st Edition
Author (1)
Aniruddha Deswandikar

Aniruddha Deswandikar holds a Bachelor's degree in Computer Engineering and is a seasoned Solutions Architect with over 30 years of industry experience as a developer, architect and technology strategist. His experience spans from start-ups to dotcoms to large enterprises. He has spent 18 years at Microsoft helping Microsoft customers build their next generation Applications and Data Analytics platforms. His experience across Application, Data and AI has helped him provide holistic guidance to companies large and small. Currently he is helping global enterprises set up their Enterprise-scale Analytical system using the Data Mesh Architecture. He is a Subject Matter Expert on Data Mesh in Microsoft and is currently helping multiple Microsoft Global Customers implement the Data Mesh architecture.

Event-Driven Analytics Using Azure Event Hubs, Azure Stream Analytics, and Azure Machine Learning

In the previous chapter, we discussed the four dimensions of data in modern-day data analytics – volume, velocity, variety, and veracity.

We also looked at an architecture that covers volume and variety, leaving out velocity to keep that architecture simpler. In this chapter, we will architect for that remaining dimension, velocity.

Data is dynamically changing in today’s fast-paced world. It’s a world of instant gratification and split-second actions. You want to understand how your customers are interacting with your website so that you can dynamically swap ads or provide dynamic, relevant offers. You want to monitor the machines you make and sell to predict failures and provide proactive service and maintenance. You want to detect banking fraud before it happens. There are many such scenarios across multiple industries that need real-time processing of data.

...

Requirements

Let’s look at a few examples of real-time events and data in some industries:

  • A retail company wants to monitor its customers’ behavior on its website in real time. It wants to see what its customers are clicking, browsing, and adding to or removing from their baskets. This will help it understand customer behavior so that it can personalize ads and offers.
  • A manufacturing company wants to monitor the usage and health of the machines it sells. It places multiple sensors on each machine that transmit data every second. This data is then put through machine learning models to predict failures.
  • A vineyard wants to monitor soil moisture, sunlight intensity, and other such parameters to pick the right time for harvesting, prevent disease, and produce good-quality wines. It places sensors across its vineyards to collect and process this data.
  • A city wants to monitor its traffic and public transportation data in real time to provide better...

Architecture

An architecture for implementing this real-time streaming of large volumes of data is depicted in Figure 16.1:

Figure 16.1 – Event-based analytics using Azure Event Hubs, Azure Stream Analytics, and Azure Machine Learning


Take a closer look at this architecture; we’ll learn about its components and their functionality in the next section.

Components

Let’s look at the components of this architecture in greater detail.

Source data

Clickstream data can be collected using a variety of software development kits (SDKs). These are typically JavaScript snippets embedded in web pages that transmit click data to an API.

IoT data can be collected using a network of sensors connected to a gateway. The gateway can call an API in the analytical system (IoT Hub) to push the data.
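
As a rough illustration of what such an SDK or gateway might transmit, the following minimal sketch serializes a clickstream event to JSON. The `ClickEvent` class, its field names, and the sample values are assumptions made for this example; they are not part of any Azure SDK or of the architecture described here.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ClickEvent:
    """Hypothetical clickstream event; field names are illustrative only."""
    session_id: str
    page: str
    action: str      # e.g. "click", "add_to_basket", "remove_from_basket"
    timestamp: str   # ISO 8601 timestamp in UTC


def to_payload(event: ClickEvent) -> bytes:
    """Serialize an event to the UTF-8 JSON bytes an ingestion API could accept."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


event = ClickEvent(
    session_id="s-123",
    page="/products/42",
    action="add_to_basket",
    timestamp=datetime(2024, 3, 1, 12, 0, 0, tzinfo=timezone.utc).isoformat(),
)
payload = to_payload(event)
```

An IoT reading could be modeled the same way, with a device identifier and sensor values in place of the session and page fields.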

Azure Event Hubs

Azure Event Hubs is a data streaming service that can scale to millions of messages per second. It is the preferred event ingestion service in Azure. It provides message/event queues that ingest and temporarily store messages/events from a producer until an event consumer pulls them off the queue for processing. It also maintains a schema registry that the producer and consumer can refer to in order to maintain interoperability. Event Hubs can be configured and scaled in many ways. For more details on Azure Event Hubs...
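
One way to picture how an ingestion layer decouples producers from consumers is the toy in-memory model below. It is only a conceptual sketch and does not use the Event Hubs SDK: the `InMemoryEventLog` class, its methods, and the CRC-based key hashing are all assumptions made for illustration. It mirrors two ideas from Event Hubs, namely that events are spread across partitions and that a consumer reads from a position (offset) within a partition.

```python
from collections import defaultdict
from zlib import crc32


class InMemoryEventLog:
    """Toy stand-in for a partitioned event stream (not the Event Hubs SDK)."""

    def __init__(self, partition_count: int = 4):
        self.partition_count = partition_count
        self.partitions = defaultdict(list)  # partition id -> ordered events

    def send(self, partition_key: str, body: dict) -> int:
        """Append an event; the same key always maps to the same partition."""
        pid = crc32(partition_key.encode("utf-8")) % self.partition_count
        self.partitions[pid].append(body)
        return pid

    def read(self, pid: int, offset: int = 0) -> list:
        """Return the events in a partition from `offset` onward."""
        return self.partitions[pid][offset:]


log = InMemoryEventLog()
p1 = log.send("device-7", {"temp": 71.5})
p2 = log.send("device-7", {"temp": 72.1})
events = log.read(p1)
```

Because both events share the key `device-7`, they land in the same partition and are read back in the order they were sent.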

Data flow

  1. Clickstream data is sent to Azure Event Hubs.
  2. IoT data is sent to Azure IoT Hub.
  3. Azure Event Hubs and Azure IoT Hub push data to Azure Stream Analytics. Azure Stream Analytics processes the data for simple patterns and threshold values.
  4. Azure Data Explorer can pull data from Stream Analytics for time series analysis. Azure Data Explorer can also pull data directly from Event Hubs and/or IoT Hub.
  5. Azure Machine Learning can use data from Azure Data Explorer to build machine learning models. These models can be stored as binary files in blob storage or a Data Explorer database and can be executed in Data Explorer to perform real-time inferencing.
  6. After analysis, data can be pushed to Cosmos DB. Cosmos DB provides high-performance storage for documents.
  7. Cosmos DB data can be consumed by mobile applications, websites, or other marketing or analytical applications.
  8. Power BI can build real-time dashboards from Stream Analytics.
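
The threshold processing described in step 3 can be sketched in plain Python. This is not Stream Analytics code (Stream Analytics jobs are written in a SQL-like query language); it is a minimal illustration of averaging readings over fixed, non-overlapping windows and flagging averages above a limit. The function name, the tuple layout, the 10-second window, and the threshold of 80.0 are assumptions chosen for the example.

```python
from collections import defaultdict


def tumbling_window_alerts(readings, window_seconds=10, threshold=80.0):
    """Average each device's readings per fixed window and flag high averages.

    `readings` is an iterable of (epoch_seconds, device_id, value) tuples.
    Returns a sorted list of (window_start, device_id, average) tuples
    whose average exceeds `threshold`.
    """
    buckets = defaultdict(list)
    for ts, device_id, value in readings:
        # Align each timestamp to the start of its fixed-size window.
        window_start = (ts // window_seconds) * window_seconds
        buckets[(window_start, device_id)].append(value)

    alerts = []
    for (window_start, device_id), values in buckets.items():
        average = sum(values) / len(values)
        if average > threshold:
            alerts.append((window_start, device_id, average))
    return sorted(alerts)


sample = [
    (0, "pump-1", 70.0), (4, "pump-1", 74.0),    # window starting at 0
    (11, "pump-1", 85.0), (15, "pump-1", 95.0),  # window starting at 10
    (12, "pump-2", 60.0),                        # window starting at 10
]
alerts = tumbling_window_alerts(sample)
```

Here only `pump-1` in the window starting at second 10 exceeds the threshold, so it is the only alert produced.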

This architecture...

Scenarios

Here are some scenarios to consider:

  • Customer behavior analytics: Analyze browsing and shopping trends, provide customized ads and offerings, and offer dynamic pricing
  • Fraud detection: Monitor credit card transactions in real time to detect fraud patterns and prevent them from happening
  • Predictive maintenance: Analyze data from machine sensors and detect future failures, provide better customer service, and adhere to uptime service-level agreements
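
To make the fraud detection and predictive maintenance scenarios more concrete, the sketch below flags values that deviate sharply from a rolling baseline. It is a deliberately simple statistical check, not a machine learning model and not the method used by Azure Machine Learning or Azure Data Explorer. The function name, the window of 5 readings, the z-score limit of 3.0, and the sample data are assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(values, window=5, z_limit=3.0):
    """Return indices whose value deviates strongly from the preceding window."""
    history = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        # Only score a value once a full window of earlier readings exists.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(v - mu) / sigma > z_limit:
                flagged.append(i)
        history.append(v)
    return flagged


vibration = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 5.0, 1.0]
anomalies = flag_anomalies(vibration)
```

In this sample, the spike to 5.0 at index 6 is the only reading flagged. A production system would typically replace this check with a trained model and richer features.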

Many other sectors, such as agriculture, retail, and smart city projects, can use real-time analytics to analyze streaming data to optimize their business and operations.

This brings us to the end of this architectural pattern. Now, let’s summarize what we’ve learned.

Summary

In this chapter, we covered velocity, one of the “Four Vs” of data. We discussed the different real-time data sources and ways of ingesting them. We also looked at how to process this data in real time over smaller and larger periods. To analyze small chunks of streaming data, we used Azure Stream Analytics, while for time series analysis over larger periods of historic data, we used Azure Data Explorer. Optionally, machine learning models can be built on the Data Explorer data using Azure Machine Learning. Finally, data can be served from Cosmos DB or as real-time dashboards in Power BI.

In the next chapter, we will look at more recent technologies, such as generative AI, and the architecture needed to leverage this revolutionary AI technology.
