You're reading from Cloud Scale Analytics with Azure Data Services

Product type: Book
Published in: Jul 2021
Publisher: Packt
ISBN-13: 9781800562936
Edition: 1st
Author: Patrik Borosch

Patrik Borosch is a cloud solution architect for data and AI at Microsoft Switzerland GmbH. He has more than 25 years of BI and analytics development, engineering, and architecture experience and is a Microsoft Certified Data Engineer and a Microsoft Certified AI Engineer. Patrik has worked on numerous significant international data warehouse, data integration, and big data projects. Through this, he has built and extended his experience in all facets, from requirements engineering to data modeling and ETL, all the way to reporting and dashboarding. At Microsoft Switzerland, he supports customers in their journey into the analytical world of the Azure Cloud.

Chapter 8: Streaming Data into Your MDWH

More and more analytical projects need to show real-time or near-real-time data, that is, data coming from online systems such as shops, trading platforms, or IoT telemetry. You may want to collect and analyze that data the moment it hits your system. IoT data might give you input about the status and potential failure of machines on your shop floor, or you may simply want to display live data from your production. Shop telemetry could inform you about potential customer churn, and trading events might be checked for fraudulent behavior. There are multiple use cases, as well as several options to implement them on the Microsoft Azure platform.

This chapter introduces Azure Stream Analytics (ASA) and the configuration-based approach that this service offers. ASA is a fully managed PaaS component. You will learn how to set up the service and how to connect to sources and targets. You will learn about SQL queries with windowing...

Technical requirements

To follow along with this chapter, you will need the following:

  • An Azure subscription in which you have at least Contributor rights or are the Owner
  • The right to provision ASA
  • A Synapse Spark pool (optional if you want to follow the additional options)
  • A Databricks cluster (optional if you want to follow the additional options)

Provisioning ASA

Now let's provision your first ASA job on Azure:

  1. To create your ASA environment in your Azure subscription, navigate to the Azure portal, hit + Create a resource, and type Stream Analytics into the search field on the following blade. From the quick results below the search field, select Stream Analytics job. The following details blade gives you a first glance at your options with ASA. Hit Create to start the provisioning sequence.
  2. On the following blade, you will need to enter some basic information required to create your ASA job:

    Figure 8.1 – Basic information provisioning in ASA

    Please enter a job name and select the target subscription. You can either select an existing resource group or create a new one and select the data center location for your ASA job.

    Selecting Cloud as the hosting environment will create an ASA job in your subscription as a cloud service. If you select Edge, the ASA job will be containerized and deployed to...

Implementing an ASA job

ASA offers you a convenient way to create streaming analysis on a configuration basis. This means you do not need to code the environment, the engine, the connection, logging, and so on. The service will take care of all these tasks for you (to see an example, refer to the Integrating sources and Writing to sinks sections that follow). The only thing you will need to code is the analytical core of your streaming job. To ease things for you, this is done using a SQL dialect that was tailored for this task (see Understanding ASA SQL).

After the provisioning of your new resource, you are taken to the following overview blade:

Figure 8.2 – Overview blade of the ASA job

You can already see three of the most important areas of your ASA job:

  • Inputs: This will show all the configured source connections available in your job.
  • Outputs: This will show all the configured target connections available in your job.
  • Query:...

Understanding ASA SQL

The main processing in your ASA job will be done using SQL to implement the analytical rules you want to apply to your incoming data.

Compared to data warehouse batch-oriented processing, stream processing observes a constantly delivered chain of events. The processing, therefore, will need different approaches as you will, for example, aggregate values over a certain recurring time frame. This is called windowing. The ASA SQL dialect implements a collection of windowing functions that will support you in doing this.
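As a first taste of what such a windowed aggregation looks like, here is a minimal ASA SQL sketch using a tumbling window, which splits the stream into fixed, non-overlapping intervals. The input alias, output alias, and column names here are placeholders for illustration, not names used in this chapter:

```sql
-- Count events per device over fixed, non-overlapping 30-second intervals.
-- [eventsource] and [eventtarget] are hypothetical input/output aliases.
SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
INTO
    [eventtarget]
FROM
    [eventsource] TIMESTAMP BY EventTime
GROUP BY
    DeviceId,
    TumblingWindow(second, 30)
```

TIMESTAMP BY tells ASA which column carries the event time, and System.Timestamp() returns the closing time of each window, so every output row represents one device over one 30-second slice of the stream.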

But before we dive into the magic of windowing functions and ASA, let's first finish our basic ASA job and kick it off:

  1. Please select Query from either the navigation blade or the Overview blade and select Edit query:

    Figure 8.4 – ASA query editor

  2. In the editor, please enter your ASA query. Please replace the displayed query with the following:
    SELECT
        *
    INTO
        airdelaystreamingtarget...
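For reference, a complete pass-through query of this shape typically looks as follows; the input and output aliases here are placeholders, not the ones configured in this chapter:

```sql
-- Route every incoming event unchanged from the input to the output.
SELECT
    *
INTO
    [streamingtarget]    -- placeholder output alias
FROM
    [streamingsource]    -- placeholder input alias
```

The aliases must match the names of the Inputs and Outputs you configured on the job; the query itself only describes how events flow between them.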

Using Structured Streaming with Spark

If you are the kind of developer who loves to code and you are a fan of Spark, Structured Streaming with Spark might be an interesting alternative for you.

Spark clusters are a widely used engine to implement streaming analytics using one of the available programming languages, such as Python or Scala. With the massive scalability of Spark clusters in Azure services such as Synapse or Databricks, you will be able to implement an environment that can grow with your needs and deliver the necessary performance.

Besides performance, the extensibility of Spark clusters is a factor to consider. You will be able to combine streaming algorithms with the capabilities of Spark and programming languages such as Python (PySpark), Scala, or R.

Take Kafka as an input for your streaming analysis, for example. Kafka is a widely used event streaming platform. ASA does not yet offer...
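To give an impression of what this looks like in code, here is a minimal PySpark Structured Streaming sketch that reads from a Kafka topic and counts events per key over a tumbling window. The broker address, topic name, and column choices are placeholders, and running it requires a Spark runtime with the Kafka connector on the classpath; treat it as a sketch, not a finished job:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("KafkaStreamingSketch").getOrCreate()

# Read a stream of events from a (hypothetical) Kafka topic.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "telemetry")                  # placeholder topic
    .load())

# Kafka delivers key/value as binary; cast the key to a string and use
# the Kafka timestamp to aggregate into 30-second tumbling windows.
counts = (events
    .select(col("key").cast("string"), col("timestamp"))
    .groupBy(window(col("timestamp"), "30 seconds"), col("key"))
    .count())

# Write the running counts to the console (for experimentation only).
query = (counts.writeStream
    .outputMode("complete")
    .format("console")
    .start())
query.awaitTermination()
```

Note how close this is in spirit to the ASA SQL windowing approach: the window() function plays the role of TumblingWindow, but here you also own the session, the connection options, and the output sink in code.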

Security in your streaming solution

Secure access to sources and sinks in your solution is paramount. There are some considerations that you might want to go through when implementing a streaming solution with ASA.

Connecting to sources and sinks

If you examine the Integrating sources and Writing to sinks sections, you will find an authentication mode listed for almost every connector, except Event Hubs and IoT Hub, where you connect using keys and connection strings.

Implementing authentication with either service users and passwords or managed identities already provides very secure access to your sources and sinks. Azure Active Directory implements a multitude of security measures that greatly reduce the risk of attackers breaking into your solution.

With the use of managed identities, you are implementing a service principal. This is a kind of Azure identity that can only be used with Azure services. You can compare it to the on-premises service...

Monitoring your streaming solution

As shown in Figure 8.9, some information is already available on the Overview page of your ASA job. If you navigate to the Monitoring section of your ASA job, you can get further insights into your job.

In the Logs section, for example, you are presented with a list of predefined queries that will produce insights into all kinds of errors that can occur when you are running your ASA job:

Figure 8.16 – Available error queries in the Logs section

If you proceed to Metrics, you are taken to a chart editor where you can select from the available ASA metrics:

Figure 8.17 – ASA metrics view

You have metrics such as backlogged input events, data conversion errors, early input events, and failed function requests. This section will give you a deep insight into your ASA job.

If you want to set up alerts for your ASA job, for SU percentage utilization, for example (remember the Understanding...

Summary

In this chapter, you have learned how to provision an ASA job. You have seen how to connect to sources and sinks and how to use them as inputs and outputs. You have also learned about ASA SQL and its windowing functions.

Furthermore, you have seen that ASA SQL queries can route data from the input to different outputs, creating different granularities. You have examined the capabilities to add reference data to your queries and how to add further functionality such as user-defined functions and machine learning using functions.

Finally, we have talked about SUs, the performance metrics of ASA, and how partitioning will help you to improve performance. You have examined security questions and have learned about monitoring. If all the features of ASA do not deliver on your requirements, there are additional technologies available on Azure, such as Spark clusters in Synapse or Databricks that can be used to implement streaming.

We have touched on the topic of machine...
