You're reading from Cloud Scale Analytics with Azure Data Services

Product type: Book
Published in: Jul 2021
Publisher: Packt
ISBN-13: 9781800562936
Edition: 1st
Author: Patrik Borosch

Patrik Borosch is a cloud solution architect for data and AI at Microsoft Switzerland GmbH. He has more than 25 years of BI and analytics development, engineering, and architecture experience and is a Microsoft Certified Data Engineer and a Microsoft Certified AI Engineer. Patrik has worked on numerous significant international data warehouse, data integration, and big data projects. Through this, he has built and extended his experience in all facets, from requirements engineering to data modeling and ETL, all the way to reporting and dashboarding. At Microsoft Switzerland, he supports customers in their journey into the analytical world of the Azure Cloud.

Chapter 8: Streaming Data into Your MDWH

More and more analytical projects need to show real-time or near-real-time data, that is, data coming from online systems such as shops, trading platforms, or IoT telemetry. You may want to collect and analyze that data the moment it hits your system. IoT data might give you input about the status and potential failure of machines on your shop floor, or you may simply want to display live data from your production. Shop telemetry could inform you about potential customer churn, and trading events might be checked for fraudulent behavior. There are multiple use cases, as well as several options to implement them on the Microsoft Azure platform.

This chapter introduces Azure Stream Analytics (ASA) and the configuration-based approach that this service offers. ASA is a fully managed PaaS component. You will learn how to set up the service and how to connect to sources and targets. You will learn about SQL queries with windowing...

Technical requirements

To follow along with this chapter, you will need the following:

  • An Azure subscription in which you have at least Contributor rights or are the Owner
  • The right to provision ASA
  • A Synapse Spark pool (optional if you want to follow the additional options)
  • A Databricks cluster (optional if you want to follow the additional options)

Provisioning ASA

Now let's provision your first ASA job on Azure:

  1. To create your ASA environment in your Azure subscription, navigate to the Azure portal, hit + Create a resource, and type Stream Analytics into the search field on the following blade. From the quick results below the search field, select Stream Analytics job. The following details blade gives you a first glance at your options with ASA. Hit Create to start the provisioning sequence.
  2. On the following blade, you will need to enter some basic information required to create your ASA job:

    Figure 8.1 – Basic information provisioning in ASA

    Please enter a job name and select the target subscription. You can either select an existing resource group or create a new one and select the data center location for your ASA job.

    Selecting Cloud as the hosting environment will create an ASA job in your subscription as a cloud service. If you select Edge, the ASA job will be containerized and deployed to...

Implementing an ASA job

ASA offers you a convenient way to create streaming analysis on a configuration basis. This means you do not need to code the environment, the engine, the connection, logging, and so on. The service will take care of all these tasks for you (to see an example, refer to the Integrating sources and Writing to sinks sections that follow). The only thing you will need to code is the analytical core of your streaming job. To ease things for you, this is done using a SQL dialect that was tailored for this task (see Understanding ASA SQL).

After the provisioning of your new resource, you are taken to the following overview blade:

Figure 8.2 – Overview blade of the ASA job

You can already see three of the most important areas of your ASA job:

  • Inputs: This will show all the configured source connections available in your job.
  • Outputs: This will show all the configured target connections available in your job.
  • Query:...

Understanding ASA SQL

The main processing in your ASA job will be done using SQL to implement the analytical rules you want to apply to your incoming data.

Compared to data warehouse batch-oriented processing, stream processing observes a constantly delivered chain of events. The processing, therefore, will need different approaches as you will, for example, aggregate values over a certain recurring time frame. This is called windowing. The ASA SQL dialect implements a collection of windowing functions that will support you in doing this.
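As a first taste of what such a windowed aggregation looks like, here is a minimal ASA SQL sketch using a tumbling window, which splits the stream into fixed, non-overlapping intervals. The input alias, output alias, and column names here are placeholders for illustration, not names used in this chapter:

```sql
-- Count events per device over fixed, non-overlapping 30-second intervals.
-- [eventsource] and [eventtarget] are hypothetical input/output aliases.
SELECT
    DeviceId,
    COUNT(*) AS EventCount,
    System.Timestamp() AS WindowEnd
INTO
    [eventtarget]
FROM
    [eventsource] TIMESTAMP BY EventTime
GROUP BY
    DeviceId,
    TumblingWindow(second, 30)
```

TIMESTAMP BY tells ASA which column carries the event time, and System.Timestamp() returns the closing time of each window, so every output row represents one device over one 30-second slice of the stream.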

But before we dive into the magic of windowing functions and ASA, let's first finish our basic ASA job and kick it off:

  1. Please select Query from either the navigation blade or the Overview blade and select Edit query:

    Figure 8.4 – ASA query editor

  2. In the editor, please enter your ASA query. Please replace the displayed query with the following:
    SELECT
        *
    INTO
        airdelaystreamingtarget...
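For reference, a complete pass-through query of this shape typically looks as follows; the input and output aliases here are placeholders, not the ones configured in this chapter:

```sql
-- Route every incoming event unchanged from the input to the output.
SELECT
    *
INTO
    [streamingtarget]    -- placeholder output alias
FROM
    [streamingsource]    -- placeholder input alias
```

The aliases must match the names of the Inputs and Outputs you configured on the job; the query itself only describes how events flow between them.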

Using Structured Streaming with Spark

If you are the kind of developer who loves to code and you are a fan of Spark, Structured Streaming with Spark might be an interesting alternative for you.

Spark clusters are a widely used engine to implement streaming analytics using one of the available programming languages, such as Python or Scala. With the massive scalability of Spark clusters in Azure services such as Synapse or Databricks, you will be able to implement an environment that can grow with your needs and deliver the necessary performance.

Besides performance, the extensibility of Spark clusters is a factor to consider. You will be able to combine streaming algorithms with the capabilities of Spark and programming languages such as Python (PySpark), Scala, or R.

Take Kafka as an input for your streaming analysis, for example. Kafka is a widely used event streaming platform. ASA does not yet offer...
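To give an impression of what this looks like in code, here is a minimal PySpark Structured Streaming sketch that reads from a Kafka topic and counts events per key over a tumbling window. The broker address, topic name, and column choices are placeholders, and running it requires a Spark runtime with the Kafka connector on the classpath; treat it as a sketch, not a finished job:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("KafkaStreamingSketch").getOrCreate()

# Read a stream of events from a (hypothetical) Kafka topic.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "telemetry")                  # placeholder topic
    .load())

# Kafka delivers key/value as binary; cast the key to a string and use
# the Kafka timestamp to aggregate into 30-second tumbling windows.
counts = (events
    .select(col("key").cast("string"), col("timestamp"))
    .groupBy(window(col("timestamp"), "30 seconds"), col("key"))
    .count())

# Write the running counts to the console (for experimentation only).
query = (counts.writeStream
    .outputMode("complete")
    .format("console")
    .start())
query.awaitTermination()
```

Note how close this is in spirit to the ASA SQL windowing approach: the window() function plays the role of TumblingWindow, but here you also own the session, the connection options, and the output sink in code.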

Security in your streaming solution

Secure access to sources and sinks in your solution is paramount. There are some considerations that you might want to go through when implementing a streaming solution with ASA.

Connecting to sources and sinks

If you examine the Integrating sources and Writing to sinks sections, you will find an authentication mode listed for almost every connector, except Event Hubs and IoT Hub, where you connect using keys and connection strings.

Implementing authentication with either service users and passwords or managed identities already provides very secure access to your sources and sinks. Azure Active Directory implements a multitude of security measures that greatly reduce the risk of attackers breaking into your solution.

With the use of managed identities, you are implementing a service principal. This is a kind of Azure identity that can only be used with Azure services. You can compare it to the on-premises service...

Monitoring your streaming solution

As shown in Figure 8.9, some information is already available on the Overview page of your ASA job. If you navigate to the Monitoring section of your ASA job, you can get further insights into your job.

In the Logs section, for example, you are presented with a list of predefined queries that will produce insights into all kinds of errors that can occur when you are running your ASA job:

Figure 8.16 – Available error queries in the Logs section

If you proceed to Metrics, you are taken to a chart editor where you can select from the available ASA metrics:

Figure 8.17 – ASA metrics view

You have metrics such as backlogged input events, data conversion errors, early input events, and failed function requests. This section will give you a deep insight into your ASA job.

If you want to set up alerts for your ASA job, for SU percentage utilization, for example (remember the Understanding...

Summary

In this chapter, you have learned how to provision an ASA job. You have seen how to connect to sources and sinks and how to use them as inputs and outputs. You have also learned about ASA SQL and its windowing functions.

Furthermore, you have seen that ASA SQL queries can route data from the input to different outputs, creating different granularities. You have examined the capabilities to add reference data to your queries and how to add further functionality such as user-defined functions and machine learning using functions.

Finally, we have talked about SUs, the performance metrics of ASA, and how partitioning will help you to improve performance. You have examined security questions and have learned about monitoring. If all the features of ASA do not deliver on your requirements, there are additional technologies available on Azure, such as Spark clusters in Synapse or Databricks that can be used to implement streaming.

We have touched on the topic of machine...
