Scalable Data Analytics with Azure Data Explorer

By Jason Myerscough
About this book
Azure Data Explorer (ADX) enables developers and data scientists to make data-driven business decisions. This book will help you rapidly explore and query your data at scale and secure your ADX clusters. The book begins by introducing you to ADX, its architecture, core features, and benefits. You'll learn how to securely deploy ADX instances and navigate through the ADX Web UI, cover data ingestion, and discover how to query and visualize your data using the powerful Kusto Query Language (KQL). Next, you'll get to grips with KQL operators and functions to efficiently query and explore your data, as well as perform time series analysis and search for anomalies and trends in your data. As you progress through the chapters, you'll explore advanced ADX topics, including deploying your ADX instances using Infrastructure as Code (IaC). The book also shows you how to manage your cluster performance and monthly ADX costs by handling cluster scaling and data retention periods. Finally, you'll understand how to secure your ADX environment by restricting access with best practices for improving your KQL query performance. By the end of this Azure book, you'll be able to securely deploy your own ADX instance, ingest data from multiple sources, rapidly query your data, and produce reports with KQL and Power BI.
Publication date: March 2022
Publisher: Packt
Pages: 364
ISBN: 9781801078542

 

Chapter 1: Introducing Azure Data Explorer

Welcome to Scalable Data Analytics with Azure Data Explorer! More than 90% of today's data is digital and most of that data is considered unstructured, such as text messages and other forms of free text. So how can we analyze all our data? The answer is data analytics and Azure Data Explorer (ADX). Data analytics is a complex topic and Microsoft Azure provides a comprehensive selection of data analytics services, which can seem overwhelming when you are first starting your journey into data analytics.

In this chapter, we begin by introducing the data analytics pipeline and learning about each of the steps in the pipeline. These steps are required for taking raw data and producing reports and visuals as a result of your analysis, which will help you understand the workflow used by ADX.

Next, we will introduce some of the popular Azure data services and understand where they fit in the data analytics pipeline. Some of these services, such as Azure Event Hubs, will be used in later chapters when we learn about data ingestion.

We will also learn what ADX is and which features make it a powerful data exploration platform. We will look at its architecture and key components, such as the engine cluster, and at some of its use cases, for example, IoT monitoring, telemetry, and log analysis. Finally, we will get our feet wet by running our first Kusto Query Language (KQL) query using the Data Explorer UI.

In this chapter, we are going to cover the following main topics:

  • Introducing the data analytics pipeline
  • What is Azure Data Explorer?
  • Azure Data Explorer use cases
  • Running your first query
 

Technical requirements

If you do not already have an Azure account, head over to https://azure.microsoft.com/en-us/free/search/ and sign up. Microsoft provides 12 months of popular free services and a $200 credit, which is enough to cover the cost of our Azure Data Explorer journey with this book. Microsoft also provides a free-to-use cluster (https://help.kusto.windows.net/) that is already populated with data. Throughout this book, we will use both this free cluster and clusters we create ourselves.

Please remember to clone or download the Git repository that accompanies the book from https://github.com/PacktPublishing/Scalable-Data-Analytics-with-Azure-Data-Explorer. All the code and query samples listed in the book are available in our repository. Download the latest version of Git from https://git-scm.com if you have not already installed the command-line tools.

Important Note

When developing and cloning repositories, I create a development folder in my home directory. On Windows, this is C:\Users\jason\development. On macOS, this is /Users/jason/development. When referencing specific code examples, I will refer to the repository's parent directory as ${HOME}, for example, ${HOME}/Scalable-Data-Analytics-with-Azure-Data-Explorer/Chapterxx/file.kql.

 

Introducing the data analytics pipeline

Before diving into ADX, it is worth spending some time to understand the data analytics pipeline. Whenever I am learning something new that is large and complex in scope, such as data analytics, I break the topic down into smaller chunks to help with learning and measuring my progress. Therefore, an understanding of the various stages of the data analytics pipeline will help you understand how ADX takes raw data and generates reports and visuals as a result of our analytical tasks, such as time series analysis.

Figure 1.1 illustrates the stages of the data analytics pipeline required to take data from a data source, perform some analysis, and produce the result of the analysis in the form of a visual, such as tables, reports, and graphs:

Figure 1.1 – Data analytics pipeline

In the spirit of breaking a complex subject into smaller chunks, let's look at each stage in detail:

  1. Data: The first step in the pipeline is the data sources. In Chapter 4, Ingesting Data in Azure Data Explorer, we will discuss the different types of data. For now, suffice it to say there are three different categories of data: structured, semi-structured, and unstructured. Data can range from structured, such as tables, to unstructured, such as free-form text.
  2. Ingestion: Once the data sources have been identified, the data needs to be ingested by the pipeline. The primary purpose of the ingestion stage is to take the raw data, perform some Extract-Transform-Load (ETL) operations to format the data in a way that helps with your analysis, and send the data to the storage stage. The data can be ingested using tools and services such as Apache Kafka, Azure Event Hubs, and IoT Hub. Chapter 4, Ingesting Data in Azure Data Explorer, discusses the different ingestion methods, such as streaming versus batch, and demonstrates how to ingest data using multiple services, such as Azure Event Hubs and Azure Blob storage.
  3. Store: Once ingested, ADX natively compresses and stores the data in a proprietary format. The data is then cached locally on the cluster based on the hot cache settings. The data is phased out of the cluster based on the retention settings. We will discuss these terms a little later in the chapter.
  4. Analyze: At this stage, we can start to query, apply machine learning to detect anomalies, and predict trends. We will see examples of anomaly detection and trend prediction in Chapter 7, Identifying Patterns, Anomalies, and Trends in Your Data. In this book, we will perform most of our analysis in the ADX Web UI using Kusto Query Language (KQL).
  5. Visualize: The final stage of the pipeline is visualize. Once you have ingested your data and performed your analysis, chances are you will want to share and present your findings. We will present our findings using the ADX Web UI's dashboards and Power BI.
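
To make the five stages concrete, here is a minimal Python sketch that pushes a handful of events through each stage in turn. It only illustrates the workflow and is not how ADX implements it: the event format, the device names, and the function names (ingest, store, analyze, and visualize) are all assumptions made for this example.

```python
import json
from collections import defaultdict

# Stage 1 - Data: hypothetical raw events, one JSON document per line.
RAW_EVENTS = [
    '{"device": "thermostat-1", "temp_c": "21.5"}',
    '{"device": "thermostat-2", "temp_c": "35.0"}',
    '{"device": "thermostat-1", "temp_c": "22.0"}',
]


def ingest(lines):
    """Stage 2 - Ingestion: parse each line and convert fields to typed values."""
    records = []
    for line in lines:
        event = json.loads(line)
        records.append({"device": event["device"], "temp_c": float(event["temp_c"])})
    return records


def store(records):
    """Stage 3 - Store: keep the records column by column."""
    return {
        "device": [r["device"] for r in records],
        "temp_c": [r["temp_c"] for r in records],
    }


def analyze(table):
    """Stage 4 - Analyze: compute the average temperature per device."""
    readings = defaultdict(list)
    for device, temp in zip(table["device"], table["temp_c"]):
        readings[device].append(temp)
    return {device: sum(temps) / len(temps) for device, temps in readings.items()}


def visualize(averages):
    """Stage 5 - Visualize: one text bar per device, one '#' per 5 degrees."""
    return [
        f"{device:<14}{'#' * int(avg // 5)} {avg:.2f}"
        for device, avg in sorted(averages.items())
    ]


averages = analyze(store(ingest(RAW_EVENTS)))
report = visualize(averages)
print("\n".join(report))
```

Each function only consumes the output of the previous one, which mirrors the idea that every stage of the pipeline can be handled by a different tool or service.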

In the next section, we will look at some of the services Azure provides for the different stages of the analytics pipeline.

Overview of Azure data analytics services

You may have noticed that I referenced a few of Azure's data services previously, and you may be wondering what they are used for. Although this book is about Azure Data Explorer, it is worth understanding what some of the common data services are, since some of the services, such as Event Hubs and Blob storage, will be discussed and used in later chapters.

To help map the different data services to the analytics pipeline, Figure 1.2 illustrates an updated pipeline, with the Azure data services mapped to the respective pipeline stages:

Figure 1.2 – Azure data services

Important Note

The list of services depicted in Figure 1.2 is by no means an exhaustive list of Azure data analytics services. For a complete and accurate list, please see https://azure.microsoft.com/en-us/services/#analytics.

The following list of services is a short description of the services shown in Figure 1.2:

  • Event Hubs: This is an event and streaming Platform as a Service (PaaS). Event Hubs allows us to stream data, which we will demonstrate and use in Chapter 4, Ingesting Data in Azure Data Explorer.
  • Data Factory: This is a PaaS service that allows us to transform data from one format to another. These transformations are commonly referred to as Extract-Transform-Load (ETL) and Extract-Load-Transform (ELT).
  • HDInsight: This is a PaaS service that appears twice in Figure 1.2 and could technically appear in other stages. HDInsight is quite possibly one of the most misunderstood analytical services, with regard to what it does. HDInsight is a PaaS version of the Hortonworks Hadoop framework, which includes a wide range of ingestion, analytics, and storage services, such as Apache Kafka, Hive, HBase, Spark, and the Hadoop Distributed File System (HDFS).
  • Azure Data Lake Gen2: This is a storage solution based on Azure Blob storage that implements HDFS.
  • Blob Storage: This is Azure's object storage service that all other storage services are based on.
  • Azure Databricks: This is Azure's PaaS implementation of Apache Spark.
  • Power BI: Technically not an Azure service, Power BI is a rich reporting product that is commonly integrated with Azure.

You may be wondering where ADX would fit in Figure 1.2. The answer is ingestion, store, analyze, and visualize. In the next section, you will learn how this is possible by understanding what Azure Data Explorer is.

 

What is Azure Data Explorer?

There is a good chance you have already used ADX to some degree without realizing it. If you have used Azure Security Center, Azure Sentinel, Application Insights, Resource Graph Explorer, or enabled diagnostics on your Azure resources, then you have used ADX. All these services rely on Log Analytics, which is built on top of ADX.

Like many tools and products, ADX started with a small group of engineers trying to solve a problem. Circa 2015, developers from Microsoft's Power BI team needed a high-performing big data solution to ingest and analyze their logging and telemetry data. When they could not find a service that met their needs, they did what engineers do and built their own. The result was Azure Data Explorer, also known as Kusto.

So, what is ADX? It is a fully managed, append-only columnar store big data service capable of elastic scaling and ingesting literally hundreds of billions of records daily!

Before moving on to the ADX features, it is important to understand what is meant by PaaS and the other cloud offerings referred to as "as a service". Understanding the different cloud offerings will help you understand what you and the cloud provider (in our case, Microsoft) are responsible for.

When you strip away the marketing terms, cloud computing is essentially a data center that is managed for you and has the same layers or elements as an on-premises data center, for example, hardware, storage, and networking.

Figure 1.3 shows the common layers and elements of a data center. The items in white are managed by you, the customer, and the items in gray are managed by the cloud provider:

Figure 1.3 – Cloud offerings

In the case of on-premises, you are responsible for everything, from renting the building and ventilation to physical networking and running your applications. Public cloud providers offer three fundamental cloud offerings, known as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The provider typically offers a lot more services, such as Azure App Service, but these additional services are built on top of the aforementioned fundamental services.

In the case of ADX, which is a PaaS service, Microsoft manages all layers except the data and application layers. You are responsible for the data layer, that is, data ingestion, and for the application layer, that is, writing your KQL queries and creating dashboards.

ADX features

Let's look at some of the key features ADX provides. Most of the features will be discussed in detail in later chapters:

  • Low-latency ingestion and elastic scaling: ADX nodes are capable of ingesting structured, semi-structured, and unstructured data at speeds of up to 200 MBps (megabytes per second). The vertical and horizontal scaling capabilities of ADX enable it to ingest petabytes of data.
  • Time series analysis: As we will see in Chapter 7, Identifying Patterns, Anomalies, and Trends in Your Data, ADX supports near real-time monitoring, and combined with the powerful KQL, we can search for anomalies and trends within our data.
  • Fully managed (PaaS): All the infrastructure, operating system patching, and software updates are taken care of by Microsoft. You can focus on developing your product rather than running a big data platform. You can be up and running in three steps:
    • Create a cluster and database (more details in Chapter 2, Building Your Azure Data Explorer Environment).
    • Ingest data (more details in Chapter 4, Ingesting Data in Azure Data Explorer).
    • Explore your data using KQL (more details in Chapter 5, Introducing the Kusto Query Language).
  • Cost-efficient: Like other Azure services, Microsoft provides a pay-as-you-consume model. For more advanced use cases, there is also the option of purchasing reserved instances, which require upfront payments.
  • High availability: Microsoft provides an uptime SLA of 99.9% and supports Availability Zones, which ensures your infrastructure is deployed across multiple physical data centers within an Azure region.
  • Rapid ad hoc query performance: Due to some of the architecture decisions that are discussed in the next section, ADX is capable of querying billions of records containing structured, semi-structured, and unstructured data, returning results within seconds. ADX is also designed to execute distributed queries across multiple clusters, which we will see later in the book.
  • Security: We will be covering security in depth in Chapter 10, Azure Data Explorer Security. For now, suffice it to say that ADX supports encryption both at rest and in transit as well as role-based access control (RBAC), and it allows you to restrict public access to your clusters by deploying them into virtual networks (VNets) and blocking traffic using network security groups (NSGs).
  • Enables custom solutions: Allows developers to build analytics services on top of ADX.
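
Two of the figures quoted in the list above are easier to appreciate with some back-of-the-envelope arithmetic. The sketch below converts the 99.9% SLA into allowed downtime and the 200 MBps ingestion figure into a daily volume. It assumes a 30-day month, decimal units (1 TB = 1,000,000 MB), and that the ingestion figure applies per node and is sustained around the clock, which real workloads are unlikely to achieve.

```python
# Figures taken from the feature list; everything else is an assumption.
SLA = 0.999                          # 99.9% uptime SLA
MINUTES_PER_30_DAYS = 30 * 24 * 60   # assumes a 30-day month

allowed_downtime_min = (1 - SLA) * MINUTES_PER_30_DAYS

INGEST_MB_PER_SEC = 200              # peak ingestion speed quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60
peak_tb_per_day = INGEST_MB_PER_SEC * SECONDS_PER_DAY / 1_000_000  # decimal TB

print(f"Allowed downtime: {allowed_downtime_min:.1f} minutes per 30 days")
print(f"Peak ingestion:   {peak_tb_per_day:.2f} TB per day")
```

In other words, a 99.9% SLA still leaves room for roughly three quarters of an hour of downtime per month, and a single stream sustained at the quoted peak would amount to a little over 17 TB per day.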

If you are familiar with database products such as MySQL, MS SQL Server, and Azure SQL, then the core components will be familiar to you. ADX uses the concept of clusters, which can be considered equivalent to Azure SQL Server and are essentially the compute or virtual machines. Next, we have databases and tables; these concepts are the same as a SQL database.

Figure 1.4 shows the hierarchical structure as it appears in the Data Explorer UI. In this example, help is the ADX cluster and Samples is the database, which contains multiple tables such as US_States:

Figure 1.4 – Cluster, database, and tables hierarchy

A cluster or SQL server can host multiple databases, which in turn can contain multiple tables (see Figure 1.4). We will discuss tables in Chapter 4, Ingesting Data in Azure Data Explorer, when we will demonstrate how to create tables and data mappings.
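
The cluster, database, and table hierarchy can be modeled with three small classes. This is only a sketch to show the containment relationship: the class names and the scope() helper are hypothetical and are not part of any ADX SDK, and the column dictionary is left empty because the chapter does not describe the table's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict = field(default_factory=dict)  # column name -> type name


@dataclass
class Database:
    name: str
    tables: dict = field(default_factory=dict)  # table name -> Table


@dataclass
class Cluster:
    name: str
    databases: dict = field(default_factory=dict)  # database name -> Database

    def scope(self, database):
        """Return a scope string in the form the web UI displays, e.g. @help/Samples."""
        if database not in self.databases:
            raise KeyError(f"no database named {database!r} in cluster {self.name!r}")
        return f"@{self.name}/{database}"


# Names taken from the chapter: cluster "help", database "Samples", table "US_States".
us_states = Table("US_States")
samples = Database("Samples", {us_states.name: us_states})
help_cluster = Cluster("help", {samples.name: samples})

print(help_cluster.scope("Samples"))
```

A query always runs against one database on one cluster, which is why the UI asks you to set the scope before you run anything.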

Introducing Azure Data Explorer architecture

PaaS services are great because they allow developers to get started quickly and focus on their product rather than managing complex infrastructure. Being fully managed can also be a disadvantage, especially when you experience issues and need to troubleshoot, and as engineers, we tend to be curious and want to understand how things work.

As depicted in Figure 1.5, ADX contains two key services, the data management service and the engine service. Both services are clusters of compute resources that can be automatically or manually scaled horizontally and vertically. At the time of writing, the most recent engine version is V3, which Microsoft announced in March 2021 and which contains some significant performance improvements:

Figure 1.5 – Azure Data Explorer architecture

Now, let's learn more about the data management and the engine service depicted in the preceding diagram:

  • Data management service: The data management service is responsible primarily for metadata management and managing the data ingestion pipelines. The data management service ensures data is properly ingested and sent to the engine service. Data that is streamed to our cluster is sent to the row store, whereas data that is batched is sent to the column stores.
  • Engine service: The engine service, which is a cluster of compute resources, is responsible for processing the ingested data, managing the hot cache and the long-term storage, and query execution. Each engine uses its local SSD as the hot cache and ensures the cache is used as much as possible.

ADX is often referred to as an append-only analytics service, since the data that is ingested is stored in immutable shards and each shard is compressed for performance reasons. Data sharding is a method of splitting data into smaller chunks. Since the data is immutable, the engine nodes can safely read the data shards, knowing they do not have to worry about other nodes in the cluster making changes to the data.
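
The append-only idea, together with the hot cache and retention settings mentioned earlier, can be sketched in a few lines of Python. This is a conceptual model only and does not reflect ADX's storage format or policies: the Shard and AppendOnlyTable classes are hypothetical, the 7-day and 30-day windows are made-up values, and "cold" is simply the label used here for data that is retained but no longer in the hot cache.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Shard:
    """An immutable batch of rows, stored column by column."""
    created: datetime
    columns: tuple  # ((column_name, (value, value, ...)), ...)


class AppendOnlyTable:
    def __init__(self, hot_cache, retention):
        self.hot_cache = hot_cache
        self.retention = retention
        self.shards = []

    def ingest(self, rows, now):
        """Seal a batch of row dicts into a new shard; existing shards are never edited."""
        names = tuple(rows[0])
        columns = tuple((name, tuple(row[name] for row in rows)) for name in names)
        self.shards.append(Shard(created=now, columns=columns))

    def tier(self, shard, now):
        """Classify a shard by age as 'hot', 'cold', or 'expired'."""
        age = now - shard.created
        if age > self.retention:
            return "expired"
        return "hot" if age <= self.hot_cache else "cold"


now = datetime(2022, 3, 1)
table = AppendOnlyTable(hot_cache=timedelta(days=7), retention=timedelta(days=30))
table.ingest([{"level": "info", "ms": 12}], now - timedelta(days=40))
table.ingest([{"level": "warn", "ms": 250}], now - timedelta(days=10))
table.ingest([{"level": "error", "ms": 900}], now - timedelta(days=1))

tiers = [table.tier(shard, now) for shard in table.shards]
print(tiers)
```

Because each shard is frozen, any number of readers can scan it at the same time without locking, which is the property the engine nodes rely on.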

Since the storage and the compute are decoupled, ADX can scale the cluster both vertically and horizontally without worrying too much about data management.

This brief overview of the architecture only scratches the surface; there are a lot more tasks happening, such as indexing columns and maintenance of the indexes. Having an overview helps appreciate what ADX is doing under the hood.

Important Note

I recommend reading the Azure Data Explorer white paper https://azure.microsoft.com/mediahandler/files/resourcefiles/azure-data-explorer/Azure_Data_Explorer_white_paper.pdf if you are interested in learning more about the architecture.

 

Azure Data Explorer use cases

Whenever someone asks what they should focus on when learning how to use Azure, I immediately say KQL. I use KQL daily, from managing cost and inventory to security and troubleshooting. It is not uncommon for relatively small environments to generate hundreds of GB of data per day, such as infrastructure diagnostics, Azure Resource Manager (ARM) audit logs, user audit logs, application logs, and application performance data. This may seem small in the grand scheme of things when, in 2021, we are generating quintillions of bytes of data per day. But it is still enough data to require dedicated services such as ADX to analyze it.

IoT monitoring and telemetry

Look around at your environment: how many appliances and devices can you see that are connected to the network? I see light bulbs, sensors, thermostats, and fire alarms, and there are billions of Internet of Things (IoT) devices in the world, all of which are constantly generating data. Together with Azure's IoT services, ADX can ingest the high volumes of data and enable us to monitor our things and perform complex time series analysis, so that we can identify anomalies and trends in our data.
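
To give a feel for what identifying an anomaly means, the following sketch flags readings that sit far from the average of a series. It uses a plain z-score rule, which is a deliberately simple stand-in and not the technique ADX uses for its time series functions. The temperature values and the threshold of 2.0 are made up for illustration.

```python
from statistics import mean, pstdev


def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    mu = mean(readings)
    sigma = pstdev(readings)
    if sigma == 0:
        return []  # all readings are identical, so nothing stands out
    return [(i, x) for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]


# Hypothetical thermostat readings in degrees Celsius, one per interval.
temperatures = [21.0, 21.5, 21.2, 20.9, 21.3, 35.0, 21.1, 21.4]
anomalies = find_anomalies(temperatures)
print(anomalies)
```

A rule this simple breaks down on data with seasonality or trends, which is exactly where dedicated time series analysis earns its keep.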

Log analysis

Imagine this scenario: you have just performed a lift-and-shift migration of your on-premises product to Azure. Since the application is not a true cloud-native solution, you are constrained in which Azure services you can use, for example, for load balancing. Azure Application Gateway, which is a load-balancing service, supports cookie-based session affinity, but the cookies are completely managed by Application Gateway. The application we migrated to Azure required specific values to be written in the cookie, which was not possible with the then-current version of Application Gateway, so we used HAProxy running on Linux virtual machines instead. The security team required all products to support only TLS 1.2 and above. The problem was that not all of our clients supported TLS 1.2, and if we had simply disabled TLS 1.0 and 1.1, we would have broken the service for those clients, which we did not want to do. Add to the equation that the server-side product was distributed across 15 Azure regions worldwide, with each region containing hundreds of HAProxy servers and no central logging! How could we analyze all this data to identify the clients that were not using TLS 1.2? The answer was Kusto.

We ingested the HAProxy log files and used KQL to analyze the log files and capture insights on TLS versioning and cipher information in seconds. With the queries, we were able to build near real-time dashboards for the support teams so they could reach out to clients and inform them when they would need to upgrade their software. With these insights, we were able to coordinate the TLS deprecation activities and execute them with no customer impact.
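
The kind of question we asked of those logs can be sketched in Python. The example below counts requests per client that negotiated a TLS version lower than 1.2. The log lines are invented and heavily simplified: the real HAProxy format depends on the configured log-format string, which is not shown here, so the field names client, tls, and cipher are assumptions. In practice, this analysis was done with KQL over the ingested logs rather than with a script.

```python
import re
from collections import defaultdict

# Hypothetical, simplified log lines; not the actual HAProxy log format.
LOG_LINES = [
    "2021-06-01T10:00:00Z client=10.0.0.5 tls=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384",
    "2021-06-01T10:00:01Z client=10.0.0.9 tls=TLSv1 cipher=AES128-SHA",
    "2021-06-01T10:00:02Z client=10.0.0.9 tls=TLSv1 cipher=AES128-SHA",
    "2021-06-01T10:00:03Z client=10.0.0.7 tls=TLSv1.1 cipher=AES256-SHA",
    "2021-06-01T10:00:04Z client=10.0.0.5 tls=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384",
]

LINE_RE = re.compile(r"client=(?P<client>\S+) tls=TLSv(?P<version>[\d.]+)")


def parse_version(text):
    """Turn '1' or '1.2' into a comparable (major, minor) tuple."""
    major, _, minor = text.partition(".")
    return int(major), int(minor or 0)


def legacy_clients(lines, minimum=(1, 2)):
    """Map each client that used a TLS version below the minimum to its request count."""
    counts = defaultdict(int)
    for line in lines:
        match = LINE_RE.search(line)
        if match and parse_version(match["version"]) < minimum:
            counts[match["client"]] += 1
    return dict(counts)


result = legacy_clients(LOG_LINES)
print(result)
```

A script like this works for a handful of lines, but it would not scale to hundreds of servers in 15 regions, which is why a service that ingests and queries the logs centrally was the better fit.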

Most of the examples in this book focus on logging scenarios, and in Chapter 7, Identifying Patterns, Anomalies, and Trends in Your Data, we will learn about ADX's time series analysis features to identify patterns, anomalies, and trends in our data.

 

Running your first query

In this section, we are going to clone the Git repository, connect to an example ADX cluster called https://help.kusto.windows.net, which is provided by Microsoft, execute our first KQL query, and generate a bar chart showing population data per state in the US:

  1. Open a new terminal: PowerShell on Windows, Terminal on macOS, or a shell on Linux. Clone the accompanying Git repository by typing git clone https://github.com/PacktPublishing/Scalable-Data-Analytics-with-Azure-Data-Explorer.git.
  2. Open your browser and go to https://dataexplorer.azure.com. The URL will take you to the ADX UI, the interactive environment for querying your data. Do not worry about the panes and layout; the ADX UI will be discussed in more detail in Chapter 3, Exploring the Azure Data Explorer UI. If you are prompted, log in with your Microsoft account. If your browser gets stuck in a refresh/redirect loop, ensure you have allowed third-party cookies and check your SameSite cookie settings. You need to allow cookies from portal.azure.com and dataexplorer.azure.com. This issue is rare; I encountered it only once, when using Safari on macOS. Conveniently, the web UI now connects to the help cluster (https://help.kusto.windows.net/) automatically. You can skip steps 3 and 4 if you see the help cluster in the cluster pane.
  3. Once you are logged in, the next step is to connect to an ADX cluster. Microsoft provides an example cluster, which we will be using throughout the book alongside the cluster we will create in Chapter 2, Building Your Azure Data Explorer Environment.

Click Add Cluster, as shown in Figure 1.6, and enter https://help.kusto.windows.net as the connection URL. The Display Name field will be populated with the cluster name, in this instance, help.

Figure 1.6 – Connecting to ADX clusters

  4. Once connected, the help cluster will appear in the cluster pane, which is located below the Add Cluster button, as shown in Figure 1.7:
Figure 1.7 – Connected to the help cluster

  5. Expand the cluster and click the Samples database to set the scope to @help/Samples, as shown in Figure 1.8:
Figure 1.8 – Expand the cluster and set the scope

  6. Next, click File | Open, as shown in Figure 1.9, and open ${HOME}/Scalable-Data-Analytics-with-Azure-Data-Explorer/Chapter01/first-query.kql:
Figure 1.9 – Open query in ADX

  7. Finally, click Run, and you should see the population data, as shown in Figure 1.10. Even though the dataset is relatively small, ADX sorted the states by population in descending order and rendered the result as a bar chart in under 1 second:
Figure 1.10 – US state population rendered as a bar chart

This section was just a high-level introduction to the ADX UI and KQL. Don't worry if you did not understand everything at this point; we will discuss all the topics in more detail throughout the remainder of the book.
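
If it helps to see the logic of step 7 outside of ADX, the sketch below performs the same two operations, sorting rows by population in descending order and drawing a bar per row, in plain Python. It is not the contents of first-query.kql: the column names State and Population, the placeholder state names, and the population values are all hypothetical and are not the data stored in the help cluster.

```python
# Hypothetical rows; these are not the values stored in the help cluster.
rows = [
    {"State": "State A", "Population": 4_000_000},
    {"State": "State B", "Population": 10_000_000},
    {"State": "State C", "Population": 7_000_000},
]


def sort_by_population(rows):
    """Order rows from the largest population to the smallest."""
    return sorted(rows, key=lambda row: row["Population"], reverse=True)


def bar_chart(rows, unit=1_000_000):
    """Render one line per row, with one '#' per `unit` people."""
    return [f"{row['State']} | {'#' * (row['Population'] // unit)}" for row in rows]


chart = bar_chart(sort_by_population(rows))
print("\n".join(chart))
```

In KQL, the sorting and the rendering are each a single step in the query, and the engine does the work across the cluster rather than in your browser.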

 

Summary

Congratulations on completing your first steps in learning about Azure Data Explorer! In this chapter, you learned about the different stages of the data analytics pipeline. Understanding these stages makes it easier to follow the workflow of taking raw data, analyzing it, and visualizing your findings.

We then introduced some of the popular Azure data analytics services and mapped them to the different stages of the data analytics pipeline. Some of the services, such as Event Hubs, will be used in later chapters to ingest data into our own ADX databases.

We then learned what ADX is, what the main features are, and briefly looked at the ADX architecture to understand how ADX provides excellent performance by using both column stores and row stores, and how ADX scales both vertically and horizontally efficiently by implementing one of the fundamental Azure design principles of decoupling compute and storage. We then discussed some of the use cases of ADX that we will use throughout this book, such as time series analysis.

Finally, we learned how to connect to ADX clusters and query databases using the ADX UI. In the next chapter, we will learn how to create and manage our own ADX clusters and databases using the Azure portal, PowerShell, and the Azure CLI.

Before moving on to the next chapter, try modifying ${HOME}/Scalable-Data-Analytics-with-Azure-Data-Explorer/Chapter01/first-query.kql and display an area chart. The solution can be found at ${HOME}/Scalable-Data-Analytics-with-Azure-Data-Explorer/Chapter01/population-areachart.kql. What other types of charts can you render?

One more thing worth knowing: the Azure Data Explorer UI supports a feature known as IntelliSense, as shown in Figure 1.11. IntelliSense provides code completion and hints when you are writing your queries, so you do not need to worry about memorizing all the keywords:

Figure 1.11 – IntelliSense features

We will be using IntelliSense throughout this book when using both Visual Studio Code and the Azure Data Explorer Web UI. Visual Studio Code will be used for editing our scripts and ARM templates, and the Azure Data Explorer Web UI is where we will execute most of our KQL queries.

About the Author
  • Jason Myerscough

    Jason Myerscough is a director of Site Reliability Engineering and cloud architect at Nuance Communications. He has been working with Azure daily since 2015. He has migrated his company's flagship product to Azure and designed the environment to be secure and scalable across 16 different Azure regions by applying cloud best practices and governance. He is currently certified as an Azure Administrator (AZ-103) and an Azure DevOps Expert (AZ-400). He holds a first-class bachelor's degree with honors in software engineering and a first-class master’s degree in computing.
