You're reading from Scalable Data Analytics with Azure Data Explorer

Product type: Book

Published in: Mar 2022

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781801078542

Edition: 1st Edition

Languages: Python

Concepts: Big Data

Author: Jason Myerscough

Chapter 11: Performance Tuning in Azure Data Explorer

Azure Data Explorer (ADX) is designed for high performance without the need for performance maintenance activities. However, it can still experience slow performance when overwhelmed by the workload. Therefore, it is important to understand performance tuning to ensure we maintain the high performance we know ADX delivers. In the examples we have seen so far, we have not had to worry about performance. Our datasets have been relatively small and even with the larger datasets we used on the help cluster, performance has not been an issue. With that said, as you make your cluster available to end users so that they can run queries and generate reports, their usage patterns and the queries they write can collectively impact performance.

In this chapter, we will begin by introducing performance tuning. Then, we will introduce workload groups, learn how they work, and how they can help preserve cluster performance. We will also create...

Technical requirements

The code examples for this chapter can be found in the Chapter11 folder of this book's GitHub repository: https://github.com/PacktPublishing/Scalable-Data-Analytics-with-Azure-Data-Explorer.git.

Introducing performance tuning

Before we jump into workload groups, let's spend a few moments thinking about performance tuning in general. Performance should rarely be an issue, given that ADX has been designed and optimized to be a highly scalable and fast big data service. However, as you ingest more and more data and allow more users and applications to query your clusters, you may experience some performance degradation. Therefore, it is important to be aware of performance tuning concepts and of the features ADX provides to help tune performance when the time comes.

Like troubleshooting, which we discussed in Chapter 9, Monitoring and Troubleshooting Azure Data Explorer, performance tuning can be considered as a process. The goal of performance tuning is to identify bottlenecks, troubleshoot their causes, and apply the features that are available to us, such as workload groups, cache policies, and so on, to eliminate bottlenecks. It is also important to understand...

Introducing workload groups

I remember working on a big data project where we had a wide range of end users and applications using our clusters. At one end of the spectrum, we had engineers executing ad hoc queries to analyze application logs, while at the other end, we had product management and customer support teams running complex reports by using integrations into third-party tools, such as Power BI, to gain insights into usage patterns and statistics. At the end of each month, the team would start to receive phone calls and tickets related to query and job performance. Users were complaining that their jobs were either not running or timing out. It turned out that the customer support team was running jobs and reports to generate billing information and that these jobs were resource-intensive and would consume all the resources, causing other jobs to be queued or time out. The only way to resolve the issue was to log into the cluster and kill the long-running tasks.
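As a rough sketch of how a workload group could throttle this kind of month-end reporting load, the commands below create a workload group with a request rate limit and then route one AAD group's requests to it. The workload group name `BillingReports`, the limit of five concurrent requests, and the `<group-object-id>` placeholder are assumptions made for illustration; they are not taken from the project described above.

```kusto
// Hypothetical workload group: each principal assigned to it may run
// at most 5 requests concurrently (values chosen for illustration only).
.create-or-alter workload_group BillingReports '{"RequestRateLimitPolicies": [{"IsEnabled": true, "Scope": "Principal", "LimitKind": "ConcurrentRequests", "Properties": {"MaxConcurrentRequests": 5}}]}'

// Route requests from members of one AAD group to that workload group;
// all other requests stay in the built-in default workload group.
// Replace <group-object-id> with a real group identifier.
.alter cluster policy request_classification '{"IsEnabled": true}' <|
    iff(current_principal_is_member_of('aadgroup=<group-object-id>'), "BillingReports", "default")

// Inspect the workload group definition.
.show workload_group BillingReports

// Inspect the classification policy.
.show cluster policy request_classification
```

Each command is a separate request, so run them one at a time. With a limit like this in place, excess requests from the reporting group would be throttled instead of starving other users of resources.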

Managing...

Introducing policy management

ADX supports performance tuning via policies. These policies allow us to configure individual components of our cluster such as caching, ingestion, and retention. As you may recall from Chapter 2, Building Your Azure Data Explorer Environment, a lot of these settings can be set at the management plane level. The great benefit of policies is that we can configure these individual components without having to authorize contributor access at the management plane level. Administrators of the ADX databases, at the data plane level, can configure these policies.

In this section, we will demonstrate how to configure caching and retention policies using KQL management commands. For a complete list of policies that can be configured, please see the ADX documentation: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/. The general syntax for managing policies is the same for all policies, so once you know how to configure one, configuring...
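As a minimal sketch of that general syntax, the commands below set and then read back a caching policy and a retention policy. The table name `MyTable` and the time spans are hypothetical values chosen for illustration, and each command should be run on its own.

```kusto
// Keep the most recent 7 days of the hypothetical table MyTable in the hot cache.
.alter table MyTable policy caching hot = 7d

// Keep data in MyTable for 365 days before it becomes eligible for removal.
.alter-merge table MyTable policy retention softdelete = 365d

// Read the policies back to confirm the changes.
.show table MyTable policy caching

.show table MyTable policy retention
```

The same pattern, `.alter <entity> policy <policy name>` followed by `.show <entity> policy <policy name>`, applies at the database level as well, which is why learning one policy makes the others straightforward.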

Monitoring queries

At the beginning of this chapter, in the Introducing performance tuning section, we learned that performance tuning is a process and that one of the steps is to identify the root cause of performance bottlenecks. We also saw in Chapter 9, Monitoring and Troubleshooting Azure Data Explorer, that the ADX Insights page in the Azure portal provides out-of-the-box dashboards for performance and other telemetry, which are very useful.

We can use KQL management commands to view all the commands and queries that have been executed by our ADX cluster. These management commands provide valuable insights into CPU consumption, execution duration, the text of the query or command being executed, and the state of its execution. The following query returns all the queries that have been executed on our ADX cluster, sorted by execution duration:

.show queries 
| project Text, Database, StartedOn, Duration, State, FailureReason, TotalCpu, CacheStatistics.Disk.Misses...
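Because the listing above is truncated, here is a separate, self-contained sketch of the same idea rather than a reconstruction of the original query. The one-day time window, the limit of five rows, and the choice of columns are assumptions for illustration.

```kusto
// The five longest-running queries started in the last day.
.show queries
| where StartedOn > ago(1d)
| project Text, Database, User, StartedOn, Duration, State, FailureReason, TotalCpu, WorkloadGroup
| top 5 by Duration desc

// The same view for management commands.
.show commands
| where StartedOn > ago(1d)
| project Text, Database, User, StartedOn, Duration, State, FailureReason, TotalCpu, WorkloadGroup
| top 5 by Duration desc
```

Filtering on `StartedOn` first keeps the result small, and the `WorkloadGroup` column shows which workload group each request was classified into.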

KQL best practices

In this section, we will discuss some of the best practices that you should take into consideration when developing your KQL queries.

Version controlling your queries

The first recommendation is to version control all your queries and any scripts that you may be using. Version control is a way to track changes in your source code, which in our case means KQL queries. Not only does version control help us keep track of every change made to our code, but it also helps us share code with colleagues and friends. All the code examples that accompany this book are version controlled using Git, a popular version control tool, and are hosted on github.com.

I am pretty sure that, at some point in their careers, all developers have looked at their old code and had no idea what it does. Since version control keeps track of all changes, we can easily search the change history to understand why certain changes have...

Summary

In this chapter, we started by introducing performance tuning and discovered that performance tuning is a process similar to troubleshooting. Next, we learned what workload groups are and how to configure them. We created an example workload group that restricted the number of requests that members of a specific AAD group can make. We discovered that three components are required to use workload groups: the request classification policy, which is responsible for assigning requests to workload groups; the workload groups themselves; and the workload group policies, which allow us to apply restrictions to requests, such as rate limiting.

Next, we discovered how to manage our cluster performance by managing the hot cache from the data plane. By managing the cache at the data plane, we can allow database administrators to tune performance, without giving them access to the management plane.

Next, we introduced the .show queries and .show commands KQL management commands...

Questions

Before moving on to the next chapter, test your knowledge by trying out these exercises. The answers can be found at the back of this book:

What is the purpose of workload groups?

Assuming that we have our request classification policy configured and enabled, what will happen when we execute the following query as a database admin?

.alter cluster policy request_classification '{"IsEnabled":false}' <|
    iff(current_principal_is_member_of('aadgroup=TrialUsers;27447925-1f0e-41b6-b01f-973eaab478b0'), "Packt Demo","default")

Why should you filter your data based on a date field as early as possible in your query?
Create a dashboard in the Data Explorer Web UI that displays query execution metrics, such as the top 5 longest-running queries, aggregated by workload group. Hint: use .show queries and review Chapter 8, Data Visualization with Azure Data Explorer and Power BI.

...


Author (1)

Jason Myerscough

Jason Myerscough is a director of Site Reliability Engineering and cloud architect at Nuance Communications. He has been working with Azure daily since 2015. He has migrated his company's flagship product to Azure and designed the environment to be secure and scalable across 16 different Azure regions by applying cloud best practices and governance. He is currently certified as an Azure Administrator (AZ-103) and an Azure DevOps Expert (AZ-400). He holds a first-class bachelor's degree with honors in software engineering and a first-class master's degree in computing.