You're reading from Scalable Data Analytics with Azure Data Explorer

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801078542
Edition: 1st Edition
Author (1)
Jason Myerscough

Jason Myerscough is a director of Site Reliability Engineering and cloud architect at Nuance Communications. He has been working with Azure daily since 2015. He has migrated his company's flagship product to Azure and designed the environment to be secure and scalable across 16 different Azure regions by applying cloud best practices and governance. He is currently certified as an Azure Administrator (AZ-103) and an Azure DevOps Expert (AZ-400). He holds a first-class bachelor's degree with honors in software engineering and a first-class master's degree in computing.

Chapter 11: Performance Tuning in Azure Data Explorer

Azure Data Explorer (ADX) is designed for high performance without the need for performance maintenance activities. However, it can still experience slow performance when overwhelmed by the workload. Therefore, it is important to understand performance tuning to ensure we maintain the high performance we know ADX delivers. In the examples we have seen so far, we have not had to worry about performance. Our datasets have been relatively small and even with the larger datasets we used on the help cluster, performance has not been an issue. With that said, as you make your cluster available to end users so that they can run queries and generate reports, their usage patterns and the queries they write can collectively impact performance.

In this chapter, we will begin by introducing performance tuning. Then, we will introduce workload groups, learn how they work, and how they can help preserve cluster performance. We will also create...

Technical requirements

The code examples for this chapter can be found in the Chapter11 folder of this book's GitHub repository: https://github.com/PacktPublishing/Scalable-Data-Analytics-with-Azure-Data-Explorer.git.

Introducing performance tuning

Before we jump into workload groups, let's spend a few moments thinking about performance tuning in general. Performance should rarely be an issue, given that ADX has been designed and optimized to be a highly scalable and fast big data service. However, as you ingest more and more data and allow more users and applications to query your clusters, you may experience some performance degradation. Therefore, it is important to be aware of performance tuning concepts and the features ADX provides to help tune performance when the time comes.

Like troubleshooting, which we discussed in Chapter 9, Monitoring and Troubleshooting Azure Data Explorer, performance tuning can be considered as a process. The goal of performance tuning is to identify bottlenecks, troubleshoot their causes, and apply the features that are available to us, such as workload groups, cache policies, and so on, to eliminate bottlenecks. It is also important to understand...

Introducing workload groups

I remember working on a big data project where we had a wide range of end users and applications using our clusters. At one end of the spectrum, we had engineers executing ad hoc queries to analyze application logs, while at the other end, we had product management and customer support teams running complex reports by using integrations into third-party tools, such as Power BI, to gain insights into usage patterns and statistics. At the end of each month, the team would start to receive phone calls and tickets related to query and job performance. Users were complaining that their jobs were either not running or timing out. It turned out that the customer support team was running jobs and reports to generate billing information and that these jobs were resource-intensive and would consume all the resources, causing other jobs to be queued or time out. The only way to resolve the issue was to log into the cluster and kill the long-running tasks.

Managing...

Introducing policy management

ADX supports performance tuning via policies. These policies allow us to configure individual components of our cluster such as caching, ingestion, and retention. As you may recall from Chapter 2, Building Your Azure Data Explorer Environment, a lot of these settings can be set at the management plane level. The great benefit of policies is that we can configure these individual components without having to authorize contributor access at the management plane level. Administrators of the ADX databases, at the data plane level, can configure these policies.

In this section, we will demonstrate how to configure caching and retention policies using KQL management commands. For a complete list of policies that can be configured, please see the ADX documentation: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/. The general syntax for managing policies is the same for all policies, so once you know how to configure one, configuring...
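
As a minimal sketch of what such management commands can look like, the following sets a caching policy and a retention policy on a table and then displays them. The table name MyTable and the 7-day and 30-day periods are hypothetical values chosen for illustration; they are not taken from the book's examples. Each management command is executed on its own.

```kusto
// Assumption: MyTable is a hypothetical table; the periods are illustrative only.

// Keep the most recent 7 days of MyTable's data in the hot cache.
.alter table MyTable policy caching hot = 7d

// Set the soft-delete period so that data is retained for 30 days.
.alter-merge table MyTable policy retention softdelete = 30d

// Review the policies currently applied to the table.
.show table MyTable policy caching

.show table MyTable policy retention
```

Because these commands run at the data plane, a database administrator can apply them without contributor access at the management plane.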

Monitoring queries

At the beginning of this chapter, in the Introducing performance tuning section, we learned that performance tuning is a process and that one of the steps is to identify the root cause of performance bottlenecks. We also saw in Chapter 9, Monitoring and Troubleshooting Azure Data Explorer, that the ADX Insights page in the Azure portal provides out-of-the-box dashboards for performance and other telemetry, which are very useful.

We can use KQL management commands to view all the commands and queries that have been executed by our ADX cluster. These KQL management commands provide valuable insights into CPU consumption, query execution duration, the query or command being executed, and the state of the query's execution. The following query returns all the queries that have been executed on our ADX cluster, sorted by execution duration:

.show queries 
| project Text, Database, StartedOn, Duration, State, FailureReason, TotalCpu, CacheStatistics.Disk.Misses...
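
As a rough sketch of how this output can be narrowed down when hunting for bottlenecks, the following variation lists the five longest-running queries started in the last day. The one-day window and the cutoff of five are arbitrary assumptions; the column names are a subset of those projected above.

```kusto
// Assumption: the 1-day window and the limit of 5 are illustrative values.
.show queries
| where StartedOn > ago(1d)                       // filter on time as early as possible
| project Text, Database, StartedOn, Duration, State, FailureReason, TotalCpu
| top 5 by Duration desc                          // longest-running queries first
```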

KQL best practices

In this section, we will discuss some of the best practices that you should take into consideration when developing your KQL queries.

Version controlling your queries

The first recommendation is to version control all your queries and any scripts that you may be using. Version control is a way to track changes in source code, which in our case means KQL queries. Not only does version control help us keep track of all the changes that have been made to our code, but it also helps us share code with colleagues and friends. All the code examples that accompany this book are version controlled using a popular version control tool called Git and are hosted on GitHub.

At one point in their careers, I am pretty sure all developers have looked at their old code and felt like they had no idea what the code does. Since version control keeps track of all changes, we can easily search the change history to understand why certain changes have...

Summary

In this chapter, we started by introducing performance tuning and discovered that performance tuning is a process similar to troubleshooting. Next, we learned what workload groups are and how to configure them. We created an example workload group that restricted the number of requests that members of a specific AAD group can make. We discovered that three components are required to use workload groups: the request classification policy, which is responsible for assigning requests to workload groups; the workload groups themselves; and the workload group policies, which allow us to apply restrictions to requests, such as rate limiting.

Next, we discovered how to manage our cluster performance by managing the hot cache from the data plane. By managing the cache at the data plane, we can allow database administrators to tune performance, without giving them access to the management plane.

Next, we introduced the .show queries and .show commands KQL management commands...

Questions

Before moving on to the next chapter, test your knowledge by trying out these exercises. The answers can be found at the back of this book:

  1. What is the purpose of workload groups?
  2. Assuming that we have our request classification policy configured and enabled, what will happen when we execute the following query as a database admin?
    .alter cluster policy request_classification '{"IsEnabled":false}' <|
        iff(current_principal_is_member_of('aadgroup=TrialUsers;27447925-1f0e-41b6-b01f-973eaab478b0'), "Packt Demo","default")
  3. Why should you filter your data based on a date field as early as possible in your query?
  4. Create a dashboard in the Data Explorer Web UI that displays query execution metrics, such as the top 5 longest-running queries, and aggregates them by workload group. Hint: use .show queries and review Chapter 8, Data Visualization with Azure Data Explorer and Power BI.