You're reading from  Scalable Data Analytics with Azure Data Explorer

Product type: Book
Published in: Mar 2022
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801078542
Edition: 1st Edition
Author (1)
Jason Myerscough

Jason Myerscough is a director of Site Reliability Engineering and cloud architect at Nuance Communications. He has been working with Azure daily since 2015. He has migrated his company's flagship product to Azure and designed the environment to be secure and scalable across 16 different Azure regions by applying cloud best practices and governance. He is currently certified as an Azure Administrator (AZ-103) and an Azure DevOps Expert (AZ-400). He holds a first-class bachelor's degree with honors in software engineering and a first-class master's degree in computing.

Chapter 9: Monitoring and Troubleshooting Azure Data Explorer

Monitoring systems and environments often seems to be an afterthought rather than a part of the initial requirements and design. During postmortems for production issues, it is not uncommon to end up with a long list of action items to remediate basic monitoring that we might have assumed was already implemented, such as Secure Sockets Layer (SSL) certificate expiration. Non-production environments are another story; it is not uncommon to find user acceptance testing (UAT), quality assurance (QA), and staging environments with no monitoring at all. Avoid these bad practices and always monitor your resources, regardless of the environment. How you raise an alert can differ depending on the environment and your service-level agreements (SLAs), but without monitoring, you are setting yourself up for failure.

In this chapter, we will begin by introducing the concepts of monitoring in Azure...

Technical requirements

The code examples for this chapter can be found in the Chapter09 folder of our repository at the following link: https://github.com/PacktPublishing/Scalable-Data-Analytics-with-Azure-Data-Explorer.git.

In our examples, we will be using the EnglishPremierLeagueJSON table we created in Chapter 4, Ingesting Data in Azure Data Explorer, and the storage account, event grid, and event hub that we deployed. If you have deleted those resources, please redeploy your infrastructure before continuing.

Introducing monitoring and troubleshooting

Before diving into monitoring and troubleshooting ADX, it is worth spending some time introducing the concepts of monitoring and troubleshooting.

There is no consensus on one definition for monitoring. Engineers have various backgrounds and different interests, and if you were to ask 10 engineers for a definition of monitoring, you would probably get 15 different answers. From my perspective, monitoring is a tool that aids troubleshooting and allows us to measure and observe system behavior. I like to use the analogy of a compass, whereby monitoring leads us to issues and gives insights into overall behavior, health, and performance.

Monitoring can typically be broken down into four functions: alerting, debugging, trends, and plumbing. I tend to agree with this but would like to expand on the plumbing aspects. Here's an overview of these functions:

  • Alerting: Being able to notify engineers when issues occur. There...

Monitoring ADX

During my time using the Azure platform, two of the areas I have seen change the most are monitoring and security. Almost every resource type has monitoring and security options in its properties panel. In Chapter 10, Azure Data Explorer Security, we will look at the security options, but for now, let's focus on the monitoring aspects.

Azure Service Health

Before jumping into metrics and logging, I think it is worth mentioning the Service Health blade. From my experience, not a lot of people are aware of the Service Health blade and what it offers. The Service Health blade offers a high-level overview of historical issues, current issues, planned maintenance, and security advisories. Another nice feature is that Microsoft posts its root cause analysis (RCA) reports here. The RCA reports provide a detailed description and timeline for issues, along with mitigation steps and follow-up action. The Service Health blade is one of the first things I check...

Troubleshooting ADX

As you may recall from Chapter 4, Ingesting Data in Azure Data Explorer, we set up infrastructure to ingest data from a storage account using an event grid and an event hub. Since we did not configure diagnostics at the time, the only way to check whether the ingestion succeeded was to run a query to check whether any data was available. Depending on the ingestion policy, you had to wait up to 5 minutes for the data to be ingested. Now, imagine an error occurred—how would you know? Should you refresh your browser or continuously execute a query to return the number of rows? No! That does not scale and, like me, you probably have better things to do with your time than continuously hitting Shift + Enter to execute a query.

In this section, we will intentionally introduce an error with our data ingestion process, and then we will learn how to troubleshoot such issues by looking at ADX's metrics and diagnostic logs using Log Analytics.
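To give a feel for the kind of question we will be asking of the diagnostic logs, here is a minimal Python sketch that groups failed-ingestion records by error code so the most frequent failure stands out. It is only an illustration under stated assumptions: the sample records, the field names (`Table`, `ErrorCode`), and the error code values are hypothetical placeholders, not the actual schema or codes that ADX writes to Log Analytics.

```python
from collections import Counter

# Hypothetical records, shaped loosely like rows exported from a
# failed-ingestion diagnostic log. The field names and the error code
# values are placeholders for illustration, not the real ADX schema.
SAMPLE_FAILURES = [
    {"Table": "EnglishPremierLeagueJSON", "ErrorCode": "ExampleErrorA"},
    {"Table": "EnglishPremierLeagueJSON", "ErrorCode": "ExampleErrorB"},
    {"Table": "EnglishPremierLeagueJSON", "ErrorCode": "ExampleErrorA"},
]


def summarize_failures(records, key="ErrorCode"):
    """Count failure records per value of `key`, most frequent first."""
    counts = Counter(record.get(key, "<missing>") for record in records)
    return counts.most_common()


if __name__ == "__main__":
    # Print one line per error code with the number of failures seen.
    for code, count in summarize_failures(SAMPLE_FAILURES):
        print(f"{code}: {count}")
```

Grouping by error code first is a deliberate choice: a single dominant code usually points to one root cause, whereas a spread of different codes suggests a broader problem with the pipeline.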

Note

In...

Summary

We have only scratched the surface with regard to monitoring and troubleshooting. Monitoring and Azure Monitor deserve their own book in order to do them any justice.

In this chapter, I began by introducing the concept of monitoring, discussing why monitoring is important and what SLIs, SLOs, and SLAs are. Then, I introduced the concept of troubleshooting and discussed my thought process and how I break down problems.

The rest of the chapter then focused on the key metrics and logs available to us for ADX and demonstrated how to enable diagnostics. We then walked through an example and troubleshot an issue where data ingestion was not working.

Finally, we learned how to configure alerts for ingestion failures. We configured an action group that would send an email and an SMS whenever the ingestion failed.

In Chapter 10, Azure Data Explorer Security, we will learn how to secure our ADX clusters.

Questions

Before moving on to the next chapter, test your knowledge by answering the following questions. The answers can be found at the back of the book:

  1. What is the difference between SLIs, SLOs, and SLAs?
  2. Configure a metrics dashboard to display the Blobs received metric and then import some data into your ADX cluster. What do you see?
  3. Try to implement monitoring alerts for an event hub's incoming and outgoing messages and set the severity to Informational since this is typically not an error.
  4. How many severity levels are there and what does each level mean?