Technical requirements

The code examples for this chapter can be found in the Chapter07 folder of this book's GitHub repository: https://github.com/PacktPublishing/Scalable-Data-Analytics-with-Azure-Data-Explorer.

In our examples, we will be using the demo tables that are available on the Microsoft help cluster (https://help.kusto.windows.net/) and the Log Analytics playground (https://aka.ms/lademo/), which is also provided by Microsoft.

Calculating moving averages with KQL

There may be instances where your time series data is clean and all of its components, such as seasonality, trends, and variations, are visible to the point that you can confidently make decisions without having to manipulate or clean the data. In reality, there will be noise and variations that may obscure patterns and anomalies. KQL provides a rich set of functions for analyzing time series data, and one subset of those functions is for calculating moving averages. Moving averages allow us to remove noise and smooth our data.

The goal of this section is to learn how to use series_fir() to calculate moving averages and smooth our data. Finite Impulse Response (FIR) is a filtering technique that is commonly used in signal processing and time series analysis.

As you may recall from Chapter 6, Introducing Time Series Analysis, we used demo_make_series1, a table in the help cluster (https://help.kusto.windows.net), to...
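
Before we continue, here is a minimal sketch of the pattern, assuming the demo_make_series1 table from the help cluster, an illustrative 1h bin size, and a 5-bin window (these values are examples rather than the chapter's own):

    let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
    let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
    demo_make_series1
    | make-series requests=count() default=0 on TimeStamp from startTime to endTime step 1h by OsVer
    // repeat(1, 5) builds a uniform 5-element filter; normalizing (true) and centering (true)
    // turn series_fir() into a centered 5-bin moving average.
    | extend ma_requests=series_fir(requests, repeat(1, 5), true, true)
    | render timechart

A wider filter produces a smoother series, at the cost of being less responsive to recent changes in the data.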

Trend analysis with KQL

As we discussed in Chapter 6, Introducing Time Series Analysis, one of the components of time series data is a trend. A trend helps visualize and predict the long-term direction of data. The trend is either positive, also known as an upward trend, or negative, also known as a downward trend. KQL provides two functions, series_fit_line() and series_fit_2lines(), for calculating the trend. We will begin by looking at series_fit_line() before looking at series_fit_2lines().

Applying linear regression with KQL

The series_fit_line() function performs linear regression to calculate the best fit line, also known as the regression line, for our original time series. Once we have calculated our regression line, we can identify the positive or negative relationship between our x-axis, also known as the independent variable, and our y-axis, also known as the dependent variable. The series_fit_line() function takes one argument, which is a time series, and returns...
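
As a rough sketch of how the function is typically applied (the table, bin size, and column names here are illustrative choices, not the chapter's worked example), series_fit_line() can be run over the output of make-series and the fitted line rendered alongside the original series:

    let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
    let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
    demo_make_series1
    | make-series requests=count() default=0 on TimeStamp from startTime to endTime step 1h
    // series_fit_line() adds the regression statistics (slope, r-square, and so on)
    // along with a line_fit series that can be charted next to the original data.
    | extend series_fit_line(requests)
    | render timechart

A positive slope indicates an upward trend, while a negative slope indicates a downward trend.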

Anomaly detection and forecasting with KQL

By now, you should have a good understanding of the different components of time series such as seasonality, trends, and variations. KQL provides the series_decompose() function to calculate the values of these components for a given time series.

The series_decompose() function expects one required argument and four optional arguments. Let's look at these arguments in more detail (a short sketch follows this list):

  • series is the time series we would like to calculate the components for.
  • seasonality is set to -1 to have the function autodetect the seasonality, 0 to skip the seasonality analysis, or a positive integer to specify the expected period. The default value is -1 (auto-detect).
  • trend determines the type of trend analysis that's performed. There are three options we can specify at the time of writing:
    • avg specifies the average bins for the trend.
    • linefit specifies linear regression, which we learned about earlier, by using series_fit_line...
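
The following sketch shows the general shape of a decomposition call; the table, bin size, and argument values are illustrative defaults rather than the chapter's own example:

    let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
    let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
    demo_make_series1
    | make-series requests=count() default=0 on TimeStamp from startTime to endTime step 1h
    // -1 auto-detects the seasonality, and 'linefit' uses linear regression for the trend.
    | extend (baseline, seasonal, trend, residual) = series_decompose(requests, -1, 'linefit')
    | render timechart

The baseline (seasonal plus trend) is what series_decompose_anomalies() compares the original series against when flagging anomalies.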

Summary

This chapter introduced the basics of time series analysis. For a deeper dive into time series analysis and statistics, I highly recommend looking at some of the great titles published by Packt, such as Practical Time Series Analysis and Forecasting Time Series Data with Facebook Prophet.

In this chapter, we learned about moving averages and how they can help reduce noise and make our time series data smoother. Reducing noise helps us identify the patterns and common traits of time series data, such as variations and seasonality. Furthermore, reducing noise helps improve our accuracy when making forecasts.

Next, we learned how to render moving averages and regression lines in Log Analytics. Log Analytics requires a couple of extra steps in the query before the data is rendered to the charts because the Data Explorer Web UI and Log Analytics have different user agents. Please see https://docs.microsoft.com/en-us/azure/data-explorer/kusto...

Questions

Before moving on to the next chapter, test your knowledge by answering these questions. The answers can be found at the back of this book:

  1. What is the purpose of moving averages?
  2. What is the purpose of linear regression?
  3. What are the extra steps required to render time charts in Log Analytics?
  4. In Figure 7.12, we rendered an anomaly chart to display the anomalies in the time series. Using series_fir(), generate a smoother graph without the anomalies. Once you have generated a smoother output, pass your data to series_decompose_anomalies() to see if there are still any anomalies. The query for generating the graph in Figure 7.12 is as follows. You will need to connect to the help cluster (https://help.kusto.windows.net/) to complete this exercise:
    let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
    let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
    let binSize = 1h;
    demo_make_series1
    | make-series requests=count() default...