Technical requirements

The code examples for this chapter can be found in the Chapter07 folder of this book's GitHub repository: https://github.com/PacktPublishing/Scalable-Data-Analytics-with-Azure-Data-Explorer.

In our examples, we will be using the demo tables that are available on the Microsoft help cluster (https://help.kusto.windows.net/) and the Log Analytics playground (https://aka.ms/lademo/), which is also provided by Microsoft.

Calculating moving averages with KQL

There may be instances where your time series data is clean and all of its components, such as seasonality, trends, and variations, are visible to the point that you can confidently make decisions without having to manipulate or clean the data. In reality, there will be noise and variations that may obscure patterns and anomalies. KQL provides a rich set of functions for analyzing time series data, and one subset of those functions is for calculating moving averages. Moving averages allow us to remove noise and smooth our data.

The goal of this section is to learn how to use series_fir() to calculate moving averages and smooth our data. Finite Impulse Response (FIR) is a filtering technique that is commonly used in signal processing and time series analysis.

As you may recall from Chapter 6, Introducing Time Series Analysis, we used demo_make_series1, a table in the help cluster (https://help.kusto.windows.net), to...
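
Before we continue, here is a minimal sketch of the pattern, assuming the demo_make_series1 table from the help cluster, an illustrative 1h bin size, and a 5-bin window (these values are examples rather than the chapter's own):

    let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
    let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
    demo_make_series1
    | make-series requests=count() default=0 on TimeStamp from startTime to endTime step 1h by OsVer
    // repeat(1, 5) builds a uniform 5-element filter; normalizing (true) and centering (true)
    // turn series_fir() into a centered 5-bin moving average.
    | extend ma_requests=series_fir(requests, repeat(1, 5), true, true)
    | render timechart

A wider filter produces a smoother series, at the cost of being less responsive to recent changes in the data.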

Trend analysis with KQL

As we discussed in Chapter 6, Introducing Time Series Analysis, one of the components of time series data is a trend. A trend helps visualize and predict the long-term direction of data. The trend is either positive, also known as an upward trend, or negative, also known as a downward trend. KQL provides two functions, series_fit_line() and series_fit_2lines(), for calculating the trend. We will begin by looking at series_fit_line() before looking at series_fit_2lines().

Applying linear regression with KQL

The series_fit_line() function performs linear regression to calculate the best fit line, also known as the regression line, for our original time series. Once we have calculated our regression line, we can identify the positive or negative relationship between our x-axis, also known as the independent variable, and our y-axis, also known as the dependent variable. The series_fit_line() function takes one argument, which is a time series, and returns...
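
As a rough sketch of how the function is typically applied (the table, bin size, and column names here are illustrative choices, not the chapter's worked example), series_fit_line() can be run over the output of make-series and the fitted line rendered alongside the original series:

    let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
    let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
    demo_make_series1
    | make-series requests=count() default=0 on TimeStamp from startTime to endTime step 1h
    // series_fit_line() adds the regression statistics (slope, r-square, and so on)
    // along with a line_fit series that can be charted next to the original data.
    | extend series_fit_line(requests)
    | render timechart

A positive slope indicates an upward trend, while a negative slope indicates a downward trend.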

Anomaly detection and forecasting with KQL

By now, you should have a good understanding of the different components of time series such as seasonality, trends, and variations. KQL provides the series_decompose() function to calculate the values of these components for a given time series.

The series_decompose() function expects one required argument and four optional arguments. Let's look at these arguments in more detail (a short sketch follows this list):

  • series is the time series we would like to calculate the components for.
  • seasonality is set to -1 to have the function autodetect the seasonality, 0 to skip the seasonality analysis, or a positive integer to specify the expected period. The default value is -1 (auto-detect).
  • trend determines the type of trend analysis that's performed. There are three options we can specify at the time of writing:
    • avg specifies the average bins for the trend.
    • linefit specifies linear regression, which we learned about earlier, by using series_fit_line...
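
The following sketch shows the general shape of a decomposition call; the table, bin size, and argument values are illustrative defaults rather than the chapter's own example:

    let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
    let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
    demo_make_series1
    | make-series requests=count() default=0 on TimeStamp from startTime to endTime step 1h
    // -1 auto-detects the seasonality, and 'linefit' uses linear regression for the trend.
    | extend (baseline, seasonal, trend, residual) = series_decompose(requests, -1, 'linefit')
    | render timechart

The baseline (seasonal plus trend) is what series_decompose_anomalies() compares the original series against when flagging anomalies.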

Summary

This chapter introduced the basics of time series analysis. For a deeper dive into time series analysis and statistics, I highly recommend looking at some of the great titles published by Packt, such as Practical Time Series Analysis and Forecasting Time Series Data with Facebook Prophet.

In this chapter, we learned about moving averages and how they can help reduce noise and make our time series data smoother. Reducing noise helps us identify the patterns and common traits of time series data, such as variations and seasonality. Furthermore, reducing noise helps improve our accuracy when making forecasts.

Next, we learned how to render moving averages and regression lines in Log Analytics. Log Analytics requires a couple of extra steps in the query before the data is rendered to the charts because the Data Explorer Web UI and Log Analytics have different user agents. Please see https://docs.microsoft.com/en-us/azure/data-explorer/kusto...

Questions

Before moving on to the next chapter, test your knowledge by answering these questions. The answers can be found at the back of this book:

  1. What is the purpose of moving averages?
  2. What is the purpose of linear regression?
  3. What are the extra steps required to render time charts in Log Analytics?
  4. In Figure 7.12, we rendered an anomaly chart to display the anomalies in the time series. Using series_fir(), generate a smoother graph without the anomalies. Once you have generated a smoother output, pass your data to series_decompose_anomalies() to see if there are still any anomalies. The query for generating the graph in Figure 7.12 is as follows. You will need to connect to the help cluster (https://help.kusto.windows.net/) to complete this exercise:
    let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
    let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
    let binSize = 1h;
    demo_make_series1
    | make-series requests=count() default...