You're reading from Machine Learning for Time-Series with Python

Product type: Book
Published in: Oct 2021
Publisher: Packt
ISBN-13: 9781801819626
Edition: 1st Edition
Concepts: Machine Learning
Author: Ben Auffarth

What is time-series analysis?

The term time-series analysis (TSA) refers to the statistical approach to time-series or the analysis of trend and seasonality. It is often an ad hoc exploration and analysis that usually involves visualizing distributions, trends, cyclic patterns, and relationships between features, and between features and the target(s).

More generally, we can say TSA is roughly exploratory data analysis (EDA) that's specific to time-series data. This comparison can be misleading, however, since TSA can include both descriptive and exploratory elements.

Let's quickly look at the differences between descriptive and exploratory analysis:

Descriptive analysis summarizes characteristics of a dataset
Exploratory analysis looks for patterns, trends, or relationships between variables
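To make the distinction concrete, here is a minimal sketch with pandas on a synthetic daily dataset. The `temperature` and `sales` columns and all coefficients are made up for illustration. `describe()` produces a descriptive summary, while checking the correlation between the two columns answers an exploratory question.

```python
import numpy as np
import pandas as pd

# Synthetic daily data; column names and coefficients are hypothetical.
rng = np.random.default_rng(seed=0)
n_days = 365
index = pd.date_range("2021-01-01", periods=n_days, freq="D")
day = np.arange(n_days)
temperature = 10 + 8 * np.sin(2 * np.pi * day / n_days) + rng.normal(0, 1, n_days)
sales = 200 - 3 * temperature + rng.normal(0, 5, n_days)
df = pd.DataFrame({"temperature": temperature, "sales": sales}, index=index)

# Descriptive: summarize each column (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)

# Exploratory: look for a relationship between the two variables.
correlation = df["temperature"].corr(df["sales"])
print(f"Pearson correlation between temperature and sales: {correlation:.2f}")
```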

Therefore, TSA is the initial investigation of a dataset with the goal of discovering patterns, especially trend and seasonality, and obtaining initial insights, testing hypotheses, and extracting meaningful summary statistics.

Definition: Time-Series Analysis (TSA) is the process of extracting a summary and other statistical information from time-series, most importantly, the analysis of trend and seasonality.
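As a rough illustration of what the analysis of trend and seasonality can look like in code, the sketch below splits a synthetic monthly series into two parts. The trend is a centered 2x12 moving average, and the seasonal profile is the average detrended value per calendar month. This is a hand-rolled version of classical additive decomposition, used here only to keep the example limited to NumPy and pandas. The data and parameters are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + yearly seasonality + noise (all hypothetical).
rng = np.random.default_rng(seed=1)
index = pd.date_range("2015-01-01", periods=72, freq="MS")
t = np.arange(len(index))
series = pd.Series(
    0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, len(t)),
    index=index,
)

# Trend: centered 2x12 moving average (13 points, half weight on both ends),
# which averages out a 12-month seasonal cycle.
weights = np.r_[0.5, np.ones(11), 0.5] / 12
trend = series.rolling(window=13, center=True).apply(
    lambda window: np.dot(window, weights), raw=True
)

# Seasonality: average detrended value per calendar month, centered on zero.
detrended = series - trend
seasonal = detrended.groupby(detrended.index.month).mean()
seasonal -= seasonal.mean()

print(seasonal.round(1))
```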

Since an important part of TSA is gathering statistics and representing your dataset graphically through visualization, we'll make a lot of plots in this chapter. Many of the statistics and plots described in this chapter are specific to TSA, so even if you are familiar with EDA, you'll find something new.

Part of TSA is collecting and reviewing data, examining the distribution of variables (and variable types), and checking for errors, outliers, and missing values. Some errors, variable types, and anomalies can be corrected; therefore, EDA is often performed hand in hand with preprocessing and feature engineering, where columns and fields are selected and transformed. The whole process from data loading to machine learning is highly iterative and may involve multiple rounds of TSA at different points.
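For the part about checking for errors and outliers, one common rule of thumb is the interquartile-range (IQR) fence. The minimal sketch below uses synthetic data with two deliberately injected bad values. It flags points lying more than 1.5 times the IQR outside the quartiles, then, as one possible correction, replaces them by time-based interpolation. The 1.5 factor and the interpolation choice are assumptions. Whether a flagged value is an error or a genuine extreme depends on the domain.

```python
import numpy as np
import pandas as pd

# Synthetic daily readings with two hypothetical data-entry errors injected.
rng = np.random.default_rng(seed=2)
index = pd.date_range("2021-01-01", periods=100, freq="D")
values = pd.Series(rng.normal(50, 5, len(index)), index=index)
values.iloc[[10, 60]] = [120.0, -30.0]

# IQR fences: anything more than 1.5 * IQR outside the quartiles is flagged.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)
print(values[is_outlier])

# One possible correction: blank out flagged points and interpolate over time.
cleaned = values.mask(is_outlier).interpolate(method="time")
```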

Here are a few crucial steps for working with time-series:

Importing the dataset
Data cleaning
Understanding variables
Uncovering relationships between variables
Identifying trend and seasonality
Preprocessing (including feature engineering)
Training a machine learning model

Importing the data can be considered a step that precedes TSA, while data cleaning, feature engineering, and training a machine learning model are not strictly part of TSA.

Importing the data includes parsing, for example, extracting dates. The three steps that are central to TSA are understanding variables, uncovering relationships between variables, and identifying trend and seasonality. There's a lot more to say about each of them, and we'll discuss each in more detail in its dedicated section of this chapter.
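In pandas, parsing dates at import time is usually a single call. In this sketch the CSV content is inlined via `io.StringIO` so the example runs on its own, and the `date` and `visitors` columns are hypothetical. `parse_dates` converts the column to datetimes and `index_col` makes it the index. Date parts such as the weekday can then be extracted from the index.

```python
import io

import pandas as pd

# Inline CSV standing in for a real file; columns are hypothetical.
raw_csv = io.StringIO(
    "date,visitors\n"
    "2021-03-01,120\n"
    "2021-03-02,135\n"
    "2021-03-03,128\n"
)

# Parse the date column and use it as a DatetimeIndex.
df = pd.read_csv(raw_csv, parse_dates=["date"], index_col="date")

# Extract a calendar feature from the parsed dates (Monday=0, Sunday=6).
df["weekday"] = df.index.dayofweek
print(df)
```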

The steps that belong to TSA and lead to preprocessing (feature engineering) and machine learning are highly iterative, as illustrated by the following time-series machine learning flywheel:

Figure 2.1: The time-series machine learning flywheel

This flywheel emphasizes the iterative nature of the work. For example, data cleaning often comes after loading the data, but it will come up again once we've made another discovery about our variables. I've highlighted TSA in dark, while steps that are not strictly part of TSA are grayed out.

Let's go through something practical! We'll start by loading a dataset. Right after importing the data, we'd ask questions like these: What's the size of the dataset (the number of observations)? How many features or columns do we have? What are the column types?
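Those first questions map directly onto pandas attributes. The sketch below uses a tiny made-up DataFrame (the `timestamp`, `store`, and `revenue` columns are hypothetical). `shape` gives the number of observations and columns, and `dtypes` gives each column's type. `df.info()` prints both, along with non-null counts.

```python
import numpy as np
import pandas as pd

# A small hypothetical dataset with mixed column types.
df = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=5, freq="D"),
    "store": ["north", "north", "south", "south", "south"],
    "revenue": [120.5, 98.0, np.nan, 143.2, 110.9],
})

# Size of the dataset: (number of observations, number of columns).
n_observations, n_columns = df.shape
print(f"{n_observations} observations, {n_columns} columns")

# Column types, e.g. a datetime type for timestamps and float64 for revenue.
print(df.dtypes)
```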

We'll typically look at histograms or distribution plots. For assessing relationships between features and target variables, we'd calculate correlations and visualize them as a correlation heatmap, where the correlation strength between variables is mapped to colors.
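The numbers behind these plots can be computed directly. In the sketch below, the data is synthetic and `feature_a`, `feature_b`, and `target` are placeholder names. `np.histogram` gives the bin counts a histogram would draw. `DataFrame.corr()` gives the pairwise Pearson correlation matrix that a correlation heatmap maps to colors. Passing that matrix to a plotting function such as seaborn's `heatmap` would produce the figure; plotting is left out so the example has no graphics dependency.

```python
import numpy as np
import pandas as pd

# Synthetic data: feature_b is partly driven by feature_a; target is unrelated noise.
rng = np.random.default_rng(seed=3)
n = 500
x = rng.normal(size=n)
df = pd.DataFrame({
    "feature_a": x,
    "feature_b": 0.8 * x + rng.normal(scale=0.6, size=n),
    "target": rng.normal(size=n),
})

# Distribution of a single variable: bin counts and edges for a histogram.
counts, bin_edges = np.histogram(df["feature_a"], bins=20)

# Pairwise Pearson correlations; this matrix is what a heatmap colors.
corr_matrix = df.corr()
print(corr_matrix.round(2))
```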

We'd look for missing values – in a spreadsheet, these would be empty cells – and we'd clean up and correct these irregularities, where possible.
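A minimal sketch of finding and filling missing values with pandas, using a made-up `load` column. `isna().sum()` counts the empty cells per column. `interpolate(method="time")` fills each gap linearly according to the timestamps. Interpolation is only one reasonable choice here, assumed for illustration; forward-filling with `ffill()` is a common alternative when the last known value should simply carry over.

```python
import numpy as np
import pandas as pd

# Hypothetical daily measurements with two gaps (NaN = empty cell).
index = pd.date_range("2021-01-01", periods=6, freq="D")
df = pd.DataFrame({"load": [10.0, np.nan, 14.0, 16.0, np.nan, 20.0]}, index=index)

# Count missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Fill gaps by linear interpolation over time.
filled = df.interpolate(method="time")
print(filled)
```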

We are going to analyze relationships between variables, and one peculiarity of TSA is that we also need to investigate the relationship of time with each variable.
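The sketch below shows two simple ways to relate each variable to time, using synthetic data and hypothetical column names. The first correlates every column with elapsed time, a crude check for a linear trend. The second aggregates the columns by calendar month with `resample`. Correlation with time only captures linear drift, so it complements, rather than replaces, plotting each variable against time.

```python
import numpy as np
import pandas as pd

# Synthetic daily data: one column drifts upward, the other does not.
rng = np.random.default_rng(seed=4)
n_days = 730
index = pd.date_range("2020-01-01", periods=n_days, freq="D")
df = pd.DataFrame({
    "trending": np.linspace(0, 50, n_days) + rng.normal(0, 2, n_days),
    "flat": rng.normal(0, 2, n_days),
}, index=index)

# Elapsed time in days as a numeric variable aligned with the data.
elapsed_days = pd.Series(np.asarray((df.index - df.index[0]).days), index=df.index)

# Correlation of each column with elapsed time.
time_corr = df.corrwith(elapsed_days)
print(time_corr.round(2))

# Monthly averages show how each variable evolves over time.
monthly_means = df.resample("MS").mean()
print(monthly_means.head())
```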

Generally, a useful way of categorizing techniques is to distinguish between univariate and multivariate analysis, and between graphical and non-graphical techniques. Univariate analysis means we are looking at a single variable. For example, we could inspect the values to compute the mean and the variance, or – for the graphical side – plot the distribution. We summarize these techniques in the Understanding the variables section.
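For the non-graphical side of univariate analysis, pandas computes these statistics directly. The sketch uses a tiny made-up series. Note that `Series.var()` returns the sample variance (`ddof=1`) by default, which makes a noticeable difference for short series.

```python
import pandas as pd

# Hypothetical daily readings of a single variable.
readings = pd.Series(
    [3.0, 5.0, 4.0, 6.0, 7.0],
    index=pd.date_range("2021-06-01", periods=5, freq="D"),
)

mean = readings.mean()     # 5.0
variance = readings.var()  # 2.5 (sample variance, ddof=1, the pandas default)
print(f"mean={mean}, variance={variance}")
```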

On the other hand, multivariate analysis means we are calculating correlations between variables, or – for the graphical side – drawing a scatter plot, for example. We'll delve into these techniques in the Uncovering relationships between variables section.
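Correlation is the usual non-graphical counterpart of a scatter plot, but which coefficient you compute matters. In this sketch (synthetic data, hypothetical column names), `y` is a monotonic but strongly nonlinear function of `x`. Pearson correlation measures linear association, so it comes out well below 1. Spearman rank correlation, computed here as the Pearson correlation of the ranks, equals 1 up to floating-point error. A scatter plot such as `df.plot.scatter(x="x", y="y")` would show the curvature directly; it is omitted to avoid a plotting dependency.

```python
import numpy as np
import pandas as pd

# Synthetic data: y grows monotonically but nonlinearly with x.
x = np.linspace(1, 10, 50)
df = pd.DataFrame({"x": x, "y": np.exp(x)})

# Pearson: strength of a linear relationship.
pearson = df["x"].corr(df["y"])

# Spearman: Pearson correlation of the ranks, i.e. strength of a monotonic relationship.
spearman = df["x"].rank().corr(df["y"].rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```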

Before we continue, let's go over some of the basics of working with time-series in Python. This will serve as an introduction to the basic operations with time-series data. After this, we'll go through Python commands with an actual dataset.
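As a small preview of such basic operations, the sketch below uses made-up values, and these operations are not necessarily the ones covered next. It creates a `DatetimeIndex` with `pd.date_range` and selects a date range by partial string indexing. It then downsamples with `resample` and builds a lagged copy with `shift`, a typical building block for feature engineering.

```python
import numpy as np
import pandas as pd

# A daily series indexed by dates (values are made up).
index = pd.date_range(start="2021-01-01", end="2021-01-10", freq="D")
ts = pd.Series(np.arange(len(index), dtype=float), index=index)

# Partial-string indexing selects by date labels (inclusive on both ends).
first_week = ts["2021-01-01":"2021-01-07"]

# Resampling changes the frequency; here daily values are summed per 5-day bin.
five_day_totals = ts.resample("5D").sum()

# Shifting moves values forward in time, e.g. to build a lag-1 feature.
lagged = ts.shift(1)

print(five_day_totals)
```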

Ben Auffarth is a full-stack data scientist with more than 15 years of work experience. With a background and Ph.D. in computational and cognitive neuroscience, he has designed and conducted wet lab experiments on cell cultures, analyzed experiments with terabytes of data, run brain models on IBM supercomputers with up to 64k cores, built production systems processing hundreds and thousands of transactions per day, and trained language models on a large corpus of text documents. He co-founded and is the former president of Data Science Speakers, London.