Reader small image

You're reading from  Machine Learning for Time-Series with Python

Product typeBook
Published inOct 2021
PublisherPackt
ISBN-139781801819626
Edition1st Edition
Right arrow
Author (1)
Ben Auffarth
Ben Auffarth
author image
Ben Auffarth

Ben Auffarth is a full-stack data scientist with more than 15 years of work experience. With a background and Ph.D. in computational and cognitive neuroscience, he has designed and conducted wet lab experiments on cell cultures, analyzed experiments with terabytes of data, run brain models on IBM supercomputers with up to 64k cores, built production systems processing hundreds and thousands of transactions per day, and trained language models on a large corpus of text documents. He co-founded and is the former president of Data Science Speakers, London.
Read more about Ben Auffarth

Right arrow

What is time-series analysis?

The term time-series analysis (TSA) refers to the statistical approach to time-series or the analysis of trend and seasonality. It is often an ad hoc exploration and analysis that usually involves visualizing distributions, trends, cyclic patterns, and relationships between features, and between features and the target(s).

More generally, we can say TSA is roughly exploratory data analysis (EDA) that's specific to time-series data. This comparison can be misleading however since TSA can include both descriptive and exploratory elements.

Let's see quickly the differences between descriptive and exploratory analysis:

  • Descriptive analysis summarizes characteristics of a dataset
  • Exploratory analysis analyzes for patterns, trends, or relationships between variables

Therefore, TSA is the initial investigation of a dataset with the goal of discovering patterns, especially trend and seasonality, and obtaining initial insights, testing hypotheses, and extracting meaningful summary statistics.

Definition: Time-Series Analysis (TSA) is the process of extracting a summary and other statistical information from time-series, most importantly, the analysis of trend and seasonality.

Since an important part of TSA is gathering statistics and representing your dataset graphically through visualization, we'll do a lot of plots in this chapter. Many statistics and plots described in this chapter are specific to TSA, so even if you are familiar with EDA, you'll find something new.

A part of TSA is collecting and reviewing data, examining the distribution of variables (and variable types), and checking for errors, outliers, and missing values. Some errors, variable types, and anomalies can be corrected, therefore EDA is often performed hand in hand with preprocessing and feature engineering, where columns and fields are selected and transformed. The whole process from data loading to machine learning is highly iterative and may involve multiple instances of TSA at different points.

Here are a few crucial steps for working with time-series:

  • Importing the dataset
  • Data cleaning
  • Understanding variables
  • Uncovering relationships between variables
  • Identifying trend and seasonality
  • Preprocessing (including feature engineering)
  • Training a machine learning model

Importing the data can be considered prior to TSA, and data cleaning, feature engineering, and training a machine learning model are not strictly part of TSA.

Importing the data includes parsing, for example extracting dates. The three steps that are central to TSA are understanding variables, uncovering relationships between variables, and identifying trend and seasonality. There's a lot more to say about each of them, and in this chapter, we'll talk about them in more detail in their dedicated sections.

The steps belonging to TSA and leading to preprocessing (feature engineering) and machine learning are highly iterative, and can be visually appreciated in the following time-series machine learning flywheel:

Flywheel%20-%20page%201.png

Figure 2.1: The time-series machine learning flywheel

This flywheel emphasizes the iterative nature of the work. For example, data cleaning comes often after loading the data, but will come up again after we've made another discovery about our variables. I've highlighted TSA in dark, while steps that are not strictly part of TSA are grayed out.

Let's go through something practical! We'll start by loading a dataset. Right after importing the data, we'd ask questions like what's the size of the dataset (the number of observations)? How many features or columns do we have? What are the column types?

We'll typically look at histograms or distribution plots. For assessing relationships between features and target variables, we'd calculate correlations and visualize them as a correlation heatmap, where the correlation strength between variables is mapped to colors.

We'd look for missing values – in a spreadsheet, these would be empty cells – and we'd clean up and correct these irregularities, where possible.

We are going to be analyzing relationships between variables, and in TSA, one of its peculiarities is that we need to investigate the relationship of time with each variable.

Generally, a useful way of distinguishing different types of techniques could be between univariate and multivariate analysis, and between graphical and non-graphical techniques. Univariate analysis means we are looking at a single variable. This means we could be inspecting values to get the means and the variance, or – for the graphical side – plotting the distribution. We summarize these techniques in the Understanding the variables section.

On the other hand, multivariate analysis means we are calculating correlations between variables, or – for the graphical side – drawing a scatter plot, for example. We'll delve into these techniques in the Uncovering relationships between variables section.

Before we continue, let's go through a bit of the basics of time-series with Python. This will cover the basic operations with time-series data as an introduction. After this, we'll go through Python commands with an actual dataset.

Previous PageNext Page
You have been reading a chapter from
Machine Learning for Time-Series with Python
Published in: Oct 2021Publisher: PacktISBN-13: 9781801819626
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Ben Auffarth

Ben Auffarth is a full-stack data scientist with more than 15 years of work experience. With a background and Ph.D. in computational and cognitive neuroscience, he has designed and conducted wet lab experiments on cell cultures, analyzed experiments with terabytes of data, run brain models on IBM supercomputers with up to 64k cores, built production systems processing hundreds and thousands of transactions per day, and trained language models on a large corpus of text documents. He co-founded and is the former president of Data Science Speakers, London.
Read more about Ben Auffarth