Machine Learning Models for Time-Series

Machine learning has come a long way in recent years, and this is reflected in the methods available for time-series prediction. We introduced a few state-of-the-art machine learning methods for time-series in Chapter 4, Introduction to Machine Learning for Time-Series. In the current chapter, we'll introduce several more machine learning methods.

We'll go through methods that are commonly used as baselines, or that stand out in terms of performance, ease of use, or applicability. I'll introduce k-nearest neighbors with dynamic time warping as a baseline, and we'll go over other methods, such as Silverkite and gradient boosting. Finally, we'll go through an applied exercise with some of these methods.

We're going to cover the following topics:

  • More machine learning methods for time-series
  • K-nearest neighbors with dynamic time warping
  • Silverkite
  • Gradient boosting
  • Python exercise

More machine learning methods for time-series

The algorithms that we'll cover in this section are all highly competitive for forecasting and prediction tasks. If you are looking for a discussion of state-of-the-art machine learning algorithms, please refer to Chapter 4, Introduction to Machine Learning for Time-Series.

In the aforementioned chapter, we briefly discussed a few of these algorithms; here, we'll cover them in more detail, and we'll also introduce algorithms that we haven't discussed before, such as Silverkite, gradient boosting, and k-nearest neighbors.

We'll dedicate a separate practice section to a library that was released in 2021: Facebook's Kats. Kats provides many advanced features, including hyperparameter tuning and ensemble learning. On top of these features, it implements feature extraction based on the TSFresh library and includes many models, including Prophet, SARIMA, and others. It...
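
As a quick preview, a minimal sketch of fitting a single model through Kats might look like the following (the toy DataFrame here is illustrative; Kats expects a time column and a value column, and since the library is new, the API may shift between releases):

    import pandas as pd
    from kats.consts import TimeSeriesData
    from kats.models.prophet import ProphetModel, ProphetParams

    # toy daily series - in practice, this would be your own data
    df = pd.DataFrame({
        "time": pd.date_range("2021-01-01", periods=100, freq="D"),
        "value": range(100),
    })
    ts = TimeSeriesData(df)  # Kats' common time-series container

    # fit Prophet through the Kats wrapper and forecast 30 steps ahead
    params = ProphetParams(seasonality_mode="additive")
    model = ProphetModel(ts, params)
    model.fit()
    forecast = model.predict(steps=30, freq="D")
    print(forecast.head())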

K-nearest neighbors with dynamic time warping

K-nearest neighbors (kNN) is a well-known machine learning method (sometimes also going under the guise of case-based reasoning). In kNN, we use a distance measure to find data points similar to a query point. We can then take the known labels of these nearest neighbors as the output and aggregate them, typically by majority vote for classification or by averaging for regression.

Figure 7.3 illustrates the basic idea of kNN for classification (source – WikiMedia Commons: https://commons.wikimedia.org/wiki/File:KnnClassification.svg):


Figure 7.3: K-nearest neighbor for classification

We know a few data points already. In the preceding illustration, these points are indicated as squares and triangles, representing data points of two different classes, respectively. Given a new data point, indicated by a circle, we find the closest known data points to it. In this example, we find that the new point is most similar to triangles, so we might assume that the new point belongs to the triangle class as well.
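
To make this concrete, here is a minimal sketch of a time-series classifier combining kNN with DTW, using the tslearn library (the two-class toy dataset is made up for illustration):

    import numpy as np
    from tslearn.neighbors import KNeighborsTimeSeriesClassifier

    # toy dataset: 20 rising ramps (class 0) and 20 noisy sine waves (class 1)
    rng = np.random.default_rng(0)
    ramps = np.cumsum(rng.uniform(0, 1, size=(20, 50)), axis=1)
    sines = np.sin(np.linspace(0, 6, 50)) + rng.normal(0, 0.2, size=(20, 50))
    X = np.vstack([ramps, sines])
    y = np.array([0] * 20 + [1] * 20)

    # kNN with dynamic time warping as the distance measure; the label of a
    # new series is the majority vote among its three nearest neighbors
    clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="dtw")
    clf.fit(X, y)
    print(clf.predict(X[:2]))  # predictions for the first two series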

Silverkite

The Silverkite algorithm ships with the Greykite library released by LinkedIn. It was explicitly designed with the goals of speed, accuracy, and intuitiveness in mind. The algorithm is described in a 2021 publication ("A flexible forecasting model for production systems" by Reza Hosseini and others).

According to LinkedIn, it can handle different kinds of trends and seasonalities (hourly, daily, and weekly), as well as repeated events, holidays, and short-range effects. Within LinkedIn, it is used for both short-term forecast horizons, for example, 1 day ahead, and long-term horizons, such as 1 year ahead.

Use cases within LinkedIn include optimizing budget decisions, setting business metric targets, and providing sufficient infrastructure to handle peak traffic. Furthermore, a use case has been to model recoveries from the COVID-19 pandemic.

The time-series is modeled as an additive composite of trends, change points, and seasonality, where seasonality...
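
For illustration, a forecast with the Silverkite template through Greykite's high-level Forecaster API might look roughly as follows (a sketch assuming a DataFrame with columns ts and y; since the library is new, check the documentation for the current API):

    import pandas as pd
    from greykite.framework.templates.autogen.forecast_config import (
        ForecastConfig, MetadataParam)
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum

    # toy daily series - in practice, df would hold your own data
    df = pd.DataFrame({
        "ts": pd.date_range("2020-01-01", periods=365, freq="D"),
        "y": range(365),
    })

    forecaster = Forecaster()
    result = forecaster.run_forecast_config(
        df=df,
        config=ForecastConfig(
            model_template=ModelTemplateEnum.SILVERKITE.name,
            forecast_horizon=30,  # forecast 30 days ahead
            metadata_param=MetadataParam(time_col="ts", value_col="y", freq="D"),
        ),
    )
    print(result.forecast.df.tail())  # forecasts with prediction intervals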

Gradient boosting

XGBoost (short for eXtreme Gradient Boosting) is an efficient implementation of gradient boosting (Jerome Friedman, "Greedy function approximation: a gradient boosting machine", 2001) for classification and regression problems. Gradient boosting is also known as Gradient Boosting Machine (GBM) or Gradient Boosted Regression Trees (GBRT). A special case is LambdaMART for ranking applications. Apart from XGBoost, other implementations include Microsoft's Light Gradient Boosting Machine (LightGBM) and Yandex's CatBoost.

Gradient boosted trees are an ensemble of decision trees. This is similar to bagging algorithms such as random forest; however, since this is a boosting algorithm, each tree is computed to incrementally reduce the error. With each new iteration, a tree is greedily chosen and its prediction is added to the previous predictions, scaled by a weight term. There is also a regularization term that penalizes complexity and reduces overfitting, similar...
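
Since gradient boosting is a generic supervised learning method, applying it to forecasting usually means recasting the series as a tabular regression problem, for example by using lagged values as features. Here is a minimal sketch with XGBoost (the lag construction is illustrative, not the exact setup of the exercise below):

    import numpy as np
    import pandas as pd
    from xgboost import XGBRegressor

    # toy series - in practice, this would be your own time-series
    series = pd.Series(np.sin(np.linspace(0, 20, 300)))

    # recast forecasting as regression: predict y_t from the 5 previous values
    lags = 5
    frame = pd.concat(
        [series.shift(i) for i in range(lags, 0, -1)] + [series], axis=1
    ).dropna()
    X, y = frame.iloc[:, :-1].values, frame.iloc[:, -1].values

    # keep the split chronological - no shuffling for time-series data
    split = int(len(X) * 0.8)
    model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
    model.fit(X[:split], y[:split])
    print(model.predict(X[split:])[:5])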

Python exercise

Let's put into practice what we've learned in this chapter so far.

As for requirements, in this chapter, we'll install the dependencies for each section separately. The installation can be performed from the terminal, from a notebook, or from Anaconda Navigator.
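
For example, an installation from the terminal (or from a notebook cell, prefixed with !) could look like this; the names below are the PyPI package names at the time of writing:

    pip install tslearn greykite kats xgboost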

In a few of the following sections, we'll demonstrate classification rather than forecasting, so some of these approaches will not be directly comparable. The reader is invited to run both forecasting and classification with each approach and then compare results.

As a note of caution, both Kats and Greykite (at the time of writing) are very new libraries, so there might still be frequent changes to their dependencies. They might pin the version of NumPy or of other commonly used libraries. Therefore, I'd recommend installing them in separate virtual environments, one for each section.
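
For instance, a dedicated environment for one section can be created from the terminal with Python's built-in venv module along these lines (the environment name here is just an example):

    python -m venv greykite-env        # create the environment
    source greykite-env/bin/activate   # on Windows: greykite-env\Scripts\activate
    pip install greykite               # install only this section's requirements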

We'll go through this setup in the next section.

Virtual environments

In a Python virtual environment...

Summary

In this chapter, we've discussed popular time-series machine learning libraries in Python. We then discussed and tried out a k-nearest neighbor algorithm with dynamic time warping for the classification of robotic failures. We talked about validation in time-series forecasting and we tried three different methods for forecasting COVID cases: Silverkite, Gradient Boosting with XGBoost, and ensemble models in Kats.
