Modern Time Series Forecasting with Python

Product type: Book
Published in: Nov 2022
Publisher: Packt
ISBN-13: 9781803246802
Pages: 552
Edition: 1st
Author: Manu Joseph
Table of Contents (26 chapters)

  Preface
  Part 1 – Getting Familiar with Time Series
    Chapter 1: Introducing Time Series
    Chapter 2: Acquiring and Processing Time Series Data
    Chapter 3: Analyzing and Visualizing Time Series Data
    Chapter 4: Setting a Strong Baseline Forecast
  Part 2 – Machine Learning for Time Series
    Chapter 5: Time Series Forecasting as Regression
    Chapter 6: Feature Engineering for Time Series Forecasting
    Chapter 7: Target Transformations for Time Series Forecasting
    Chapter 8: Forecasting Time Series with Machine Learning Models
    Chapter 9: Ensembling and Stacking
    Chapter 10: Global Forecasting Models
  Part 3 – Deep Learning for Time Series
    Chapter 11: Introduction to Deep Learning
    Chapter 12: Building Blocks of Deep Learning for Time Series
    Chapter 13: Common Modeling Patterns for Time Series
    Chapter 14: Attention and Transformers for Time Series
    Chapter 15: Strategies for Global Deep Learning Forecasting Models
    Chapter 16: Specialized Deep Learning Architectures for Forecasting
  Part 4 – Mechanics of Forecasting
    Chapter 17: Multi-Step Forecasting
    Chapter 18: Evaluating Forecasts – Forecast Metrics
    Chapter 19: Evaluating Forecasts – Validation Strategies
  Index
  Other Books You May Enjoy

Strategies for Global Deep Learning Forecasting Models

Over the last few chapters, we have been building up deep learning for time series forecasting. We started with the basics of deep learning, looked at the different building blocks, put some of them to practical use generating forecasts for a sample household, and finally talked about attention and transformers. Now, let's slightly alter our trajectory and take a look at global models for deep learning. In Chapter 10, Global Forecasting Models, we saw why global models make sense and how we can use such a model in a machine learning context. We even got good results in our experiments. In this chapter, we will look at how we can apply similar concepts, but in a deep learning context, and examine different strategies that we can use to make global deep learning models work better.

In this chapter, we will be covering these main topics:

  • Creating global deep learning forecasting models...

Technical requirements

You will need to set up the Anaconda environment by following the instructions in the Preface to get a working environment with all the packages and datasets required for the code in this book.

You will need to run these notebooks:

  • 02 - Preprocessing London Smart Meter Dataset.ipynb in Chapter02
  • 01-Setting up Experiment Harness.ipynb in Chapter04
  • 01-Feature Engineering.ipynb in Chapter06

The associated code for the chapter can be found at https://github.com/PacktPublishing/Modern-Time-Series-Forecasting-with-Python-/tree/main/notebooks/Chapter15.

Creating global deep learning forecasting models

In Chapter 10, Global Forecasting Models, we talked in detail about why a global model makes sense. We covered the benefits: increased sample size, cross-learning, multi-task learning and the regularization effect that comes with it, and reduced engineering complexity. All of these are relevant for a deep learning model as well. Engineering complexity and sample size become even more important because deep learning models are data-hungry and take considerably more engineering effort and training time than other machine learning models. I would go so far as to say that in the deep learning context, in most practical cases where we have to forecast at scale, global models are the only deep learning paradigm that makes sense.

So, why did we spend all that time looking at individual models? Well, it’s easier to grasp the concept at that level, and the skills and knowledge we gained at that level are very easily...

Using time-varying information

The GFM(ML) used all the available features, so that model obviously had access to a lot more information than the GFM(DL) we have built so far, which takes in only the history of the target and nothing else. Let's change that by including time-varying information. We will use only time-varying real features this time because dealing with categorical features is a topic I want to leave for the next section.

We initialize the training dataset the same way as before, but we add time_varying_known_reals=feat_config.time_varying_known_reals to the initialization parameters.
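Purely as a hedged sketch (a tiny synthetic DataFrame and illustrative column names stand in for the book's actual data and feat_config), the initialization with PyTorch Forecasting's TimeSeriesDataSet looks roughly like this:

    import numpy as np
    import pandas as pd
    from pytorch_forecasting import TimeSeriesDataSet
    from pytorch_forecasting.data import GroupNormalizer

    # Synthetic stand-in for the half-hourly consumption DataFrame; the column
    # names here are illustrative, not the book's exact feature configuration.
    n_steps = 200
    df = pd.DataFrame({
        "household_id": ["MAC00001"] * n_steps + ["MAC00002"] * n_steps,
        "time_idx": list(range(n_steps)) * 2,
        "energy_consumption": np.random.rand(2 * n_steps),
        "hour_of_day": (np.arange(2 * n_steps) % 48) / 2.0,  # known in advance
        "temperature": np.random.randn(2 * n_steps),          # known in advance (from a weather forecast)
    })

    # Stands in for feat_config.time_varying_known_reals
    known_reals = ["hour_of_day", "temperature"]

    training = TimeSeriesDataSet(
        df,
        time_idx="time_idx",
        target="energy_consumption",
        group_ids=["household_id"],
        max_encoder_length=48,
        max_prediction_length=1,
        time_varying_unknown_reals=["energy_consumption"],  # history of the target
        time_varying_known_reals=known_reals,                # NEW: time-varying known features
        target_normalizer=GroupNormalizer(groups=["household_id"]),
    )

    # Batches now carry the known reals for both the encoder and decoder windows.
    dataloader = training.to_dataloader(train=True, batch_size=64)
    x, y = next(iter(dataloader))
    print(x["encoder_cont"].shape)  # (batch, encoder steps, number of continuous features)

Now that we have all the datasets created, let's move on to setting up the model.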

To set up the model, we need to understand one concept. We are now using the history of the target and time-varying known features. In Figure 15.3, we saw how TimeSeriesDataset arranges the different kinds of variables in PyTorch tensors. In the previous section, we used only...

Using static/meta information

Some features, such as the Acorn group or whether dynamic pricing is enabled, are specific to a household and will help the model learn patterns specific to these groups. Naturally, including that information makes intuitive sense. But as we discussed in Chapter 10, Global Forecasting Models, categorical features do not play well with machine learning models because they aren't numerical. In that chapter, we discussed a few ways of encoding categorical features into numerical representations, and we can use any of those in a deep learning model as well. But there is one way of handling categorical features that is unique to deep learning models – embedding vectors.
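As a rough sketch of the idea (the category count and embedding size below are made up for illustration), an embedding layer in PyTorch maps each label-encoded category to a dense, learnable vector:

    import torch
    import torch.nn as nn

    # Suppose a categorical feature (say, the Acorn group) has 19 distinct values.
    n_categories = 19   # illustrative cardinality
    embedding_dim = 8   # size of the learned representation (a hyperparameter)

    embedding = nn.Embedding(num_embeddings=n_categories, embedding_dim=embedding_dim)

    # Each category is first label-encoded to an integer index...
    acorn_idx = torch.tensor([0, 3, 17])   # three hypothetical households
    vectors = embedding(acorn_idx)         # shape: (3, 8)
    print(vectors.shape)

    # The embedding weights are ordinary parameters, so they are trained jointly
    # with the rest of the network, letting similar categories end up close together.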

One-hot encoding and why it is not ideal

One of the ways of converting categorical features to a numerical representation is one-hot encoding. It encodes the categorical feature in a higher dimension, placing the categorical values equally distant in...
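To make the point concrete, here is a small, hedged illustration with made-up category values: every pair of distinct categories ends up exactly the same distance apart, and the encoded width grows with the cardinality of the feature:

    import numpy as np
    import pandas as pd

    # Hypothetical household metadata with a single categorical column.
    meta = pd.DataFrame({"acorn_group": ["A", "C", "Q", "A"]})

    one_hot = pd.get_dummies(meta["acorn_group"], prefix="acorn", dtype=float)
    print(one_hot)   # one column per distinct category

    # Any two distinct categories are the same Euclidean distance apart (sqrt(2)),
    # so the encoding carries no notion of similarity between categories.
    a, c = one_hot.iloc[0].to_numpy(), one_hot.iloc[1].to_numpy()
    print(np.linalg.norm(a - c))   # ~1.414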

Using the scale of the time series

We used GroupNormalizer in TimeSeriesDataset to scale each household using its own mean and standard deviation. We did this because we wanted to make the target zero mean and unit variance so that the model does not waste effort trying to change its parameters to capture the scale of individual household consumption. Although this is a good strategy, we do lose some information here. There may be patterns that are specific to households whose consumption is on the larger side and other patterns that are specific to households that consume much less. But now, they are both lumped together and the model tries to learn common patterns. In such a scenario, these unique patterns look like noise to the model because there is no variable to explain them.

The bottom line is that there is information in the scale that we removed, and adding that information back would be beneficial. So, how do we add it back? Definitely not by including the...
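One plausible way to reintroduce that information, supported out of the box by PyTorch Forecasting (and not necessarily the exact route the rest of this section takes), is to append each group's center and scale, the statistics computed by GroupNormalizer, to the static real features:

    from pytorch_forecasting import TimeSeriesDataSet
    from pytorch_forecasting.data import GroupNormalizer

    # Sketch only: df and the column names reuse the illustrative ones from the
    # earlier snippet, not the book's actual feature configuration.
    training = TimeSeriesDataSet(
        df,
        time_idx="time_idx",
        target="energy_consumption",
        group_ids=["household_id"],
        max_encoder_length=48,
        max_prediction_length=1,
        time_varying_unknown_reals=["energy_consumption"],
        target_normalizer=GroupNormalizer(groups=["household_id"]),
        add_target_scales=True,  # adds each household's center and scale as static reals
    )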

Balancing the sampling procedure

We saw a few strategies for improving a global deep learning model by adding new types of features. Now, let's look at a different aspect that is relevant in a global modeling context. In an earlier section, when we were talking about global deep learning models, we talked about how the process by which we sample a window of a sequence to feed to our model can be thought of as a two-step process (a rough sketch of such a sampler follows the list):

  1. Sampling a time series out of a set of time series
  2. Sampling a window out of that time series
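As a rough, hypothetical sketch (not the chapter's exact implementation), a PyTorch sampler that makes these two steps explicit, so that every series contributes equally regardless of its length, could look like this:

    import numpy as np
    from torch.utils.data import Sampler

    class TwoStepSampler(Sampler):
        """Pick a series uniformly at random, then pick one of its windows."""

        def __init__(self, windows_per_series, num_samples):
            # windows_per_series: list where entry i holds the dataset indices of all
            # windows belonging to series i (how to build it depends on the dataset).
            self.windows_per_series = windows_per_series
            self.num_samples = num_samples

        def __len__(self):
            return self.num_samples

        def __iter__(self):
            for _ in range(self.num_samples):
                series = np.random.randint(len(self.windows_per_series))      # step 1: pick a series
                yield int(np.random.choice(self.windows_per_series[series]))  # step 2: pick a window

    # Usage sketch: DataLoader(dataset, batch_size=64, sampler=TwoStepSampler(...))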

Let's use an analogy to make the concept clearer. Imagine we have a large bowl that we have filled with balls. Each ball in the bowl represents a time series in the dataset (a household, in our case). Now, each ball has chits of paper inside it, representing all the different windows of samples we can draw from that time series.

In the batch sampling we use by default, we open all the balls and dump all the chits into the bowl and discard the balls....

Summary

After building a strong foundation in deep learning over the last few chapters, we started to look at a new paradigm, global models, in the context of deep learning. We learned how to use PyTorch Forecasting, an open source library for forecasting using deep learning, and used the feature-filled TimeSeriesDataset to start developing our own models.

We started off with a very simple LSTM in the global context and saw how we can add time-varying information, static information, and the scale of individual time series to the features to make the models better. We closed by looking at an alternative sampling procedure for mini-batches that helps us present a more balanced view of the problem in each batch. This chapter is by no means an exhaustive list of all such techniques for making forecasting models better. Instead, it aims to build the kind of thinking that is necessary to work on your own models and make them work better than before.

And...

Further reading

You can check out the following sources for further reading:
