
Modern Time Series Forecasting with Python

Product type: Book
Published in: Nov 2022
Publisher: Packt
ISBN-13: 9781803246802
Edition: 1st Edition
Author (1)
Manu Joseph

Manu Joseph is a self-made data scientist with more than a decade of experience working with many Fortune 500 companies enabling digital and AI transformations, specifically in machine learning-based demand forecasting. He is considered an expert, thought leader, and strong voice in the world of time series forecasting. Currently, Manu leads applied research at Thoucentric, where he advances research by bringing cutting-edge AI technologies to the industry. He is also an active open-source contributor and developed an open-source library, PyTorch Tabular, which makes deep learning for tabular data easy and accessible. Originally from Thiruvananthapuram, India, Manu currently resides in Bengaluru, India, with his wife and son.

Preface

Mankind has always sought the ability to predict the future. Since the earliest civilizations, shamans, oracles, and prophets have used anything from astrology and palmistry to numerology to satisfy the human need to see into the future. In the last century, with the developments in IT, the mantle of predicting the future landed on data analysts and data scientists. And how do we predict the future? It’s not by examining the lines and creases on our hands or the positions of the stars anymore but by using data that has been generated in the past. And instead of prophecies, we now have forecasts.

Because time is the fourth dimension of our world, all the data generated in the real world has an element of time associated with it, which makes it time series data. Whether the temporal aspect is relevant to the problem or not is another question altogether. However, to be more concrete and immediate, we can find time series forecasting use cases in many industries, such as retail, energy, healthcare, and finance. We might want to know how many units of a particular product are to be dispatched to a particular store, or we might want to know how much electricity is to be produced to meet demand.

In this book, using a real-world dataset, you will learn how to handle and visualize time series data using pandas and plotly, generate baseline forecasts using darts, and use machine learning and deep learning for forecasting with popular Python libraries such as scikit-learn and PyTorch. We conclude the book with a few chapters that cover seldom-touched aspects, such as multi-step forecasting, forecast metrics, and cross-validation for time series.

The book will enable you to build real-world time series forecasting systems that scale to millions of time series by mastering and applying modern concepts in machine learning and deep learning.

Who this book is for

The book is ideal for data scientists, data analysts, machine learning engineers, and Python developers who want to build industry-ready time series models. Since the book explains most concepts from the ground up, basic proficiency in Python is all you need. A prior understanding of machine learning or forecasting would help speed up the learning. For seasoned practitioners in machine learning and forecasting, the book has a lot to offer in terms of advanced techniques and the latest research frontiers in time series forecasting.

What this book covers

Chapter 1, Introducing Time Series, is all about introducing you to the world of time series. We lay down a definition of time series and talk about how it is related to a Data Generating Process (DGP). We will also talk about the limits of forecasting and what we cannot forecast, and then we finish off the chapter by laying down some terminology that will help you understand the rest of the book.

Chapter 2, Acquiring and Processing Time Series Data, covers how you can process time series data. You will understand how different forms of time series data can be represented in a tabular form. You will learn different date-time-related functionalities in pandas and learn how to fill in missing data using techniques suited for time series. Finally, using a real-world dataset, you will go through a step-by-step journey in processing time series data using pandas.

Chapter 3, Analyzing and Visualizing Time Series Data, furthers your introduction to time series by showing you how to visualize and analyze them. You will learn different visualizations that are commonly used for time series data and then learn how to go one level deeper by decomposing a time series into its components. To wrap it up, you will also look at ways to identify and treat outliers in time series data.

Chapter 4, Setting a Strong Baseline Forecast, gets right to the topic of time series forecasting as we use tried and tested methods from econometrics, such as ARIMA and exponential smoothing, to generate strong baselines. These efficient classical methods give us a reference point from which we can go beyond and learn modern techniques, such as machine learning. You will also get an introduction to another key topic: assessing forecastability using techniques such as spectral entropy and coefficient of variation.

Chapter 5, Time Series Forecasting as Regression, starts our journey into using machine learning for forecasting. A short introduction to machine learning lays down the foundations of what is to come in the next chapters. You will also understand, conceptually, how we can cast a time series problem as a regression problem so that we can use machine learning for it. To close off the chapter, we tease you with the possibility of global forecasting models.

Chapter 6, Feature Engineering for Time Series Forecasting, shifts gear into a more practical lesson. Using a real-world dataset, you will learn about different feature engineering techniques, such as lag features, rolling features, and Fourier terms, which help us formulate a time series problem as a regression problem.

Chapter 7, Target Transformations for Time Series Forecasting, continues the practice of exploring different target transformations to accommodate non-stationarity in time series. You will learn techniques such as the augmented Dickey–Fuller test and Mann–Kendall test to identify and treat non-stationarity.

Chapter 8, Forecasting Time Series with Machine Learning Models, continues from where the last chapter left off to start training machine learning models on the dataset we have been working on. Using the standard code framework present in the book, you will train models such as linear regression, random forest, and gradient-boosted decision trees on our dataset.

Chapter 9, Ensembling and Stacking, takes a step back and explores how we can use multiple forecasts and combine them to create a better forecast. You will explore popular techniques such as best fit, different versions of the hill-climbing algorithm, simulated annealing, and stacking to combine the different forecasts we have generated to get a better one.

Chapter 10, Global Forecasting Models, concludes your guided journey into machine learning-enabled forecasting with an exciting new paradigm: global forecasting models. You will learn how to use global forecasting models, along with industry-proven techniques to improve their performance, which finally lets you develop scalable and efficient machine learning forecasting systems for thousands of time series.

Chapter 11, Introduction to Deep Learning, switches tracks and starts with a specific type of machine learning: deep learning. In this chapter, we lay the foundations of deep learning by looking at different topics such as representation learning, linear transformations, activation functions, and gradient descent.

Chapter 12, Building Blocks of Deep Learning for Time Series, continues the journey into deep learning by making it specific to time series. Keeping in mind the compositionality of deep learning systems, you will learn about different building blocks with which you can construct a deep learning architecture. The chapter starts off by establishing the encoder-decoder architecture and then talks about different blocks such as feed forward networks, recurrent neural networks, and convolutional neural networks.

Chapter 13, Common Modeling Patterns for Time Series, strengthens the encoder-decoder architecture that you saw in the previous chapter by showing you a few concrete and common patterns in which you can arrange building blocks to generate forecasts. This is a hands-on chapter where you will be creating forecasts using deep learning-based tabular regression and different sequence-to-sequence models.

Chapter 14, Attention and Transformers for Time Series, covers the contemporary topic of using attention to improve deep learning models. The chapter starts off by talking about a generalized attention model with which you will learn different types of attention schemes, such as scaled dot product and additive. You will also tweak the sequence-to-sequence models from the previous chapter to include attention and then train those models to generate a forecast. The chapter then talks about the Transformer, a deep learning architecture that relies solely on attention, and you will use it to generate forecasts as well.

Chapter 15, Strategies for Global Deep Learning Forecasting Models, tackles yet another important aspect of deep learning-based forecasting. Although the book talked about global forecasting models earlier, there are some differences in how they are implemented for deep learning models. In this chapter, you will learn how to implement global deep learning models and techniques for making those models better. You will also see them working in the hands-on section, where we will be generating forecasts using the real-world dataset we have been working with.

Chapter 16, Specialized Deep Learning Architectures for Forecasting, concludes your journey into deep learning-based time series forecasting by talking about a few popular, specialized deep learning architectures for time series forecasting. Using the concepts and building blocks you have learned through the previous chapters, this chapter takes you to the cutting edge of research and exposes the leading state-of-the-art models in time series forecasting such as N-BEATS, N-HiTS, Informer, Autoformer, and Temporal Fusion Transformer. In addition to understanding them, you will also learn how to use these models to generate forecasts using a real-world dataset.

Chapter 17, Multi-Step Forecasting, tackles the rarely talked about but highly relevant topic of multi-step forecasting. You will learn about different strategies for generating forecasts for more than one time step into the future, such as Recursive, Direct, DirRec, RecJoint, and Rectify. The book also talks about the merits and demerits of each of them and helps you choose the right strategy for your problem.

Chapter 18, Evaluating Forecasts – Forecast Metrics, traverses yet another topic that is rarely talked about and rife with controversy, with many opinions from different quarters. You will learn about different ways to measure the goodness of a forecast and, through experiments that you can run yourself, expose the strengths and weaknesses of different metrics. The chapter concludes by laying down some guidelines that can help you choose the correct metric for your problem.

Chapter 19, Evaluating Forecasts – Validation Strategies, concludes the evaluation of forecasts and the book by talking about different validation strategies we can use for time series. You will learn different validation strategies such as hold-out, cross-validation, and their variations. The chapter also touches upon aspects to keep in mind while designing validation strategies for global settings. At the conclusion of the chapter, you will come across a few guidelines for choosing your validation strategies and answers to questions such as “Can we use cross-validation for time series?”

To get the most out of this book

You should have basic familiarity with Python programming, as all the code that we use for the practical sections is in Python. Familiarity with major libraries in Python, such as pandas and scikit-learn, is not essential (because the book covers some basics) but will help you get through the book much faster. Familiarity with PyTorch, the framework the book uses for deep learning, is also not essential but would accelerate your learning manyfold. None of the software requirements should stop you because, in today’s internet-enabled world, the only thing standing between you and a world of knowledge is the search bar in your favorite search engine.

Another key aspect to get the most out of this book is to run the associated notebooks as you go along the lessons. Also, feel free to experiment with different variations that the book doesn’t go into. That is a surefire way to internalize what’s being talked about in the book. And for that, we need to set up an environment, as you’ll see in the following section.

Setting up an environment

The easiest way to set up an environment is by using Anaconda, a distribution of Python for scientific computing. You can use Miniconda, a minimal installer for Conda, as well if you do not want the pre-installed packages that come with Anaconda:

  1. Install Anaconda/Miniconda: Anaconda can be installed from https://www.anaconda.com/products/distribution. Depending on your operating system, choose the corresponding file and follow the instructions. Alternatively, you can install Miniconda from here: https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links.
  2. Open conda prompt: To open Anaconda Prompt (or Terminal on Linux or macOS), do the following:
    • Windows: Open the Anaconda Prompt (Start | Anaconda Prompt)
    • macOS: Open Launchpad and then open Terminal. Type conda activate.
    • Linux: Open Terminal. Type conda activate.
  3. Navigate to the downloaded code: Use operating system-specific commands to navigate to the folder where you have downloaded the code. For instance, in Windows, use cd.
  4. Install the environment: Using the anaconda_env.yml file that is included, install the environment:
    conda env create -f anaconda_env.yml

This creates a new environment named modern_ts and installs all the required libraries in it. This can take a while.

  5. Checking the installation: We can check whether all the libraries required for the book are installed properly by executing a script in the downloaded code folder:
    python test_installation.py
  6. Activating the environment and running notebooks: Every time you want to run the notebooks, first activate the environment using the conda activate modern_ts command and then use the Jupyter Notebook (jupyter notebook) or JupyterLab (jupyter lab), according to your preference.
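
The test_installation.py script in the repository remains the authoritative check. Purely as an illustration of the idea, and not the book's own script, a minimal sketch could probe whether a few of the libraries named in this preface can be located in the active environment. The function name find_missing and the package list are assumptions made for this example:

```python
from importlib.util import find_spec

# Assumed import names for libraries mentioned in this preface;
# the book's own test_installation.py may check a different set.
ASSUMED_PACKAGES = ["pandas", "plotly", "darts", "sklearn", "torch"]


def find_missing(packages):
    """Return the top-level package names that cannot be located."""
    return [name for name in packages if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages were found.")
```

Run it after conda activate modern_ts; a non-empty list suggests the environment was not created or activated correctly.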

Download the data

You are going to be using a single dataset throughout the book. The book uses the London Smart Meters dataset from Kaggle for this purpose. Therefore, if you don’t have an account with Kaggle, please go ahead and create one: https://www.kaggle.com/account/login?phase=startRegisterTab.

There are two ways you can download the data: automated and manual.

For the automated way, we need to download a key from Kaggle. Let’s do that first (if you are going to choose the manual way, you can skip this):

  1. Click on your profile picture in the top-right corner of Kaggle.
  2. Select Account, and find the section for API.
  3. Click the Create New API Token button. A file with the name kaggle.json will be downloaded.
  4. Copy the file and place it in the api_keys folder in the downloaded code folder.
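
Before running the automated download, it can help to confirm that the token file landed where step 4 says it should. The following is a minimal sketch, assuming the token is a JSON object with username and key fields (the format Kaggle issues at the time of writing); the helper name check_kaggle_token is hypothetical and not part of the book's code:

```python
import json
from pathlib import Path


def check_kaggle_token(path):
    """Return True if `path` is a JSON file with non-empty 'username' and 'key' fields."""
    path = Path(path)
    if not path.is_file():
        return False
    try:
        token = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    # Both fields must be present and non-empty for the token to be usable.
    return isinstance(token, dict) and all(
        token.get(field) for field in ("username", "key")
    )


if __name__ == "__main__":
    # Assumed location, following step 4; run from the root of the downloaded code.
    print(check_kaggle_token(Path("api_keys") / "kaggle.json"))
```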

Now that we have kaggle.json downloaded and placed in the right folder, let’s look at the two methods to download data:

Method one – automated download

  1. Activate the environment using conda activate modern_ts.
  2. Run the provided script from the root directory of the downloaded code:
    python scripts/download_data.py

That’s it. Now, just wait for the script to finish downloading the data, unzipping it, and organizing the files in the expected format.

Method two – manual download

  1. Go to https://www.kaggle.com/jeanmidev/smart-meters-in-london and download the dataset.
  2. Unzip the contents to data/london_smart_meters.
  3. Unzip hhblock_dataset to get the raw files we want to work with.
  4. Make sure the unzipped files are in the expected folder structure (see the next section).
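
If you prefer to script the unzip steps of the manual route rather than use a file manager, a minimal sketch with Python's standard zipfile module could look like the following. The helper name extract_archive and the archive filenames in the usage comment are assumptions; use whatever names your download actually has:

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a zip archive into target_dir and return the member names."""
    target = Path(target_dir)
    # Create the destination folder (and any parents) if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Hypothetical usage, with assumed archive names:
# extract_archive("archive.zip", "data/london_smart_meters")
# extract_archive("data/london_smart_meters/hhblock_dataset.zip",
#                 "data/london_smart_meters/hhblock_dataset")
```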

Now that you have downloaded the data, we need to make sure it is arranged in the following folder structure. The automated download takes care of this, but with the manual download, you need to create the structure yourself. To avoid ambiguity, the expected folder structure is as follows:

data
├── london_smart_meters
│   ├── hhblock_dataset
│   │   ├── hhblock_dataset
│   │   │   ├── block_0.csv
│   │   │   ├── block_1.csv
│   │   │   ├── ...
│   │   │   ├── block_109.csv
│   ├── acorn_details.csv
│   ├── informations_households.csv
│   ├── uk_bank_holidays.csv
│   ├── weather_daily_darksky.csv
│   ├── weather_hourly_darksky.csv

There can be additional files as part of the extraction process. You can remove them without impacting anything. There is a helpful script that checks this structure:

python test_data_download.py
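
To make the kind of check involved concrete, here is a minimal sketch, not the repository's test_data_download.py, that reports which expected files are absent. It assumes the top-level CSV files sit inside data/london_smart_meters, as step 2 of the manual download implies, and that the block files run from block_0.csv to block_109.csv; the name missing_paths is hypothetical:

```python
from pathlib import Path

# Assumed layout: these CSV files sit directly inside data/london_smart_meters.
EXPECTED_FILES = [
    "acorn_details.csv",
    "informations_households.csv",
    "uk_bank_holidays.csv",
    "weather_daily_darksky.csv",
    "weather_hourly_darksky.csv",
]
# The half-hourly block files live in a doubly nested folder.
BLOCK_DIR = Path("hhblock_dataset") / "hhblock_dataset"


def missing_paths(root):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    expected = [Path(name) for name in EXPECTED_FILES]
    # block_0.csv through block_109.csv, i.e. 110 files.
    expected += [BLOCK_DIR / f"block_{i}.csv" for i in range(110)]
    return [p.as_posix() for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(Path("data") / "london_smart_meters")
    print(f"{len(missing)} expected file(s) missing")
```

An empty result means every expected file was found; extra files left over from extraction are ignored.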

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository. Doing so will help you avoid any potential errors related to the copying and pasting of code.

The code that is provided along with the book is in no way a library but more of a guide for you to start experimenting on. The amount of learning you can derive from the book and code is directly proportional to how much you experiment with the code and stray outside your comfort zone. So, go ahead and start experimenting and putting the skills you pick up in the book to good use.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Modern-Time-Series-Forecasting-with-Python. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/5NVrW.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “statsmodels.tsa.seasonal has a function called seasonal_decompose.”

A block of code is set as follows:

from statsmodels.tsa.seasonal import seasonal_decompose
# seasonal_decompose does not support missing values, so we use the imputed ts instead
res = seasonal_decompose(ts, period=7*48, model="additive", extrapolate_trend="freq")

Any command-line input or output is written as follows:

conda env create -f anaconda_env.yml

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “But if you look at the Time Elapsed column, it stands out.”

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Modern Time Series Forecasting with Python, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there. You can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781803246802

  2. Submit your proof of purchase.
  3. That’s it! We’ll send your free PDF and other benefits to your email directly.