You're reading from  Forecasting Time Series Data with Facebook Prophet

Product type: Book
Published in: Mar 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800568532
Edition: 1st Edition
Author: Greg Rafferty

Greg Rafferty is a data scientist in San Francisco, California. With over a decade of experience, he has worked with many of the top firms in tech, including Google, Facebook, and IBM. Greg has been an instructor in business analytics on Coursera and has led face-to-face workshops with industry professionals in data science and analytics. With both an MBA and a degree in engineering, he is able to work across the spectrum of data science and communicate with both technical experts and non-technical consumers of data alike.

Chapter 11: Cross-Validation

The concept of keeping training data and testing data separate is sacrosanct in machine learning and statistics. You should never train a model and test its performance on the same data. Setting data aside for testing purposes has a downside, though: that data has valuable information that you would want to include in training. Cross-validation is a technique that's used to circumvent this problem.

You may be familiar with k-fold cross-validation, but if you are not, we will briefly cover it in this chapter. K-fold, however, will not work on time series. It requires that the observations be independent, an assumption that time series data does not satisfy because each value depends on the values that came before it. An understanding of k-fold will help you learn how forward-chaining cross-validation works and why it is necessary for time series data.

After learning how to perform cross-validation in Prophet, you will learn how to speed up the computing of cross-validation through Prophet's ability to parallelize...

Technical requirements

The data files and code for examples in this chapter can be found at https://github.com/PacktPublishing/Forecasting-Time-Series-Data-with-Facebook-Prophet.

Performing k-fold cross-validation

We'll be using a new dataset in this chapter, the sales of an online retailer in the United Kingdom. This data has been anonymized, but it represents 3 years of daily sales amounts, as displayed in the following graph:

Figure 11.1 – Daily sales of an anonymous online retailer

This retailer has not seen dramatic growth over the 3 years of data, but it has seen a massive boost in sales at the end of each year. The retailer's main customers are wholesalers, who typically make their purchases during the work week. This is why, when we plot the components of Prophet's forecast, you'll see that sales on Saturday and Sunday are the lowest. We'll use this data to perform cross-validation in Prophet.

Before we get to modeling, though, let's first review traditional validation techniques to tune a model's hyperparameters and report performance. The most basic method is to take your full...

Performing forward-chaining cross-validation

Forward-chaining cross-validation, also called rolling-origin cross-validation, is similar to k-fold but suited to sequential data such as time series. The data is not randomly shuffled at the start, but a test set may still be set aside. The test set must be the final portion of the data, so if each fold is going to be 10% of your data (as it would be in 10-fold cross-validation), then your test set will be the final 10% of your date range.

With the remaining data, you choose an initial amount of data to train on, let's say five folds in this example, and then you evaluate on the sixth fold and save that performance metric. You then retrain on the first six folds and evaluate on the seventh. You repeat until all folds are exhausted and again take the average of your performance metric. The folds using this technique would look like this:

Figure 11.4 – Forward-chaining cross-validation with five folds
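
The procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than Prophet's implementation: the function name forward_chaining_splits and its arguments are hypothetical, and it assumes the final test set has already been set aside, leaving 90 observations split into nine equal folds.

```python
def forward_chaining_splits(n_samples, n_folds, initial_folds):
    """Yield (train_indices, test_indices) pairs in time order.

    Hypothetical helper for illustration only. The data is cut into
    n_folds contiguous folds. The first split trains on the first
    initial_folds folds and tests on the next fold; each later split
    extends the training window by one fold.
    """
    if not 0 < initial_folds < n_folds:
        raise ValueError("initial_folds must be between 1 and n_folds - 1")
    # Fold boundaries; the last fold absorbs any remainder.
    fold_size = n_samples // n_folds
    bounds = [i * fold_size for i in range(n_folds)] + [n_samples]
    for k in range(initial_folds, n_folds):
        train = list(range(0, bounds[k]))
        test = list(range(bounds[k], bounds[k + 1]))
        yield train, test


# Assumed setup: 90 observations remain after holding out the final test set.
splits = list(forward_chaining_splits(n_samples=90, n_folds=9, initial_folds=5))
for train, test in splits:
    print(f"train 0-{train[-1]}, test {test[0]}-{test[-1]}")
```

Every training window ends before its evaluation fold begins, so the model is never scored on data that precedes what it was trained on.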

In this...

Creating the Prophet cross-validation DataFrame

To perform cross-validation in Prophet, you first need a fitted model, so we'll begin with the same procedure we've followed throughout this book. This dataset is very cooperative, so we'll be able to use many of Prophet's default parameters. We will plot the changepoints, so be sure to include that function with your other imports before loading the data:

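import pandas as pd
import matplotlib.pyplot as plt
# Prophet 1.0 and later ship as `prophet`; on those versions, import from `prophet` instead.
from fbprophet import Prophet
from fbprophet.plot import add_changepoints_to_plot

# Load the daily sales, parse the dates, and rename the columns to Prophet's ds/y convention.
df = pd.read_csv('online_retail.csv')
df['date'] = pd.to_datetime(df['date'])
df.columns = ['ds', 'y']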

This dataset does not have very complicated seasonality, so we'll reduce the Fourier order of yearly seasonality when instantiating our model, but keep everything else default, before fitting, predicting, and plotting. We'll use a 1-year future forecast:

model...

Parallelizing cross-validation

Cross-validation fits and evaluates the model many times, and because each of those fits is independent of the others, they can be run in parallel to speed things up. All you need to do to take advantage of this is use the parallel keyword. It accepts one of four values: None, 'processes', 'threads', or 'dask':

df_cv = cross_validation(model,
                         horizon='90 days',
                         period='30 days',
                         initial='730 days',
              ...
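
To see what the initial, period, and horizon arguments in a call like this imply, the following sketch lists the cutoff dates at which each training window would end. It is not Prophet's code: rolling_cutoffs is a hypothetical helper, the date range is an assumed stand-in for 3 years of daily data, and the sketch assumes cutoffs are laid out backward from the last date in steps of period, stopping once less than initial training data would remain. Prophet's own handling of edge cases, such as gaps in the data, may differ.

```python
from datetime import date, timedelta


def rolling_cutoffs(first_day, last_day, initial, period, horizon):
    """Return cutoff dates, oldest first (hypothetical helper).

    Each cutoff ends one training window; the model is then evaluated
    on the `horizon` that immediately follows the cutoff.
    """
    cutoff = last_day - horizon      # latest cutoff leaves a full horizon to test on
    earliest = first_day + initial   # no cutoff may leave less than `initial` to train on
    cutoffs = []
    while cutoff >= earliest:
        cutoffs.append(cutoff)
        cutoff -= period             # step back one period at a time
    return sorted(cutoffs)


# Assumed date range for illustration; the real dataset's dates may differ.
cutoffs = rolling_cutoffs(
    first_day=date(2018, 1, 1),
    last_day=date(2020, 12, 31),
    initial=timedelta(days=730),
    period=timedelta(days=30),
    horizon=timedelta(days=90),
)
for c in cutoffs:
    print(f"train through {c}, evaluate through {c + timedelta(days=90)}")
```

Each cutoff corresponds to one independent model fit, which is why the work divides naturally across processes or threads.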

Summary

We began this chapter with a discussion of why k-fold cross-validation was developed in traditional machine learning applications, and we then learned why it will not work with time series. You then learned about forward-chaining cross-validation, also called rolling-origin cross-validation, for use with time series data.

You learned the keywords of initial, horizon, period, and cutoffs, which are used to define your cross-validation parameters, and you learned how to implement them in Prophet. Finally, you learned the different options Prophet has for parallelization, in order to speed up model evaluation.

These techniques provide you with a statistically robust way to evaluate and compare models. By keeping the data used in training separate from the data used in testing, you avoid the optimistic bias that comes from evaluating a model on data it has already seen, and you can be more confident that your model will perform well when making new predictions about the future.

In the next chapter, you'll apply what you learned here to measure your model's performance...
