Forecasting Time Series Data with Facebook Prophet

Product type: Book
Published in: Mar 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781800568532
Edition: 1st Edition
Languages: Python
Tools: Prophet
Concepts: Predictive Analytics
Author: Greg Rafferty

Chapter 11: Cross-Validation

The concept of keeping training data and testing data separate is sacrosanct in machine learning and statistics. You should never train a model and test its performance on the same data. Setting data aside for testing purposes has a downside, though: that data has valuable information that you would want to include in training. Cross-validation is a technique that's used to circumvent this problem.

You may be familiar with k-fold cross-validation, but if you are not, we will briefly cover it in this chapter. K-fold, however, will not work on time series. It requires that the observations be independent of one another, an assumption that does not hold for time series data, where each value depends on the values that came before it. An understanding of k-fold will help you learn how forward-chaining cross-validation works and why it is necessary for time series data.
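To make the problem concrete, here is a minimal sketch (not code from the book) of shuffled k-fold splitting applied to time-ordered data. The helper name `shuffled_k_fold` and the 20-point toy series are assumptions made for illustration. It counts how many folds train on observations that come after some of the points they are tested on:

```python
import random

def shuffled_k_fold(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // k
    for i in range(k):
        test_idx = indices[i * fold_size:(i + 1) * fold_size]
        test_set = set(test_idx)
        train_idx = [j for j in indices if j not in test_set]
        yield sorted(train_idx), sorted(test_idx)

# Treat the index as time: 0 is the oldest observation, 19 the newest.
folds = list(shuffled_k_fold(n_samples=20, k=5))

# A fold "leaks" when some training point is later than a test point.
leaky_folds = [(train, test) for train, test in folds if max(train) > min(test)]
print(f"{len(leaky_folds)} of {len(folds)} folds train on data from the future")
```

With shuffling, nearly every fold asks the model to "predict" values that are older than data it has already seen, which is not how a forecast is used in practice.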

After learning how to perform cross-validation in Prophet, you will learn how to speed up the computing of cross-validation through Prophet's ability to parallelize...

Technical requirements

The data files and code for examples in this chapter can be found at https://github.com/PacktPublishing/Forecasting-Time-Series-Data-with-Facebook-Prophet.

Performing k-fold cross-validation

We'll be using a new dataset in this chapter, the sales of an online retailer in the United Kingdom. This data has been anonymized, but it represents 3 years of daily sales amounts, as displayed in the following graph:

Figure 11.1 – Daily sales of an anonymous online retailer

This retailer has not seen dramatic growth over the 3 years of data, but it has seen a massive boost in sales at the end of each year. The retailer's main customers are wholesalers, who typically make their purchases during the work week. This is why, when we plot the components of Prophet's forecast, you'll see that sales on Saturday and Sunday are the lowest. We'll use this data to perform cross-validation in Prophet.

Before we get to modeling, though, let's first review traditional validation techniques to tune a model's hyperparameters and report performance. The most basic method is to take your full...

Performing forward-chaining cross-validation

Forward-chaining cross-validation, also called rolling-origin cross-validation, is similar to k-fold but suited to sequential data such as time series. The data is not randomly shuffled at the start, but a test set may still be set aside. The test set must be the final portion of data, so if each fold is going to be 10% of your data (as it would be in 10-fold cross-validation), then your test set will be the final 10% of your date range.

With the remaining data, you choose an initial amount of data to train on (five folds in this example), evaluate on the sixth fold, and save that performance metric. You then retrain on the first six folds and evaluate on the seventh. You repeat until all folds are exhausted and again take the average of your performance metric. The folds using this technique would look like this:

Figure 11.4 – Forward-chaining cross-validation with five folds
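The same procedure can be sketched with plain index arithmetic. This is an illustrative helper, not Prophet's implementation; the function name `forward_chaining_splits` and the 100-point series are assumptions. The final 10% is held out as the test set, and the remaining nine folds are walked forward starting from five training folds:

```python
def forward_chaining_splits(n_samples, n_folds, initial_folds):
    """Yield (train_idx, eval_idx) pairs with an expanding training window."""
    fold_size = n_samples // n_folds
    for k in range(initial_folds, n_folds):
        train_end = k * fold_size
        # The last fold absorbs any remainder left by the integer division.
        eval_end = n_samples if k == n_folds - 1 else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, eval_end))

test_idx = list(range(90, 100))  # final 10% of a 100-point series, held out

splits = list(forward_chaining_splits(n_samples=90, n_folds=9, initial_folds=5))
for train_idx, eval_idx in splits:
    print(f"train on 0..{train_idx[-1]}, evaluate on {eval_idx[0]}..{eval_idx[-1]}")
# train on 0..49, evaluate on 50..59
# train on 0..59, evaluate on 60..69
# train on 0..69, evaluate on 70..79
# train on 0..79, evaluate on 80..89
```

Every training window ends before its evaluation window begins, so the model is never scored on data older than what it was trained on.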

In this...

Creating the Prophet cross-validation DataFrame

To perform cross-validation in Prophet, you first need a fitted model. So, we'll begin with the same procedure we've completed throughout this book. This dataset is very cooperative, so we'll be able to use plenty of Prophet's default parameters. We will plot the changepoints, so be sure to include that function with your other imports before loading the data:

import pandas as pd
import matplotlib.pyplot as plt
# Prophet 1.0 and later is imported as `prophet`; earlier releases use `fbprophet`
from fbprophet import Prophet
from fbprophet.plot import add_changepoints_to_plot

# Load the daily sales data and parse the date column
df = pd.read_csv('online_retail.csv')
df['date'] = pd.to_datetime(df['date'])
# Prophet expects the columns to be named 'ds' (date) and 'y' (value)
df.columns = ['ds', 'y']

This dataset does not have very complicated seasonality, so we'll reduce the Fourier order of yearly seasonality when instantiating our model, but keep everything else default, before fitting, predicting, and plotting. We'll use a 1-year future forecast:

model...

Parallelizing cross-validation

Cross-validation involves a lot of iteration, and because each iteration is independent of the others, these tasks can be parallelized to speed things up. To take advantage of this, pass the parallel keyword to cross_validation. It accepts one of four options: None, 'processes', 'threads', or 'dask':

df_cv = cross_validation(model,
                         horizon='90 days',
                         period='30 days',
                         initial='730 days',
              ...
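Why parallelization helps can be seen in a small standalone sketch. This is not Prophet's internal code: `evaluate_fold` is a hypothetical stand-in for fitting a model up to one cutoff and scoring its forecast, and the integer cutoffs are placeholders for dates. Because no fold depends on another fold's result, the jobs can be handed to a pool of workers:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_fold(cutoff):
    """Hypothetical stand-in: fit on data up to `cutoff`, then score the forecast."""
    # A real job would train a model here; this placeholder returns a dummy error.
    return cutoff, abs(cutoff - 5) / 10

cutoffs = [2, 4, 6, 8]  # illustrative fold boundaries, not real dates

# Each fold is independent of the others, so the jobs can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(evaluate_fold, cutoffs))

# Results come back in cutoff order, ready to be averaged.
mean_error = sum(err for _, err in results) / len(results)
print(f"mean error over {len(results)} folds: {mean_error:.2f}")
```

Threads are used here only to keep the sketch short. Which option is fastest in practice depends on your machine and on the size of your data.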

Summary

We began this chapter with a discussion of why k-fold cross-validation was developed in traditional machine learning applications, and we then learned why it will not work with time series. You then learned about forward-chaining, also called rolling-origin cross-validation, for use with time series data.

You learned the keywords of initial, horizon, period, and cutoffs, which are used to define your cross-validation parameters, and you learned how to implement them in Prophet. Finally, you learned the different options Prophet has for parallelization, in order to speed up model evaluation.
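How initial, horizon, and period interact can be sketched with the standard library alone. This is a simplified approximation, assuming cutoffs are found by working backward from the end of the data; the helper name `sketch_cutoffs` and the date range are hypothetical, and the real library handles edge cases, such as gaps in the data, that this sketch ignores:

```python
from datetime import date, timedelta

def sketch_cutoffs(first_day, last_day, initial, period, horizon):
    """Work backward from the end of the data to list candidate cutoff dates."""
    cutoffs = []
    cutoff = last_day - horizon            # latest cutoff that leaves a full horizon
    while cutoff >= first_day + initial:   # keep at least `initial` of training data
        cutoffs.append(cutoff)
        cutoff -= period                   # step back one period per fold
    return sorted(cutoffs)

# Hypothetical 3-year date range; the real dataset's dates may differ.
cutoffs = sketch_cutoffs(
    first_day=date(2018, 1, 1),
    last_day=date(2020, 12, 31),
    initial=timedelta(days=730),
    period=timedelta(days=30),
    horizon=timedelta(days=90),
)
print(len(cutoffs), "cutoffs from", cutoffs[0], "to", cutoffs[-1])
```

A shorter period produces more cutoffs and therefore more model fits, while a longer initial or horizon leaves room for fewer.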

These techniques provide you with a statistically robust way to evaluate and compare models. By keeping the data used in training separate from the data used in testing, you avoid the optimistic bias that comes from scoring a model on data it has already seen, and you can be more confident that your model will perform well when making new predictions about the future.

In the next chapter, you'll apply what you learned here to measure your model's performance...

Author (1)

Greg Rafferty

Greg Rafferty is a data scientist in San Francisco, California. With over a decade of experience, he has worked with many of the top firms in tech, including Google, Facebook, and IBM. Greg has been an instructor in business analytics on Coursera and has led face-to-face workshops with industry professionals in data science and analytics. With both an MBA and a degree in engineering, he is able to work across the spectrum of data science and communicate with both technical experts and non-technical consumers of data alike.