
You're reading from Clojure for Data Science

Product type: Book

Published in: Sep 2015

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781784397180

Edition: 1st Edition

Languages: Clojure

Concepts: Data Analysis

Author: Henry Garner

Chapter 9. Time Series

	"Again time elapsed."
	--Carolyn Keene, The Secret of the Old Clock

In several of the previous chapters, we saw how we can apply iterative algorithms to identify solutions to complex equations. We first encountered this with gradient descent—both batch and stochastic—but most recently we saw it in community detection in graphs using the graph-parallel model of computation.

This chapter is about time series data. A time series is any data series that consists of regular observations of a quantity arranged according to the time of their measurement. For many of the techniques in this chapter to work, we require that the intervals between successive observations are all equal. The period between measurements could be monthly in the case of sales figures, daily in the case of rainfall or stock market fluctuations, or minute by minute in the case of hits to a high-traffic website.
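Since so many of these techniques assume equally spaced observations, it can be worth checking that assumption before modeling. The following is a minimal sketch in plain Clojure (no Incanter). The function name `regular-intervals?` is hypothetical, and it assumes the timestamps are numbers in ascending order, such as epoch milliseconds or month indices.

```clojure
(defn regular-intervals?
  "Returns true when every gap between successive timestamps is the same.
  Timestamps are assumed to be numbers in ascending order."
  [timestamps]
  (let [gaps (map - (rest timestamps) timestamps)]
    ;; With fewer than two observations there are no gaps to compare.
    ;; == compares numerically, so 1 and 1.0 count as the same gap.
    (or (empty? gaps) (apply == gaps))))

(regular-intervals? [0 1 2 3 4]) ; → true
(regular-intervals? [0 1 2 4 5]) ; → false
```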

For us to be able to predict the future values of a time series, we require that the future values...

About the data

This chapter will make use of two datasets that come pre-installed with Incanter: the Longley dataset, which contains data on seven economic variables measured in the United States between 1947 and 1962, and the Airline dataset, which contains the monthly total number of airline passengers from January 1949 to December 1960.

Note

You can download the source code for this chapter from https://github.com/clojuredatascience/ch9-time-series.

The Airline dataset is where we will spend most of our time in this chapter, but first let's look at the Longley dataset. It contains columns including the gross domestic product (GDP), the number of employed and unemployed people, the population, and the size of the armed forces. It's a classic dataset for analyzing multicollinearity since many of the predictors are themselves correlated. This won't affect the analysis we're performing since we'll only be using one of the predictors at a time.
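To make "correlated predictors" concrete, the sketch below computes Pearson's correlation coefficient in plain Clojure. It is an illustration rather than the chapter's code (Incanter has its own `incanter.stats/correlation`). Applied to two Longley columns, a value close to 1 or -1 would indicate the kind of multicollinearity described above; the helper names `mean` and `correlation` are local to this sketch.

```clojure
(defn mean [xs]
  (/ (double (reduce + xs)) (count xs)))

(defn correlation
  "Pearson's correlation coefficient between two equal-length sequences:
  the covariance of xs and ys divided by the product of their standard
  deviations."
  [xs ys]
  (let [mx (mean xs)
        my (mean ys)
        dx (map #(- % mx) xs)   ; deviations from the mean of xs
        dy (map #(- % my) ys)]  ; deviations from the mean of ys
    (/ (reduce + (map * dx dy))
       (Math/sqrt (double (* (reduce + (map * dx dx))
                             (reduce + (map * dy dy))))))))

(correlation [1 2 3] [2 4 6]) ; → 1.0
```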

Loading the Longley data

Since Incanter includes the...

Fitting curves with a linear model

First, let's remind ourselves how we would fit a straight line using Incanter's linear-model function. We want to extract the x5 and x6 columns from the dataset and pass them, in that order, to the incanter.stats/linear-model function: x5, the population, is our response variable and x6, the year, is our predictor variable.

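(require '[incanter.core :as i]       ; $ for column selection, view
         '[incanter.charts :as c]     ; scatter-plot, add-lines
         '[incanter.datasets :as d]   ; get-dataset
         '[incanter.stats :as s])     ; linear-model

(defn ex-9-3 []
  (let [data  (d/get-dataset :longley)
        ;; linear-model takes the response (x5, population) first,
        ;; then the predictor (x6, year)
        model (s/linear-model (i/$ :x5 data)
                              (i/$ :x6 data))]
    (println "R-square" (:r-square model))
    ;; Plot the raw data and overlay the fitted straight line
    (-> (c/scatter-plot (i/$ :x6 data)
                        (i/$ :x5 data)
                        :x-label "Year"
                        :y-label "Population")
        (c/add-lines (i/$ :x6 data)
                     (:fitted model))
        (i/view))))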

;; R-square 0.9879

The preceding code generates the following chart:

While the straight line is a close fit to the data, generating an R² of over 0.98, it doesn't capture the curve in the data. In particular, we can see that points diverge from...
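For a single predictor, the straight-line fit and its R² can be computed directly. The sketch below is plain ordinary least squares in Clojure, offered to show what the reported R² measures (the share of the variance in y explained by the line); it is not Incanter's implementation, and the name `simple-linear-fit` is hypothetical.

```clojure
(defn simple-linear-fit
  "Ordinary least squares fit of y = a + b*x for a single predictor.
  Returns the intercept, slope, fitted values and R-squared."
  [xs ys]
  (let [n         (count xs)
        mean-x    (/ (double (reduce + xs)) n)
        mean-y    (/ (double (reduce + ys)) n)
        sxy       (reduce + (map #(* (- %1 mean-x) (- %2 mean-y)) xs ys))
        sxx       (reduce + (map #(let [d (- % mean-x)] (* d d)) xs))
        slope     (/ sxy sxx)
        intercept (- mean-y (* slope mean-x))
        fitted    (map #(+ intercept (* slope %)) xs)
        ;; R-squared = 1 - (residual sum of squares / total sum of squares)
        ss-res    (reduce + (map #(let [d (- %1 %2)] (* d d)) ys fitted))
        ss-tot    (reduce + (map #(let [d (- % mean-y)] (* d d)) ys))]
    {:intercept intercept
     :slope     slope
     :fitted    fitted
     :r-square  (- 1.0 (/ ss-res ss-tot))}))

;; A perfectly straight series: slope 2.0, intercept 1.0, r-square 1.0
(simple-linear-fit [1 2 3 4] [3 5 7 9])
```

A curved series fitted this way leaves systematic residuals, which is exactly the divergence described above even when R² is high.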

Time series decomposition

One of the problems we have in modeling the military time series (the size of the armed forces) is that there is simply not enough data to produce a general model of the process that generated the series. A common way to model a time series is to decompose it into a number of separate components:

Trend: Does the series generally increase or decrease over time? Is the trend an exponential curve as we saw with the population?
Seasonality: Does the series exhibit periodic rises and falls at a set number of intervals? For monthly data it is common to observe a seasonal cycle with a period of 12 months.
Cycles: Are there longer-term cycles in the dataset that span multiple seasons? For example, in financial data we might observe multi-year cycles corresponding to periods of expansion and recession.
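A classical way to separate the first two components is additive decomposition: estimate the trend with a centered moving average spanning one season, average the detrended values at each position in the cycle to obtain the seasonal component, and treat what is left as the remainder. The sketch below assumes an additive model (y = trend + seasonal + remainder) and at least two full periods of data; the function names are hypothetical, and longer-term cycles remain mixed in with the trend estimate.

```clojure
(defn centered-moving-average
  "Centered moving average of width m. For even m this is the 2 x m
  moving average (m + 1 points, half weight at each end). Positions
  where the window does not fit are nil."
  [m xs]
  (let [xs      (vec xs)
        n       (count xs)
        half    (quot m 2)
        weights (if (even? m)
                  (concat [(/ 0.5 m)] (repeat (dec m) (/ 1.0 m)) [(/ 0.5 m)])
                  (repeat m (/ 1.0 m)))]
    (mapv (fn [t]
            (when (and (>= (- t half) 0) (< (+ t half) n))
              (reduce + (map * weights (subvec xs (- t half) (+ t half 1))))))
          (range n))))

(defn additive-decomposition
  "Splits xs into trend, seasonal and remainder components for a season
  of length period. Assumes at least two full periods of data so that
  every position in the cycle has a detrended value."
  [period xs]
  (let [xs        (vec xs)
        trend     (centered-moving-average period xs)
        detrended (map (fn [x tr] (when tr (- x tr))) xs trend)
        ;; Average the detrended values at each position in the cycle
        means     (->> (map vector (range) detrended)
                       (remove (comp nil? second))
                       (group-by #(mod (first %) period))
                       (into {} (map (fn [[k pairs]]
                                       [k (/ (reduce + (map second pairs))
                                             (count pairs))]))))
        ;; Shift the seasonal means so they sum to zero over one period
        offset    (/ (reduce + (vals means)) period)
        seasonal  (mapv #(- (means (mod % period)) offset) (range (count xs)))]
    {:trend     trend
     :seasonal  seasonal
     :remainder (mapv (fn [x tr s] (when tr (- x tr s))) xs trend seasonal)}))
```

For monthly data such as the airline series, `period` would be 12.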

Another way of stating the problem with the military data is that there is not enough information to determine whether or not there is a trend, and whether the observed peak is part of a seasonal or...

Discrete time models

Discrete time models, such as the ones we have been looking at so far, separate time into slices at regular intervals. For us to be able to predict future values of time slices, we assume that they are dependent on past slices.
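One concrete way to express "future slices depend on past slices" is to pair each observation with the p values that precede it, which is the shape of data an autoregressive model learns from. A minimal sketch; the name `lagged-pairs` is hypothetical.

```clojure
(defn lagged-pairs
  "Pairs each observation y(t) with the vector of the p observations that
  precede it, [y(t-p) ... y(t-1)]. The first p observations, which lack a
  full history, are skipped."
  [p ys]
  (map (fn [window] [(vec (butlast window)) (last window)])
       (partition (inc p) 1 ys)))

(lagged-pairs 2 [10 12 13 15])
;; → ([[10 12] 13] [[12 13] 15])
```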

Note

Time series can also be analyzed with respect to frequency rather than time. We won't discuss frequency domain analysis in this chapter but the book's wiki at http://wiki.clojuredatascience.com contains links to further resources.

In the following, let y_t denote the value of an observation at time t. The simplest time series possible would be one where the value of each time slice is the same as the one directly preceding it. The predictor for such a series would be:

$$\hat{y}_{t+1 \mid t} = y_t$$

This is to say that the prediction at time t + 1 given t is equal to the observed value at time t. Notice that this definition is recursive: the value at time t depends on the value at t - 1. The value at t - 1 depends on the value at t - 2, and so on.
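The persistence rule translates directly into Clojure. Because each forecast is just the previous value, unrolling the recursion with `iterate` yields the last observation repeated, and the one-step-ahead errors of the rule are simply the differences between successive observations. A minimal sketch; the function names are hypothetical.

```clojure
(defn naive-forecast
  "Forecasts h steps ahead by repeatedly applying the rule that the next
  value equals the current one, starting from the last observation."
  [ys h]
  (take h (rest (iterate identity (last ys)))))

(defn naive-errors
  "One-step-ahead errors y(t+1) - y(t) of the persistence forecast."
  [ys]
  (map - (rest ys) ys))

(naive-forecast [3 5 4] 3) ; → (4 4 4)
(naive-errors [3 5 4])     ; → (2 -1)
```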

We could model this "constant...

Maximum likelihood estimation

On several occasions throughout this book, we've expressed optimization problems in terms of a cost function to be minimized. For example, in Chapter 4, Classification, we used Incanter to minimize the logistic cost function whilst building a logistic regression classifier, and in Chapter 5, Big Data, we minimized a least-squares cost function using both batch and stochastic gradient descent.

Optimization can also be expressed as a benefit to maximize, and it's sometimes more natural to think in these terms. Maximum likelihood estimation aims to find the best parameters for a model by maximizing the likelihood function.

Let's say that the probability of an observation x given model parameters β is written as:

$$P(x \mid \beta)$$

Then, the likelihood can be expressed as:

$$L(\beta \mid x) = P(x \mid \beta)$$

The likelihood is a function of the parameters, given the data: for each candidate value of β, it measures how probable the observed data are. The aim of maximum likelihood estimation is to find the parameter values that make the observed data most...
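As a small worked illustration of maximizing a likelihood (not the chapter's ARMA likelihood), the sketch below assumes observations drawn independently from a normal distribution with a known standard deviation. It sums the log-densities to obtain the log-likelihood and searches a grid of candidate means for the one that maximizes it. Taking logs turns the product of probabilities into a sum without moving the maximum. The function names are hypothetical, and the grid search stands in for a proper optimizer.

```clojure
(defn normal-log-likelihood
  "Log-likelihood of independent observations xs under a normal
  distribution with mean mu and standard deviation sigma."
  [mu sigma xs]
  (let [mu    (double mu)
        sigma (double sigma)]
    (reduce + (map (fn [x]
                     (let [d (- (double x) mu)]
                       ;; log of the normal density at x
                       (- (- (Math/log (* sigma (Math/sqrt (* 2.0 Math/PI)))))
                          (/ (* d d) (* 2.0 sigma sigma)))))
                   xs))))

(defn grid-mle-mean
  "Returns the candidate mean with the highest log-likelihood, holding
  sigma fixed."
  [sigma xs candidates]
  (apply max-key #(normal-log-likelihood % sigma xs) candidates))

(grid-mle-mean 1.0 [1.0 2.0 3.0 4.0 5.0] (range 0.0 6.0 0.5)) ; → 3.0
```

The winning candidate is the sample mean, which is the known maximum likelihood estimate of a normal mean.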

Time series forecasting

With the parameter estimates in hand, we're finally in a position to use our model for forecasting. We've actually already written most of the code we need to do this: we have an arma function that's capable of generating an autoregressive moving-average series based on some seed data and the model's coefficients. The seed data will be our measured values of y from the airline data, and the coefficients will be the p autoregressive and q moving-average parameters that we estimated using the Nelder-Mead method.
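The arma function itself is defined earlier in the chapter. As a hedged sketch of the same idea, the recursion below treats each new value as a weighted sum of recent values (the AR terms) and recent errors (the MA terms) plus a fresh error, and produces the series lazily. For a point forecast, future errors are set to zero, their expected value. The name `arma-seq` and its argument order are assumptions, not the book's API.

```clojure
(defn arma-seq
  "Lazy ARMA(p, q) series. ys and es hold past values and past errors,
  most recent first; ar and ma are the coefficient vectors (lengths p
  and q). Each step computes
    y(t) = sum_i ar_i * y(t-i) + sum_j ma_j * e(t-j) + e(t)
  where the new error e(t) comes from calling error-fn."
  [ys es ar ma error-fn]
  (lazy-seq
    (let [e (error-fn)
          y (+ (reduce + (map * ar ys))   ; autoregressive terms
               (reduce + (map * ma es))   ; moving-average terms
               e)]
      (cons y (arma-seq (cons y ys) (cons e es) ar ma error-fn)))))

;; AR(1) point forecast from a last observation of 8.0, with phi = 0.5
(take 3 (arma-seq [8.0] [] [0.5] [] (constantly 0.0))) ; → (4.0 2.0 1.0)
```

Passing an error-fn that draws random noise instead of `(constantly 0.0)` turns the same recursion into a simulator.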

Let's plug those numbers into our ARMA model and generate a sequence of predictions for y:

(defn ex-9-32 []
  (let [data (i/log (airline-passengers))
        diff-1  (difference 1 data)
        diff-12 (difference 12 diff-1)
        forecast (->> (arma (take 9 (reverse diff-12))
                       []
                       (:ar params)
                       (:ma params) 0)
                      (take 100)
                      (undifference 12 diff-1)
            ...
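The difference and undifference helpers used above are defined elsewhere in the chapter. As a hedged sketch of what lag-k differencing and its inverse involve (not necessarily the book's signatures), differencing subtracts the value k steps earlier, and undifferencing rebuilds the series by adding each difference back onto the value k steps before it, seeded with the last k values of the undifferenced series. The names `lag-difference` and `lag-undifference` are hypothetical.

```clojure
(defn lag-difference
  "Lag-k differences: d(t) = y(t) - y(t-k)."
  [k ys]
  (map - (drop k ys) ys))

(defn lag-undifference
  "Inverse of lag-difference. history supplies at least the last k values
  of the undifferenced series; each difference is added to the value k
  steps before it: y(t) = y(t-k) + d(t). Returns a lazy sequence of the
  reconstructed values that follow history."
  [k history diffs]
  (letfn [(step [window ds]
            ;; window holds the k most recent values, oldest first
            (lazy-seq
              (when-let [[d & more] (seq ds)]
                (let [y (+ (first window) d)]
                  (cons y (step (conj (subvec window 1) y) more))))))]
    (step (vec (take-last k history)) diffs)))

(lag-difference 2 [3 1 4 1 5 9 2 6])       ; → (1 0 1 8 -3 -3)
(lag-undifference 2 [3 1] [1 0 1 8 -3 -3]) ; → (4 1 5 9 2 6)
```

When a series has been transformed several times, as the airline data is log-transformed and then differenced twice, recovering values on the original scale means inverting each step in reverse order.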

Summary

In this chapter, we've considered the task of analyzing discrete time series: sequential observations taken at fixed intervals in time. We've seen how the challenge of modeling such a series can be made easier by decomposing it into a set of components: a trend component, a seasonal component, and a cyclic component.

We've seen how ARMA models decompose a series further into autoregressive and moving-average components, each of which is in some way determined by past values of the series. This conception of a series is inherently recursive, and we've seen how Clojure's natural capabilities for defining recursive functions and lazy sequences lend themselves to the algorithmic generation of such series. By determining each value of the series as a function of the previous values, we implemented a recursive ARMA generator that was capable of simulating a measured series and forecasting it forwards in time.

We've also learned about maximum likelihood estimation: a way of reframing solutions...


About the author

Henry Garner is a graduate from the University of Oxford and an experienced developer, CTO, and coach. He started his technical career at Britain's largest telecoms provider, BT, working with a traditional data warehouse infrastructure. As a part of a small team for 3 years, he built sophisticated data models to derive insight from raw data and use web applications to present the results. These applications were used internally by senior executives and operatives to track both business and systems performance. He then went on to co-found Likely, a social media analytics start-up. As the CTO, he set the technical direction, leading to the introduction of an event-based append-only data pipeline modeled after the Lambda architecture. He adopted Clojure in 2011 and led a hybrid team of programmers and data scientists, building content recommendation engines based on collaborative filtering and clustering techniques. He developed a syllabus and copresented a series of evening classes from Likely's offices for professional developers who wanted to learn Clojure. Henry now works with growing businesses, consulting in both a development and technical leadership capacity. He presents regularly at seminars and Clojure meetups in and around London.
