Mastering Predictive Analytics with R

Product type: Book
Published: Jun 2015
Reading level: Expert
ISBN-13: 9781783982806
Edition: 1st Edition
Tools: RStudio
Concepts: Predictive Analytics
Authors (2): Rui Miguel Forte

Chapter 9. Time Series Analysis

Many models that we come across involve observing a process of some sort over a period of time in order to learn to predict how that process will behave in the future. As we are dealing with a process that generates observations indexed by time, we refer to these models as time series models. Classic examples of time series are stock market indexes, volume of sales of a company's product over time, and changing weather attributes such as temperature and rainfall during the year.

In this chapter, we will focus on univariate time series, that is to say, time series that involve monitoring how a single variable fluctuates over time. To do this, we begin with some basic tools for describing time series, followed by an overview of a number of fundamental examples. It turns out that there is a wide variety of different approaches to modeling time series; in this chapter, we will focus primarily on ARIMA models, but we will also provide pointers on a few alternatives...

Fundamental concepts of time series

A time series is just a sequence of random variables, Y₁, Y₂, …, Y_T, indexed by an evenly spaced sequence of points in time. Time series are ubiquitous in everyday life; we can observe the total amount of rainfall in millimeters over yearly periods for consecutive years, the average daytime temperature over consecutive days, the price of a particular share in the stock market at the close of every day of trading, or the total number of patients in a doctor's waiting room every half hour. As we can see, examples abound.

To analyze time series data, we use the concept of a stochastic process, which is just a sequence of random variables that are generated via an underlying mechanism that is stochastic or random, as opposed to deterministic. From the perspective of the predictive modeler, our goal is to study time series in order to build a model that best describes the behavior of a finite set of samples that we have obtained, in order for us to predict how...

Some fundamental time series

We will begin our study of time series by looking at two famous but very simple examples. These will not only give us a feel for the field, but as we will see later on, they will also become integral building blocks to describe more complex time series.

White noise

A basic but very important type of time series is known as discrete white noise, or simply white noise. In a white noise time series, the random variables all have a mean of 0 and the same finite variance σ_w², and the random variables at different time steps are uncorrelated with each other. Most texts additionally require the variables to be independent and identically distributed (iid), although some do not enforce this requirement.

The iid property essentially requires that each random variable come from the exact same distribution, such as a normal distribution with a particular mean and standard deviation. The property also requires that two variables from...
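As a minimal sketch of these properties, the following R code simulates Gaussian white noise and checks the sample mean, variance, and autocorrelations. The seed, series length, and standard deviation are arbitrary choices made for illustration, and the normal distribution is only one possible choice of iid distribution.

```r
# Simulate Gaussian white noise: iid draws with mean 0 and constant variance.
set.seed(42)            # arbitrary seed, only for reproducibility
n <- 10000              # illustrative series length
sigma_w <- 2            # illustrative standard deviation
w <- rnorm(n, mean = 0, sd = sigma_w)

# The sample mean should be close to 0 and the sample variance close to sigma_w^2.
mean(w)
var(w)

# Sample autocorrelations at nonzero lags should all be close to 0.
acf_w <- acf(w, lag.max = 10, plot = FALSE)
max_abs_acf <- max(abs(acf_w$acf[-1]))   # drop lag 0, which is always 1
max_abs_acf
```

With a series this long, the sample statistics should sit close to their theoretical values, although they will never match them exactly.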

Stationarity

We have often seen that in predictive modeling, we need to make certain important but limiting assumptions in order to build practical models. With time series models, one of the most common assumptions, and one that renders the modeling task significantly simpler, is the stationarity assumption.

Stationarity means that the probabilistic behavior of a time series does not change with the passage of time. There are two versions of the stationarity property that are commonly used. A stochastic process is said to be strictly stationary when the joint probability distribution of a sequence of points starting at time t, Y_t, Y_{t+1}, ..., Y_{t+n}, is the same as the joint probability distribution of another sequence of points starting at a different time T, Y_T, Y_{T+1}, ..., Y_{T+n}.

To be strictly stationary, this property must hold for any choice of time t and T, and for any sequence length n. In particular, because we can choose n = 1, this means that the probability distributions...

Stationary time series models

In this section, we will describe a few stationary time series models. As we will see, these can be used to model a number of real-world processes.

Moving average models

A moving average (MA) process is a stochastic process in which the random variable at time step t is a linear combination of the most recent (in time) terms of a white noise process. Concretely, we can write this in an equation as follows:
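The equation itself is not reproduced in this excerpt. Under the sign convention documented for R's arima() function (an assumption here, since some texts write the noise terms with minus signs), an MA process of order q can be written as shown next. The coefficient symbols θ_1, ..., θ_q are our own notation rather than something this excerpt confirms.

```latex
Y_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q}
```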

In the previous equation, and henceforth, we will assume that the e terms are white noise random variables with mean 0 and variance σ_w². We can describe a moving average process in an equivalent way by making use of the backshift operator, B. The backshift operator is an operator that when applied to a random variable in a stochastic process at time t, produces the random variable at the previous time step, t-1. For example:
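The example equation is missing from this excerpt. From the definition just given, applying B to Y_t yields:

```latex
B\,Y_t = Y_{t-1}
```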

We can obtain random variables further back in time by successive applications of the backshift operator. B², for example, indicates the...
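To make the moving average definition concrete, here is a small simulation sketch of a first-order MA process, Y_t = e_t + θ e_{t-1}. The plus sign, the symbol θ, and the value 0.6 are assumptions chosen for illustration. The code compares the sample lag-1 autocorrelation with the standard theoretical value θ / (1 + θ²) for an MA(1) process.

```r
# Simulate an MA(1) process Y_t = e_t + theta * e_{t-1} from white noise.
set.seed(1)                       # arbitrary seed, only for reproducibility
n <- 20000                        # illustrative series length
theta <- 0.6                      # illustrative coefficient (an assumption)
e <- rnorm(n + 1)                 # white noise with variance 1
y <- e[-1] + theta * e[-(n + 1)]  # current noise term plus theta times the previous one

# Theoretical lag-1 autocorrelation of an MA(1) process; higher lags are zero.
rho1_theory <- theta / (1 + theta^2)

acf_y <- acf(y, lag.max = 3, plot = FALSE)$acf
rho1_sample <- acf_y[2]           # element 1 holds lag 0
rho2_sample <- acf_y[3]

c(theory = rho1_theory, sample = rho1_sample, lag2 = rho2_sample)
```

The sample autocorrelation at lag 2 should be close to zero, reflecting the fact that an MA(1) process only links observations that are one step apart.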

Non-stationary time series models

In this section, we will look at some models that are non-stationary but nonetheless have certain properties that allow us to either derive a stationary model or model the non-stationary behavior.

Autoregressive integrated moving average models

The random walk process is an example of a time series model that is itself non-stationary, but whose sequence of differences between consecutive points, Y_t and Y_{t+1}, which we can write as ∆Y_t, is stationary. This differenced sequence is nothing but the white noise sequence, which we know to be stationary.

If we were to take the difference between consecutive output points of the differenced sequence, we would again obtain another sequence, which we call a second order differenced sequence.

Generalizing this notion of differencing, we can say that a d-th order difference is obtained by repeatedly computing differences between consecutive terms d times, to obtain a new sequence with points, W_t, from an original sequence, Y_t. We can...
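Differencing can be sketched in R with the base function diff(), which subtracts each point from the one that follows it. The example below builds a random walk as the cumulative sum of white noise (the seed and length are arbitrary) and confirms that first-order differencing recovers the white noise steps, while second-order differencing is simply differencing applied twice.

```r
set.seed(7)            # arbitrary seed, only for reproducibility
n <- 500               # illustrative series length
w <- rnorm(n)          # white noise steps
rw <- cumsum(w)        # random walk: each point adds one noise step to the previous point

# First-order differencing recovers the white noise steps (all but the first).
d1 <- diff(rw)
all.equal(d1, w[-1])

# Second-order differencing is differencing applied twice.
d2 <- diff(rw, differences = 2)
all.equal(d2, diff(diff(rw)))
```

Each round of differencing shortens the series by one point, so d1 has n - 1 values and d2 has n - 2.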

Predicting intense earthquakes

Having reviewed several time series models, we are now ready for some practical examples. Our first data set is a time series of earthquakes having magnitude that exceeds 4.0 on the Richter scale in Greece over the period between the year 2000 and the year 2008. This data set was recorded by the Observatory of Athens and is hosted on the website of the University of Athens, Faculty of Geology, Department of Geophysics & Geothermics. The data is available online at http://www.geophysics.geol.uoa.gr/catalog/catgr_20002008.epi.

We will import the data directly by using the package RCurl. From this package, we will use the function getURL(), which retrieves the contents of a particular address on the Internet. We then pass the result to base R's textConnection(), which lets read.table() read the retrieved text as if it were a file. Once we have the data, we provide meaningful names for the columns using information from the website:

> library("RCurl")
> seismic_raw <- read.table(textConnection(getURL(
...
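The listing is cut off in this excerpt, so the following is only a sketch of the general pattern, not the chapter's code. It replaces the downloaded text with a short made-up string, and the column names are hypothetical; neither reflects the real layout of the earthquake catalog.

```r
# Made-up stand-in for downloaded text; NOT the real catalog contents or layout.
raw_text <- "2000 1 4.1\n2000 2 4.6\n2001 7 5.0"

# textConnection() (base R) lets read.table() read a character string as if it were a file.
con <- textConnection(raw_text)
toy <- read.table(con, col.names = c("year", "month", "magnitude"))  # hypothetical names
close(con)   # release the connection explicitly once the data has been read

str(toy)
```

With real data, raw_text would be the string returned by getURL() for the catalog address.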

Predicting lynx trappings

Our second data set, known as the lynx data set, is a very famous data set and is provided with the core distribution of R. It was first presented in a 1942 paper by C. Elton and M. Nicholson, titled The ten-year cycle in numbers of the lynx in Canada, which appears in the Journal of Animal Ecology. The data consist of the annual number of Canadian lynx trapped in the Mackenzie River district over the period 1821-1934. We can load the data as follows:

> data(lynx)

The following diagram shows a plot of the lynx data:

We will repeat the exact same series of analysis steps as we did with the earthquake data. First, we will create a grid of parameter combinations and use this to train multiple models. Then we will pick the best one on account of it having the smallest AIC value. Finally, we will use the chosen parameter combination to train a model and forecast the next few data points. The reader is encouraged to also experiment with auto.arima().

> d <- 0:2
> p <- 0:6
>...
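The listing is truncated in this excerpt. A minimal sketch of the procedure described above (grid of parameter combinations, selection by smallest AIC, then a forecast) might look as follows. It uses a deliberately small grid of our own choosing rather than the chapter's ranges, and it assumes the base R functions arima() and predict(); whether the chapter uses these or functions from another package is not visible here.

```r
data(lynx)

# Illustrative, deliberately small grid; not the chapter's (truncated) ranges.
grid <- expand.grid(p = 0:2, d = 0:1, q = 0:2)

# Fit one ARIMA(p, d, q) model per row and record its AIC; failed fits get NA.
grid$aic <- apply(grid, 1, function(row) {
  tryCatch(
    arima(lynx, order = c(row[["p"]], row[["d"]], row[["q"]]))$aic,
    error = function(e) NA_real_
  )
})

# Pick the combination with the smallest AIC (which.min ignores NA values).
best <- grid[which.min(grid$aic), ]
best

# Refit the chosen model and forecast the next five yearly values.
best_fit <- arima(lynx, order = c(best$p, best$d, best$q))
forecast_next <- predict(best_fit, n.ahead = 5)$pred
forecast_next
```

Wrapping each fit in tryCatch() keeps the search going when a particular parameter combination fails to estimate, which can happen with larger grids.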

Predicting foreign exchange rates

Our third and final data set will be constructed from a historical database of Euro Foreign Exchange Reference rates provided by the website of the European Central Bank. We can download a zipped archive containing the data from http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip. When unzipped, this archive contains a file titled eurofxref-hist.csv, which we can directly import into R using the read.csv() function:

> eurofxref.hist <- read.csv("eurofxref-hist.csv",
                             stringsAsFactors = FALSE)
> eurofxref.hist[1 : 6, 1 : 6]
        Date    USD    JPY    BGN CYP    CZK
1 2014-09-05 1.2948 136.27 1.9558 N/A 27.596
2 2014-09-04 1.3015 136.89 1.9558 N/A 27.662
3 2014-09-03 1.3151 138.11 1.9558 N/A 27.658
4 2014-09-02 1.3115 137.63 1.9558 N/A 27.784
5 2014-09-01 1.3133 136.97 1.9558 N/A 27.738
6 2014-08-29 1.3188 137.11 1.9558 N/A 27.725

As we can see, our data frame contains the conversion rates for several different currencies...
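The printed rows show missing rates as the literal text N/A and list the most recent date first. One way to handle both, sketched here on the first three rows of the output above re-entered as CSV text, is to pass na.strings to read.csv() and convert the Date column. How the chapter itself treats missing values is not visible in this excerpt, so this is an assumption about a reasonable approach rather than the author's method.

```r
# The first rows printed above, re-entered as CSV text so the sketch is self-contained.
csv_text <- "Date,USD,JPY,BGN,CYP,CZK
2014-09-05,1.2948,136.27,1.9558,N/A,27.596
2014-09-04,1.3015,136.89,1.9558,N/A,27.662
2014-09-03,1.3151,138.11,1.9558,N/A,27.658"

# na.strings turns the literal "N/A" into NA, so CYP is not read as character data.
fx <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "N/A")

fx$Date <- as.Date(fx$Date)      # ISO dates parse with the default format
fx <- fx[order(fx$Date), ]       # the printed rows run newest first; sort oldest first

str(fx)
```

Sorting into chronological order matters before building a time series object, because most time series functions assume that observations run from oldest to newest.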

Other time series models

In this chapter, we spent most of our time studying models that describe a time series in terms of the patterns of correlations between different points in time. This approach led us to the ARIMA family of models, which we have seen are highly configurable and have been successfully employed in many real-world problems. A diverse array of other methods has also been applied to the time series problem, and in fact we have seen a few elsewhere in this book.

The neural networks that we studied in Chapter 4, Neural Networks, and the hidden Markov models that we saw in Chapter 8, Probabilistic Graphical Models, are two such examples. Sometimes, we can treat a time series as a regression problem, and so techniques from this area can be leveraged too.

One other important class of methods is exponential smoothing. There are two key premises behind methods that use this approach. The first of these is that a time series is usually decomposed into a number of different...

Summary

The focus of this chapter was on understanding the fundamental tools that are useful in studying time series. Time series analysis is a very large field, but in this brief synopsis, we explored the basic concepts that are essential to further study. We started off by looking at some properties of time series such as the autocorrelation function and saw how this, along with the partial autocorrelation function, can provide important clues about the underlying process involved.

Next, we introduced stationarity, which is a very useful property of some time series that in a nutshell says that the statistical behavior of the underlying process does not change over time. We introduced white noise as a stochastic process that forms the basis of many other processes. In particular, it appears in the random walk process, the moving average (MA) process, as well as the autoregressive process (AR). These, in turn, we saw can be combined to yield even more complex time series.

In order to handle...



Rui Miguel Forte

Rui Miguel Forte is currently the chief data scientist at Workable. He was born and raised in Greece and studied in the UK. He is an experienced data scientist, having over 10 years of work experience in a diverse array of industries spanning mobile marketing, health informatics, education technology, and human resources technology. His projects have included predictive modeling of user behavior in mobile marketing promotions, speaker intent identification in an intelligent tutor, information extraction techniques for job applicant resumes and fraud detection for job scams. He currently teaches R, MongoDB, and other data science technologies to graduate students in the Business Analytics MSc program at the Athens University of Economics and Business. In addition, he has lectured in a number of seminars, specialization programs, and R schools for working data science professionals in Athens. His core programming knowledge is in R and Java, and he has extensive experience working with a variety of database technologies such as Oracle, PostgreSQL, MongoDB, and HBase. He holds a Master’s degree in Electrical and Electronic Engineering from Imperial College London and is currently researching machine learning applications in information extraction and natural language processing.