Forecasting with ARIMA Models

The Autoregressive Integrated Moving Average (ARIMA) model is the generic name for a family of forecasting models that are based on the Autoregressive (AR) and Moving Average (MA) processes. Among the traditional forecasting models (for example, linear regression, exponential smoothing, and so on), the ARIMA model is considered the most advanced and robust approach. In this chapter, we will introduce the model's components: the AR and MA processes and the differencing component. Furthermore, we will focus on methods and approaches for tuning the model's parameters with the use of differencing, the autocorrelation function (ACF), and the partial autocorrelation function (PACF).

In this chapter, we will cover the following topics:

  • The stationary state of time series data
  • The random walk process
  • The AR and MA processes
  • The ARMA and ARIMA...

Technical requirements

The following packages will be used in this chapter:

  • forecast: Version 8.5 and above
  • TSstudio: Version 0.1.4 and above
  • plotly: Version 4.8 and above
  • dplyr: Version 0.8.1 and above
  • lubridate: Version 1.7.4 and above
  • stats: Version 3.6.0 and above
  • datasets: Version 3.6.0 and above
  • base: Version 3.6.0 and above

You can access the code for this chapter from the following link:

https://github.com/PacktPublishing/Hands-On-Time-Series-Analysis-with-R/tree/master/Chapter11

The stationary process

One of the main assumptions of the ARIMA family of models is that the input series follows the stationary process structure. This assumption is based on the Wold representation theorem, which states that any stationary process can be represented as a linear combination of white noise. Therefore, before we dive into the ARIMA model components, let's pause and talk about the stationary process. The stationary process, in the context of time series data, describes a stochastic state of the series. Time series data is stationary if the following conditions hold:

  • The mean and variance of the series do not change over time
  • The correlation structure of the series, along with its lags, remains the same over time

In the following examples, we will utilize the arima.sim function from the stats package to simulate a stationary and non-stationary...
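
For instance, the following minimal sketch (with illustrative parameters rather than the book's exact simulation) generates both types of series with arima.sim:

set.seed(1234)

# Stationary AR(1) series: the mean and variance are constant over time
stationary_ts <- arima.sim(model = list(order = c(1, 0, 0), ar = 0.5), n = 500)

# Non-stationary series: first-order integration (d = 1) adds a stochastic
# trend, so the mean of the series changes over time
non_stationary_ts <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 500)

plot.ts(stationary_ts, main = "Stationary AR(1) Series")
plot.ts(non_stationary_ts, main = "Non-Stationary Integrated Series")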

The AR process

The AR process defines the current value of the series, Yt, as a linear combination of the previous p lags of the series, and can be formalized with the following equation:

Yt = c + φ1Yt-1 + φ2Yt-2 + ... + φpYt-p + εt
The following terms are used in the preceding equation:

  • AR(p) is the notation for an AR process with order p
  • c represents a constant (or drift)
  • p defines the number of lags to regress against Yt
  • φi is the coefficient of the i-th lag of the series (here, φi must be between -1 and 1; otherwise, the series would trend up or down and therefore could not be stationary over time)
  • Yt-i is the i-th lag of the series
  • εt represents the error term, which is white noise
An AR process can be used on time series data if, and only if, the series is stationary. Therefore, before applying an AR process to a series, you will have to verify that the series is stationary. Otherwise, you will have to apply some...
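
Later in this chapter, we forecast an AR model stored in the md_ar object. As a minimal sketch (the simulated series and its coefficients are illustrative assumptions, not the book's exact example), such an AR(2) model could be trained with the ar function from the stats package:

set.seed(1234)

# Simulate a stationary AR(2) series with illustrative coefficients
ar2_ts <- arima.sim(model = list(order = c(2, 0, 0), ar = c(0.9, -0.3)), n = 500)

# Fit an AR model; by default, ar selects the order that minimizes the AIC
md_ar <- ar(ar2_ts)
md_ar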

The moving average process

In some cases, the forecasting model is unable to capture all of the series patterns, and therefore some information is left over in the model residuals (or forecasting error). The goal of the moving average process is to capture patterns in the residuals, if they exist, by modeling the relationship between Yt, the error term, εt, and the past q error terms of the model (that is, εt-1, εt-2, ..., εt-q). The structure of the MA process is fairly similar to that of the AR process. The following equation defines an MA process with order q:

Yt = μ + εt + θ1εt-1 + θ2εt-2 + ... + θqεt-q
The following terms are used in the preceding equation:

  • MA(q) is the notation for an MA process with order q
  • μ represents the mean of the series
  • εt, εt-1, ..., εt-q are white noise error terms
  • θi is the corresponding coefficient of εt-i
  • q defines the number of past error terms to be used in the equation
Like the AR process, the MA equation holds only if the...
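
As a quick illustration (the coefficients below are arbitrary), an MA(2) series can be simulated and estimated with the arima function from the stats package by setting the AR and differencing orders to zero:

set.seed(1234)

# Simulate an MA(2) series and estimate its coefficients
ma2_ts <- arima.sim(model = list(order = c(0, 0, 2), ma = c(0.5, 0.3)), n = 500)
md_ma <- arima(ma2_ts, order = c(0, 0, 2))
md_ma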

The ARMA model

Up until now, we have seen the applications of the AR and MA processes separately. However, in some cases, combining the two allows us to handle more complex time series data. The ARMA model is a combination of the AR(p) and MA(q) processes and can be written as follows:

Yt = c + φ1Yt-1 + ... + φpYt-p + εt + θ1εt-1 + ... + θqεt-q
The following terms are used in the preceding equation:

  • ARMA(p,q) defines an ARMA process with a p-order AR process and a q-order moving average process
  • Yt represents the series itself
  • c represents a constant (or drift)
  • p defines the number of lags to regress against Yt
  • φi is the coefficient of the i-th lag of the series
  • Yt-i is the i-th lag of the series
  • q defines the number of past error terms to be used in the equation
  • θi is the corresponding coefficient of εt-i
  • εt-1, ..., εt-q are white noise error terms
  • εt represents the error term, which is white noise

For instance, an ARMA(3,2) model is defined by the following equation:

Yt = c + φ1Yt-1 + φ2Yt-2 + φ3Yt-3 + εt + θ1εt-1 + θ2εt-2
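
In R, an ARMA(p,q) model can be estimated with the arima function by setting the middle (differencing) order to zero. A minimal sketch with illustrative orders and coefficients:

set.seed(1234)

# Simulate an ARMA(1,1) series and estimate its parameters
arma_ts <- arima.sim(model = list(order = c(1, 0, 1), ar = 0.6, ma = 0.4), n = 500)
md_arma <- arima(arma_ts, order = c(1, 0, 1))
md_arma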

Forecasting AR, MA, and ARMA models

Forecasting with any of the models we have seen so far is straightforward: we use the forecast function from the forecast package, in a similar manner to how we used it in the previous chapter. For instance, the following code demonstrates a forecast of the next 100 observations with the AR model we trained previously, in The AR process section, with the ar function:

library(forecast)

ar_fc <- forecast(md_ar, h = 100)

We can use the plot_forecast function from the TSstudio package to plot the forecast output:

plot_forecast(ar_fc,
              title = "Forecast AR(2) Model",
              Ytitle = "Value",
              Xtitle = "Year")

The output is the plot of the AR(2) model forecast.

The ARIMA model

One of the limitations of the AR, MA, and ARMA models is that they cannot handle non-stationary time series data. Therefore, if the input series is non-stationary, a preprocessing step is required to transform the series from a non-stationary state into a stationary state. The ARIMA model provides a solution for this issue by adding an integrated process to the ARMA model. The Integrated (I) process simply differences the series with its lags, where the degree of differencing is represented by the d parameter. The differencing process, as we saw previously, is one of the ways to transform a series from a non-stationary state into a stationary one. For instance, Yt - Yt-1 represents the first-order differencing of the series, while (Yt - Yt-1) - (Yt-1 - Yt-2) represents the second-order differencing. We can generalize the differencing process with the following...
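
In R, differencing corresponds to the diff function. The following sketch (using the built-in AirPassengers series purely for illustration) applies first- and second-order differencing:

# First-order differencing: Yt - Yt-1
first_diff <- diff(AirPassengers, differences = 1)

# Second-order differencing: (Yt - Yt-1) - (Yt-1 - Yt-2)
second_diff <- diff(AirPassengers, differences = 2)

plot.ts(first_diff, main = "AirPassengers - First-Order Differencing")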

The seasonal ARIMA model

The Seasonal ARIMA (SARIMA) model, as its name implies, is a designated version of the ARIMA model for time series with a seasonal component. As we saw in Chapter 6, Seasonality Analysis, and Chapter 7, Correlation Analysis, a time series with a seasonal component has a strong relationship with its seasonal lags. The SARIMA model utilizes the seasonal lags in a similar manner to how the ARIMA model utilizes the non-seasonal lags, with the AR and MA processes and differencing. It does this by adding the following three components to the ARIMA model:

  • SAR(P) process: A seasonal AR process of the series with its past P seasonal lags. For example, a SAR(2) is an AR process of the series with its past two seasonal lags, that is, Φ1Yt-f + Φ2Yt-2f, where Φ represents the seasonal coefficient of the SAR process, and f represents the series frequency.
  • SMA(Q) process...
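
As a minimal sketch, a SARIMA model can be fitted with the Arima function from the forecast package; the (1,1,1)(1,1,1) orders below are illustrative placeholders rather than tuned values:

library(forecast)

# Fit a SARIMA(1,1,1)(1,1,1) model to a monthly series (frequency = 12)
md_sarima <- Arima(AirPassengers, order = c(1, 1, 1), seasonal = c(1, 1, 1))
summary(md_sarima)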

The auto.arima function

One of the main challenges of forecasting with the ARIMA family of models is the cumbersome tuning process. As we saw in this chapter, this process includes many manual steps: verifying the structure of the series (stationary or non-stationary), transforming the data, conducting descriptive analysis with the ACF and PACF plots to identify the type of process, and eventually tuning the model's parameters. While it might take a few minutes to train an ARIMA model for a single series, this approach does not scale when you have dozens of series to forecast.

The auto.arima function from the forecast package provides a solution to this issue. This algorithm automates the tuning process of the ARIMA model, using statistical methods to identify both the structure of the series (stationary or not) and its type (seasonal or not), and sets the model's...
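
A minimal usage sketch (with the built-in AirPassengers series as a stand-in input):

library(forecast)

# Search for the series structure and the model orders automatically
md_auto <- auto.arima(AirPassengers)
md_auto

# Forecast the next 12 observations with the selected model
fc_auto <- forecast(md_auto, h = 12)
plot(fc_auto)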

Linear regression with ARIMA errors

In Chapter 9, Forecasting with Linear Regression, we saw that with some simple steps, we can utilize a linear regression model as a time series forecasting model. Recall that a general form of the linear regression model can be represented by the following equation:

Yt = β0 + β1X1 + β2X2 + ... + βnXn + εt
One of the main assumptions of the linear regression model is that the error term of the series, εt, is white noise (that is, there is no correlation between the residuals and their lags). However, when working with time series data, this assumption is often violated, since typically the model's predictors do not explain all of the variation in the series, and some patterns are left over in the model residuals. An example of the failure of this assumption can be seen when fitting a linear regression model to forecast the AirPassengers series.

...
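
The following sketch (an illustration, not the book's elided code) uses the tslm and checkresiduals functions from the forecast package to expose this issue:

library(forecast)

# Fit a linear regression model with trend and seasonal components
md_lm <- tslm(AirPassengers ~ trend + season)

# Plot the residuals and their ACF, and run the Ljung-Box test; for this
# series, significant autocorrelation is typically left in the residuals
checkresiduals(md_lm)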

Summary

In this chapter, we introduced the ARIMA family of models, one of the core approaches for forecasting time series data. The main advantages of the ARIMA family of models are its flexibility and modularity, as it can handle both seasonal and non-seasonal time series data by adding or modifying the model components. In addition, we saw the applications of the ACF and PACF plots for identifying the type of process (for example, AR, MA, or ARMA) and its order.

While it is essential to be familiar with the tuning process of ARIMA models, in practice, as the number of series to be forecast increases, you may want to automate this process. The auto.arima function is one of the most common approaches in R to forecasting with ARIMA models, as it can scale up when dozens of series need to be forecast.

Last but not least, we saw applications of linear regression with the ARIMA...
