You're reading from  Building Statistical Models in Python

Product type: Book
Published in: Aug 2023
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781804614280
Edition: 1st Edition
Authors (3):
Huy Hoang Nguyen

Huy Hoang Nguyen is a Mathematician and a Data Scientist with far-ranging experience, championing advanced mathematics and strategic leadership, and applied machine learning research. He holds a Master's in Data Science and a PhD in Mathematics. His previous work was related to Partial Differential Equations, Functional Analysis and their applications in Fluid Mechanics. He transitioned from academia to the healthcare industry and has performed different Data Science projects from traditional Machine Learning to Deep Learning.

Paul N Adams

Paul Adams is a Data Scientist with a background primarily in the healthcare industry. Paul applies statistics and machine learning in multiple areas of industry, focusing on projects in process engineering, process improvement, metrics and business rules development, anomaly detection, forecasting, clustering and classification. Paul holds a Master of Science in Data Science from Southern Methodist University.

Stuart J Miller

Stuart Miller is a Machine Learning Engineer with degrees in Data Science, Electrical Engineering, and Engineering Physics. Stuart has worked at several Fortune 500 companies, including Texas Instruments and StateFarm, where he built software that utilized statistical and machine learning techniques. Stuart is currently an engineer at Toyota Connected helping to build a more modern cockpit experience for drivers using machine learning.

Introduction to Time Series

In Chapter 9, Discriminant Analysis, we concluded our overview of statistical classification modeling by introducing conditional probability using Bayes’ theorem, Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA). In this chapter, we introduce time series, the statistical concepts that underlie them, and how to apply those concepts in everyday analysis. We begin by distinguishing time-series data from the data we have discussed up to this point in the book. We then provide an overview of what to expect from time-series modeling and the goals it can help achieve. Within the context of time series, we revisit the mean and variance, in addition to correlation. We provide an overview of linear differencing, cross-correlation, and autoregressive (AR) and moving average (MA) properties and how to identify their order using the autocorrelation function (ACF) and partial ACF...

What is a time series?

In this chapter and the next few chapters, we will work with a type of data called time-series data. Up until this point, we have worked with independent data, that is, data consisting of samples that are not related to one another. A time series is typically a set of repeated measurements of the same quantity taken over time, which makes the observations in this type of data related. There are many time series present around us every day. A few common examples are daily temperature measurements, stock price ticks, and the heights of ocean tides. While a time series does not need to be measured at fixed intervals, in this book, we will primarily be concerned with measurements taken at fixed intervals, such as daily or every second.

Let’s look at some notation. In the following equation, we have a variable x that is repeatedly sampled over time. The subscripts enumerate the sample points (sample 1 through sample t), and the whole series of samples is denoted X. The subscript...

Goals of time series analysis

There are two goals in time-series analysis:

  • Identifying any patterns in the time series
  • Forecasting future values of the time series

We can use time-series analysis methods to uncover the nature of a time series. At the most basic level, we may want to know if a series appears to be random or if a time series appears to exhibit a pattern. If a time series has a pattern, we can determine if it has seasonal behavior, cyclical patterns, or exhibits trending behavior. We will investigate the behaviors of time series both by observation and by the results of fitting models. Models can provide insight into the nature of a series and allow us to forecast the future values of a time series.

The other goal of time-series analysis is forecasting. We see examples of forecasting in many common situations, such as weather forecasting and stock price forecasting. It is important to keep in mind that the methods of forecasting we cover in this...

Statistical measurements

When using time-series models to work with serially correlated data sets, we need to understand the mean and variance within the context of time, in addition to autocorrelation and cross-correlation. Understanding these quantities helps build an intuition about how time-series models work and when they are more useful than models that do not account for time.

Mean

In time-series analysis, the sample mean of a series is the sum of all values across each point in time in the series divided by the count of values. Where t represents each discrete time step and n is the total number of time steps, we can calculate the sample mean of a time series as follows:

$\bar{X} = \frac{1}{n} \sum_{t=1}^{n} x_t$
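As a quick illustration of the sample mean, the following minimal sketch computes it for a short series using only the Python standard library. The temperature values are made up for illustration and are not taken from the book.

```python
from statistics import fmean

# Hypothetical daily temperature readings (illustrative values only),
# ordered by time step t = 1, ..., n.
series = [21.0, 22.5, 23.0, 21.5, 20.0, 19.5, 22.0]

n = len(series)                # total number of time steps
sample_mean = sum(series) / n  # (1/n) * sum of x_t over t = 1..n

print(f"n = {n}")
print(f"sample mean (by hand):    {sample_mean:.3f}")
print(f"sample mean (statistics): {fmean(series):.3f}")
```

Both lines report the same value, since `fmean` performs the same sum-and-divide calculation.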

There are two types of processes generating time series; one is an ergodic process and the other is non-ergodic. An ergodic process has consistent output independent of time, whereas a non-ergodic...

The white-noise model

Any time series can be thought of as having two fundamental elements: signal and noise. We can express this mathematically as follows:

y(t) = signal(t) + noise(t)

The signal is some predictable pattern that we can model with a mathematical function. But the noise element in a time series is unpredictable and so cannot be modeled. Thinking of a time series this way leads to two consequential points:

  1. Before attempting to model, we should verify that the time series is not consistent with noise.
  2. Once we have fit a model to a time series, we should verify that the residuals are consistent with noise.

Regarding the first point, if a time series is consistent with noise, there is no predictable pattern to model, and attempting to model the time series could lead to misleading results. As for the second point, if the residuals of a time-series model are not consistent with noise, then there are additional patterns we can further model, and the...
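One informal way to check whether a series is consistent with noise is to look at its sample autocorrelations. The sketch below is not the book's own procedure. It assumes a simulated sine-wave signal plus Gaussian noise, and it uses the common rule of thumb that, for white noise, roughly 95% of sample autocorrelations fall within ±1.96/√n. The helper names `sample_acf` and `count_outside_band` are hypothetical.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def count_outside_band(x, max_lag=20):
    """Count lags whose sample autocorrelation lies outside +/- 1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    return sum(abs(r) > bound for r in sample_acf(x, max_lag))


random.seed(0)  # fixed seed so the sketch is repeatable
n = 500
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
# Assumed signal: a sine wave with a period of 50 time steps.
signal = [2.0 * math.sin(2 * math.pi * t / 50) for t in range(n)]
y = [s + e for s, e in zip(signal, noise)]  # y(t) = signal(t) + noise(t)

print("lags outside band, noise only:    ", count_outside_band(noise))
print("lags outside band, signal + noise:", count_outside_band(y))
```

A series made only of noise should show few lags outside the band, while the series containing a signal should show many. The same check can be applied to the residuals of a fitted model.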

Stationarity

In this section, we provide an overview of stationary and non-stationary time series. Broadly speaking, the main difference between these two types of time series lies in statistical properties such as the mean, variance, and autocorrelation: they do not vary across time in a stationary time series but do change through time in a non-stationary one. In particular, a time series with a trend or seasonality is non-stationary because the trend or seasonality affects these statistical properties. The following examples illustrate the behaviors of stationary versus non-stationary time series [1]:

Figure 10.12 – Examples of stationary and non-stationary time series

To check whether a time series is stationary, we check the following three conditions:

  • The mean is independent of time:

$E[X_t] = \mu$ for all $t$

  • The variance is independent of time:

$\mathrm{Var}[X_t] = \sigma^2$ for all $t$

  • No autocorrelation...
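The first two conditions can be examined informally by comparing the mean and variance of different segments of a series. The following sketch is a rough heuristic rather than a formal test, and it does not address the third condition. It assumes simulated data (Gaussian noise, and the same noise with a linear trend added), and `half_summaries` is a hypothetical helper name.

```python
import random
from statistics import fmean, pvariance


def half_summaries(x):
    """Return (mean, variance) for the first and second halves of x."""
    mid = len(x) // 2
    first, second = x[:mid], x[mid:]
    return (fmean(first), pvariance(first)), (fmean(second), pvariance(second))


random.seed(1)  # fixed seed so the sketch is repeatable
n = 1000
stationary = [random.gauss(0.0, 1.0) for _ in range(n)]
# Assumed non-stationary example: the same noise plus a linear trend.
trending = [0.01 * t + e for t, e in enumerate(stationary)]

for name, series in [("stationary", stationary), ("trending", trending)]:
    (m1, v1), (m2, v2) = half_summaries(series)
    print(f"{name:>10}: mean {m1:6.2f} -> {m2:6.2f}, variance {v1:5.2f} -> {v2:5.2f}")
```

For the stationary series, the two halves should have similar means and variances. For the trending series, the mean shifts noticeably between halves, which violates the first condition.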

Summary

This chapter started with an introduction to time series. We provided an overview of what a time series is and how it can be used to meet specific goals, and we discussed the criteria for differentiating time-series data from data that does not depend on time. We also discussed stationarity: which factors are important for it, how to measure them, and how to resolve cases where a series is not stationary. From there, we covered the primary functions of ACF and PACF analysis and how to make inferences about processes using the variance around the mean. Additionally, we provided an introduction to time-series modeling with an overview of the white-noise model and the basic concepts behind autoregressive and moving average components, which help form the basis of ARIMA and seasonal autoregressive integrated moving average (SARIMA) time-series models.

In Chapter 11, ARIMA Models, we will also move deeper into the discussion of autoregressive, moving average...

References

[1] André Bauer, Automated Hybrid Time Series Forecasting: Design, Benchmarking, and Use Cases, University of Chicago, 2021.
