You're reading from  Extending Excel with Python and R

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781804610695
Edition: 1st Edition
Authors (2):
Steven Sanderson

Steven Sanderson, MPH, is an applications manager for the patient accounts department at Stony Brook Medicine. He received his bachelor's degree in economics and his master's in public health from Stony Brook University. He has worked in healthcare in some capacity for just shy of 20 years. He is the author and maintainer of the healthyverse set of R packages. He likes to read material related to social and labor economics and has recently turned his efforts back to his guitar with the hope that his kids will follow suit as a hobby they can enjoy together.

David Kun

David Kun is a mathematician and actuary who has always worked in the gray zone between quantitative teams and ICT, aiming to build a bridge. He is a co-founder and director of Functional Analytics and the creator of the ownR Infinity platform. As a data scientist, he also uses ownR for his daily work. His projects include time series analysis for demand forecasting, computer vision for design automation, and visualization.

Time Series Analysis: Statistics, Plots, and Forecasting

In the realm of mathematical analysis, particularly in the study of data and trends, time series charts play a pivotal role. A time series chart is a graphical representation that displays data points collected over a sequence of time intervals. This tool is indispensable in various fields, including economics, finance, environmental science, and social sciences, for analyzing patterns, trends, and fluctuations in data over time.

A typical time series chart comprises two essential components: the time axis and the data axis. The time axis represents the progression of time, which can be measured in various units, such as seconds, minutes, hours, days, months, or years. The data axis displays the values of the variable being studied, which can be anything from stock prices and temperature readings to population counts and sales figures.

To construct a time series chart, you must do the following:

  • Data collection...

Technical requirements

There are a few technical requirements for this chapter. Note that the code for this chapter can be found at https://github.com/PacktPublishing/Extending-Excel-with-Python-and-R/tree/main/Chapter%2010.

Some of the packages that we will be using in this chapter are as follows:

  • healthyR.ts
  • forecast
  • timetk
  • modeltime
  • prophet (for Python)
  • keras
  • tensorflow

We will start by creating time series objects in base R. The basic class for a time series in R is ts, and an object can be coerced to that class either by calling the ts() function directly or by calling as.ts() on an object such as a vector.

Generating random time series objects in R

We are going to generate some random time series objects in base R. This is straightforward because base R ships with random-number functions for common distributions. We will draw from the normal distribution by calling the rnorm() function, which takes three arguments:

  • n: The number of points to be generated
  • mean: The mean of the distribution, with a default of 0
  • sd: The standard deviation of the distribution, with the default being 1
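For readers who will reach the Python half of this chapter, the same three arguments map onto NumPy's random generator. The following is a minimal sketch rather than a translation of the book's code: the variable names are ours, and a seed of 123 in NumPy will not reproduce the numbers that R's set.seed(123) gives, because the two languages use different generators.

```python
import numpy as np

# Seeded generator so the draw is reproducible (the values will NOT match
# R's set.seed(123); the two languages use different generators).
rng = np.random.default_rng(123)

n = 25        # number of points to generate (R: n)
mean = 0.0    # centre of the distribution (R: mean)
sd = 1.0      # spread of the distribution (R: sd)

# loc and scale are NumPy's names for the mean and standard deviation
x = rng.normal(loc=mean, scale=sd, size=n)
print(x[:6])
```

Here, size plays the role of n, while loc and scale play the roles of mean and sd.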

Let’s go ahead and generate our first random vector. We will call it x:

# Generate a Random Time Series
# Set seed to make results reproducible
set.seed(123)
# Generate Random Points using a gaussian distribution with mean 0 and sd = 1
n <- 25
x <- rnorm(n)
head(x)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499

In the preceding code, we did the following:

...

Time series plotting

In this section, we will cover plotting time series objects, along with plotting some diagnostics such as decomposition. These plots include time series plots themselves, autocorrelation function (ACF) plots, and partial autocorrelation function (PACF) plots. We will start by using the AirPassengers dataset, which we will read in via the readxl package:

# Load readxl, which provides read_xlsx()
library(readxl)
# Read the airpassengers.xlsx file in and convert to a ts object starting at 1949
ap_ts <- read_xlsx("./Chapter 10/airpassengers.xlsx") |>
  ts(start = 1949, frequency = 12)
# Plot the ts object
plot(ap_ts)

This produces the following chart:

Figure 10.2 – Visualizing the AirPassengers time series dataset

From here, it is easy to see that the data has a trend and a seasonal cycle component. This observation will lead us to our next visual. We will decompose the data into its parts and visualize the decomposition. The decomposition of the...

Auto ARIMA modeling with healthyR.ts

Time series, like any other data, can be modeled, and there are many methods to choose from, both classical and modern. In this section, we are going to discuss ARIMA modeling and, more specifically, building an automatic ARIMA model with the healthyR.ts package in R. ARIMA models attempt to describe the autocorrelations in the data.
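To make "describing the autocorrelations" concrete, here is a small Python sketch of the autoregressive (AR) part of an ARIMA model. It is not what ts_auto_arima() does internally. It simulates a first-order autoregressive series with a coefficient we picked for the demo (phi_true is an assumption) and then recovers that coefficient by regressing each value on the one before it.

```python
import numpy as np

rng = np.random.default_rng(0)

phi_true = 0.7   # hypothetical AR(1) coefficient chosen for the demo
n = 5000         # number of simulated observations

# Simulate y[t] = phi * y[t-1] + noise[t]
noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + noise[t]

# Least-squares estimate of phi: regress y[t] on y[t-1] (no intercept)
prev, curr = y[:-1], y[1:]
phi_hat = float(prev @ curr / (prev @ prev))
print(round(phi_hat, 2))
```

The estimate should land close to phi_true. A full ARIMA fit adds differencing and moving-average terms on top of this idea.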

In this section, we will use a workflow that ends with the ts_auto_arima() function creating and fitting a tuned model. This workflow requires our data to be in tibble format, so we will take the AirPassengers dataset and make sure it is a tibble.

Let’s get started with the dataset we have already brought in and coerce it into a tibble:

library(healthyR.ts)
library(dplyr)
library(timetk)
ap_tbl <- ts_to_tbl(ap_ts) |>
  select(-index)
class(ap_tbl)
[1] "tbl_df"      "tbl"          ...

Creating a Brownian motion with healthyR.ts

The final time series plot that we are going to showcase is of Brownian motion. Brownian motion, also known as the Wiener process, is a fundamental concept in finance and mathematics that originally described the random movement of particles in a fluid. In finance, it is often used to model the price movement of financial instruments such as stocks, commodities, and currencies.

Here are some of the key characteristics of Brownian motion:

  • Randomness: Brownian motion is inherently random. The future direction and magnitude of movement at any point in time cannot be predicted with certainty.
  • Continuous path: The path of a Brownian motion is continuous, meaning that the asset’s price can move smoothly without sudden jumps or gaps.
  • Independent increments: The changes in the asset’s price over non-overlapping time intervals are independent of each other. In other words, the price movement in one interval does...
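The characteristics listed above can be sketched in a few lines of Python. This is an illustrative simulation, not the healthyR.ts implementation. The number of steps and the time horizon are assumptions, and a discrete simulation can only approximate a continuous path.

```python
import numpy as np

rng = np.random.default_rng(42)

n_steps = 1000        # hypothetical number of time steps
dt = 1.0 / n_steps    # step size, so the path covers the interval [0, 1]

# Independent Gaussian increments, each with mean 0 and variance dt
increments = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n_steps)

# The path starts at 0 and is the running sum of the increments
path = np.concatenate(([0.0], np.cumsum(increments)))
print(path[:5])
```

Because every increment is drawn independently, the change over one interval carries no information about the change over the next. Shrinking dt makes the simulated path look increasingly continuous.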

Time series analysis in Python – statistics, plots, and forecasting

Before diving into time series analysis, it’s crucial to have data to work with. In this section, we’ll walk through the process of creating mock time series data, saving it to an Excel file, and then reading it back into pandas. This will serve as our foundation for the upcoming time series analysis.

As always, we’ll start by loading the relevant libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Then, we must create the sample data and save it to Excel so that it can be used in the rest of this chapter:

# Create a date range
date_rng = pd.date_range(start='2022-01-01', end='2023-12-31',
    freq='D')
# Create a trend component
trend = 0.05 * np.arange(len(date_rng))
# Create a seasonal component (cyclicality)
seasonal = 2.5 * np.sin(2 * np.pi * np.arange(len(date_rng)) / 365)
# Add some random noise...

Time series plotting – basic plots and ACF/PACF plots

Visualizing time series data is a crucial step in understanding its underlying patterns and trends. In this section, we’ll explore various time series plots and how to create them using Python. These visualizations help us gain insights into seasonality, trends, and autocorrelation within the time series data.
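Before plotting autocorrelation, it can help to see what the numbers behind an ACF plot are. The following sketch computes the sample autocorrelation directly with NumPy. sample_acf is a hypothetical helper written for this illustration, and the sine wave with a period of 12 observations is stand-in data rather than the chapter's dataset.

```python
import numpy as np

def sample_acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()                      # work with deviations from the mean
    denom = float(x @ x)                  # lag-0 sum of squares
    return np.array(
        [float(x[: len(x) - k] @ x[k:]) / denom for k in range(max_lag + 1)]
    )

# A pure sine wave with a period of 12 observations
t = np.arange(240)
series = np.sin(2 * np.pi * t / 12)

acf = sample_acf(series, max_lag=12)
print(np.round(acf, 2))
```

The autocorrelation is strongly negative at lag 6 (half a cycle) and strongly positive again at lag 12 (a full cycle), which is the signature of seasonality that an ACF plot makes visible.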

We’ll start by loading the required libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

Then, we must load the data from Excel and ensure the date information is converted correctly:

# Load time series data (replace 'time_series_data.xlsx' with your data file)
data = pd.read_excel('time_series_data.xlsx')
# Convert the 'Date' column to datetime format and set it as the index
data['Date'] = pd.to_datetime(data['Date'])
data...

Time series statistics and statistical forecasting

Data exploration and statistical analysis are crucial steps in understanding the characteristics of time series data. In this section, we’ll walk you through how to perform data exploration and apply statistical analysis techniques in Python to gain valuable insights into your time series.

Statistical analysis for time series data

After exploring the data using the plots in the previous section, let’s move on to statistical analysis to gain a deeper understanding. This section focuses on two areas:

  • The Augmented Dickey-Fuller (ADF) test: This statistical test is used to determine whether the time series data is stationary. Stationary data is easier to model and forecast.
  • Time series decomposition: Time series decomposition separates the data into its constituent components: trend, seasonality, and residuals. This decomposition aids in isolating patterns for forecasting.
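To show what decomposition computes, here is a minimal NumPy sketch of a classical additive decomposition. It is not the implementation used by statsmodels. The season length of 7 is an assumption (an odd value keeps the centred moving average simple), and the noise term is left out so that the recovered components can be compared exactly with the ones we built in.

```python
import numpy as np

period = 7                                  # hypothetical season length (odd)
t = np.arange(period * 20)
true_trend = 0.05 * t
true_seasonal = 2.5 * np.sin(2 * np.pi * t / period)
series = true_trend + true_seasonal         # noise omitted to keep the demo exact

# 1. Trend: centred moving average over one full period
half = period // 2
kernel = np.ones(period) / period
trend = np.full(len(series), np.nan)        # edges stay NaN (window incomplete)
trend[half:-half] = np.convolve(series, kernel, mode="valid")

# 2. Seasonality: average the detrended values at each position in the cycle
detrended = series - trend
seasonal_means = np.array(
    [np.nanmean(detrended[i::period]) for i in range(period)]
)
seasonal_means -= seasonal_means.mean()     # centre the seasonal pattern on zero
seasonal = np.tile(seasonal_means, len(series) // period)

# 3. Residual: whatever trend and seasonality do not explain
residual = series - trend - seasonal
print(np.nanmax(np.abs(residual)))
```

With real data, the residual would hold the noise. Here, it is zero up to floating-point error because the series contains nothing but trend and seasonality.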

We’ll understand...

Understanding predictive modeling approaches

In this section, we’ll delve into predictive modeling approaches using two powerful Python libraries – statsmodels and prophet.

These libraries provide diverse tools to tackle time series forecasting, enabling you to make informed decisions and predictions based on your time series data.

Forecasting with statsmodels

statsmodels is a popular library in the Python ecosystem that offers a wide range of statistical tools, including time series analysis. For forecasting, it provides functionality for building ARIMA models. ARIMA models are a staple in time series analysis, allowing you to capture and model complex patterns within your data.

Building an ARIMA model with statsmodels involves selecting the appropriate order of differencing, autoregressive components, and moving average components to best represent the underlying patterns of the data. Once the model has been established, you can make forecasts and evaluate...
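The role of the differencing order can be illustrated without fitting a model. The sketch below uses made-up data (a linear trend plus white noise, both assumptions) and compares the mean of the first and second halves of the series before and after first-order differencing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical series: a linear trend plus white noise
t = np.arange(500)
series = 0.05 * t + rng.normal(scale=1.0, size=t.size)

# First-order differencing (the "d = 1" step): y[t] - y[t-1]
diffed = np.diff(series, n=1)

# The trend makes the two halves of the raw series sit at different levels ...
raw_gap = abs(series[250:].mean() - series[:250].mean())
# ... while the differenced series has roughly the same mean in both halves
diff_gap = abs(diffed[250:].mean() - diffed[:250].mean())
print(raw_gap, diff_gap)
```

A large gap for the raw series and a small one after differencing suggests that one round of differencing is enough to remove this kind of trend.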

Time series forecasting with deep learning – LSTM

This section will give you insights into advanced time series forecasting techniques using deep learning models. Whether you’re working with traditional time series data or more complex, high-dimensional data, these deep learning models can help you make more accurate predictions. In particular, we will cover the Long Short-Term Memory (LSTM) method using keras.

We will be using keras with a tensorflow backend, so you need to install both libraries before following these steps:

  1. As always, let’s load the necessary libraries and preprocess some time series data:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers import LSTM, Dense
    from sklearn.preprocessing import MinMaxScaler
    # Load the time series data (replace with your data)
    time_series_data = pd.read_excel('time_series_data.xlsx')
    # Normalize the data to be in the range [0, 1]
    scaler = MinMaxScaler...
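The preprocessing that an LSTM needs can be sketched with NumPy alone. The following is an illustration under stated assumptions, not the chapter's code: make_windows is a hypothetical helper, the look-back length of 3 is arbitrary, and the data is a stand-in. The scaling line applies the same min-max arithmetic that a [0, 1] scaler performs, and the reshape produces the (samples, timesteps, features) layout that keras LSTM layers expect.

```python
import numpy as np

def make_windows(values, window):
    """Split a 1-D array into (samples, window, 1) inputs and next-step targets.

    Hypothetical helper for illustration; `window` is an assumed look-back length.
    """
    X = np.array([values[i : i + window] for i in range(len(values) - window)])
    y = values[window:]                      # the value that follows each window
    return X.reshape(-1, window, 1), y

raw = np.arange(10, dtype=float) * 3.0 + 5.0   # stand-in data: 5, 8, ..., 32

# Min-max scaling to the range [0, 1]
scaled = (raw - raw.min()) / (raw.max() - raw.min())

X, y = make_windows(scaled, window=3)
print(X.shape, y.shape)   # (7, 3, 1) (7,)
```

Each row of X holds three consecutive observations, and the matching entry of y is the observation that comes next, which is what the network learns to predict.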

Summary

In this chapter, we delved into the fascinating world of time series analysis. We began by exploring time series plotting, mastering essential plots, and understanding the significance of ACF/PACF plots.

Moving forward, we ventured into time series statistics, including the ADF test, time series decomposition, and statistical forecasting with tools such as statsmodels and prophet.

To elevate our forecasting game, we embraced deep learning, employing LSTM networks using Python’s keras library. We learned to develop time series forecasts and create visualizations that support data-driven decisions.

This chapter equipped us with a comprehensive set of skills for time series analysis, enabling us to unravel the hidden patterns and insights within time-based data, from plotting to statistical analysis and deep learning forecasting.

In the next chapter, we will discuss a different integration method – that is, calling R and Python from Excel directly...
