You're reading from  Practical Time Series Analysis

Product type: Book
Published in: Sep 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788290227
Edition: 1st Edition
Authors (2):
Avishek Pal

Dr. Avishek Pal, PhD, is a software engineer, data scientist, author, and an avid Kaggler living in Hyderabad, India. He achieved his Bachelor of Technology degree in industrial engineering from the Indian Institute of Technology (IIT) Kharagpur and earned his doctorate in 2015 from University of Warwick, Coventry, United Kingdom. He started his career as a software engineer at IBM India developing middleware solutions for telecom clients. This was followed by stints at a start-up product development company followed by Ericsson, the global telecom giant. After doctoral studies, Avishek started his career in India as a lead machine learning engineer for a leading US-based investment company. He is currently working at Microsoft as a senior data scientist. Avishek has published several research papers in reputed international conferences and journals.

PKS Prakash

Dr. PKS Prakash is a data scientist and author. He has spent the last 12 years in developing many data science solutions in several practical areas in healthcare, manufacturing, pharmaceuticals, and e-commerce. He currently works as the data science manager at ZS Associates. He is the co-founder of Warwick Analytics, a spin-off from University of Warwick, UK. Prakash has published articles widely in research areas of operational research and management, soft computing tools, and advanced algorithms in leading journals such as IEEE-Trans, EJOR, and IJPR, among others. He has edited an article on Intelligent Approaches to Complex Systems and contributed to books such as Evolutionary Computing in Advanced Manufacturing published by WILEY and Algorithms and Data Structures using R and R Deep Learning Cookbook, published by PACKT.

Chapter 2. Understanding Time Series Data

In the previous chapter, we touched upon a general approach to time series analysis, which consists of two main steps:

  • Data visualization to check the presence of trend, seasonality, and cyclical patterns
  • Adjustment of trend and seasonality to generate stationary series

Generating stationary data is important for improving the time series forecasting model. Removing the trend, seasonal, and cyclical components leaves irregular fluctuations, which cannot be modeled using only the time index as an explanatory variable. Therefore, to further improve forecasting, the irregular fluctuations are assumed to be independent and identically distributed (iid) observations and are modeled by a linear regression on variables other than the time index.

For example, house prices might exhibit both trend and seasonal (for example, quarterly) variations. However, the residuals left after adjusting trend and seasonality might actually depend on...

Advanced processing and visualization of time series data


In many cases, the original time series needs to be transformed into aggregate statistics. For example, observations in the original time series might have been recorded every second; however, to perform any meaningful analysis, the data must be aggregated every minute. This requires resampling the observations over periods that are longer than the granular time indices in the original data. Aggregate statistics, such as the mean, median, and variance, are calculated for each of the longer periods.
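The seconds-to-minutes case described above can be sketched with pandas as follows. This is a minimal illustration on synthetic data, not the book's example: the series `readings`, its start date, and its values are assumptions made up for the sketch.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption): one reading per second for five minutes.
rng = np.random.default_rng(seed=0)
index = pd.date_range("2017-01-01 00:00:00", periods=300,
                      freq=pd.Timedelta(seconds=1))
readings = pd.Series(rng.normal(loc=10.0, scale=2.0, size=300), index=index)

# Resample into one-minute bins and compute aggregate statistics for each bin.
per_minute = readings.resample("1min").agg(["mean", "median", "var"])
print(per_minute)
```

Each row of `per_minute` summarizes the 60 one-second observations that fall inside that minute.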

Another example of data pre-processing for time series is computing aggregates over similar segments in the data. Consider the monthly sales of cars manufactured by company X, where the data exhibits monthly seasonality, so that sales during a month of a given year show patterns similar to the sales of the same month in the previous and next years. To highlight this kind of seasonality we must remove the long-run trend...
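Computing aggregates over similar segments can be sketched with a pandas group-by on the calendar month. The sales figures below are hypothetical and deliberately contain no trend, so the sketch shows only the grouping step and does not attempt the trend removal mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four years of monthly sales with a repeating monthly pattern.
index = pd.date_range("2010-01-01", periods=48, freq="MS")
seasonal_pattern = np.array([5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 6, 8], dtype=float)
sales = pd.Series(100.0 + np.tile(seasonal_pattern, 4), index=index)

# Group observations by calendar month (1 to 12) and aggregate across years.
monthly_stats = sales.groupby(sales.index.month).agg(["mean", "std"])
print(monthly_stats)
```

Every January lands in group 1, every February in group 2, and so on, so each row describes one month across all years in the data.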

Resampling time series data


The technique of resampling is illustrated using a time series of chemical concentration readings taken every two hours between 1st January 1975 and 17th January 1975. The dataset has been downloaded from http://datamarket.com and is also available in the datasets folder of this book's GitHub repo.

We start by importing the packages required for running this example:

from __future__ import print_function 
import os 
import pandas as pd 
import numpy as np 
%matplotlib inline 
from matplotlib import pyplot as plt 

Then we set the working directory as follows:

os.chdir('D:/Practical Time Series') 

This is followed by reading the data from the CSV file into a pandas.DataFrame and displaying the shape and the first 10 rows of the DataFrame:

df = pd.read_csv('datasets/chemical-concentration-readings.csv') 
print('Shape of the dataset:', df.shape) 
df.head(10) 

The preceding code returns the following output:

Shape of the dataset: (197, 2) 
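A possible next step, sketched here under stated assumptions, is to parse the timestamps and resample the two-hourly readings to daily means. Because the CSV file is not reproduced here, the sketch builds a synthetic stand-in with the same shape (197 rows, 2 columns) and the column names `Timestamp` and `Chemical conc.`; the concentration values are made up, and this is not necessarily how the book proceeds.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV (an assumption): 197 readings taken every
# two hours, starting 1 January 1975. The values are made up.
rng = np.random.default_rng(seed=42)
stamps = pd.date_range("1975-01-01 00:00:00", periods=197,
                       freq=pd.Timedelta(hours=2))
df = pd.DataFrame({
    "Timestamp": stamps.strftime("%Y-%m-%d %H:%M:%S"),  # strings, as read from a CSV
    "Chemical conc.": rng.normal(loc=17.0, scale=0.4, size=197),
})

# Resampling needs a datetime-like index, so parse the strings and set the index.
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
df = df.set_index("Timestamp")

# Aggregate the two-hourly readings into one mean value per calendar day.
daily_mean = df["Chemical conc."].resample("D").mean()
print(daily_mean.head())
```

With readings every two hours, each full day contributes twelve observations to its daily mean.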

Stationary processes


Properties of data such as central tendency, dispersion, skewness, and kurtosis are called sample statistics. Mean and variance are two of the most commonly used sample statistics. In any analysis, data is collected by gathering information from a sample of the larger population. Mean, variance, and other properties are then estimated based on the sample data. Hence these are referred to as sample statistics.

An important assumption in statistical estimation theory is that, for sample statistics to be reliable, the population does not undergo any fundamental or systemic shifts over the individuals in the sample or over the time during which the data has been collected. This assumption ensures that sample statistics do not alter and will hold for entities that are outside the sample used for their estimation.

This assumption also applies to time series analysis so that mean, variance, and auto-correlation estimated from the sample can be used as a reasonable estimate for...
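To make the idea of stable sample statistics concrete, the following sketch compares the mean and variance of the first and second halves of two synthetic series: white noise, whose statistics should stay roughly the same, and a random walk, whose statistics typically drift. The series and the helper `half_stats` are hypothetical, and this informal comparison is not a formal statistical test.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
noise = pd.Series(rng.normal(size=1000))  # white noise
walk = noise.cumsum()                     # random walk built from the noise


def half_stats(series):
    """Return (mean, variance) for the first and the second half of a series."""
    mid = len(series) // 2
    first, second = series.iloc[:mid], series.iloc[mid:]
    return (first.mean(), first.var()), (second.mean(), second.var())


print("white noise:", half_stats(noise))
print("random walk:", half_stats(walk))

# First-order differencing of the random walk recovers the underlying noise.
restored = walk.diff().dropna()
```

The last step shows why differencing is a common way to stationarize a series: the differenced random walk is the white noise it was built from.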

Time series decomposition


The objective of time series decomposition is to model the long-term trend and seasonality and estimate the overall time series as a combination of them. Two popular models for time series decomposition are:

  • Additive model
  • Multiplicative model

The additive model formulates the original time series ($x_t$) as the sum of the trend-cycle ($F_t$), seasonal ($S_t$), and irregular ($\epsilon_t$) components as follows:

$x_t = F_t + S_t + \epsilon_t$

The residuals $\epsilon_t$ obtained after adjusting for the trend and seasonal components are the irregular variations. The additive model is usually applied when there is a time-dependent trend-cycle component, but independent seasonality that does not change over time.

The multiplicative decomposition model, which gives the time series as the product of the trend, seasonal, and irregular components, is useful when there is time-varying seasonality:

$x_t = F_t \times S_t \times \epsilon_t$

Taking the logarithm converts the multiplicative model into an additive model of the logarithms of the individual components. The multiplicative...
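The log transformation can be checked numerically. The sketch below builds a synthetic multiplicative series from an assumed trend, seasonal pattern, and irregular term, confirms that its logarithm is the sum of the logged components, and then applies a simple hand-rolled additive decomposition to the logged series. The rolling mean is only a rough trend estimate, and this is not the statsmodels routine.

```python
import numpy as np
import pandas as pd

# Synthetic multiplicative series (all components are assumptions).
n_years, period = 6, 12
t = np.arange(n_years * period)
trend = 50.0 + 0.5 * t
seasonal = 1.0 + 0.2 * np.sin(2 * np.pi * t / period)
rng = np.random.default_rng(seed=7)
irregular = np.exp(rng.normal(scale=0.01, size=t.size))
x = pd.Series(trend * seasonal * irregular)

# The logarithm turns the product of components into a sum of their logarithms.
log_x = np.log(x)
log_sum = np.log(trend) + np.log(seasonal) + np.log(irregular)

# Rough additive decomposition of the logged series:
# 1. estimate the trend with a centered rolling mean over one seasonal period
trend_est = log_x.rolling(window=period, center=True).mean()
# 2. estimate seasonality as the average detrended value at each seasonal position
detrended = log_x - trend_est
seasonal_est = detrended.groupby(t % period).transform("mean")
# 3. whatever remains is the irregular component
residual = log_x - trend_est - seasonal_est
print(residual.dropna().describe())
```

The rolling mean leaves missing values at both ends of the series, so the residuals are defined only where a full window is available.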

Summary


We started this chapter by discussing advanced data processing techniques such as resampling, group-by, and moving window computations to obtain aggregate statistics from a time series. Next, we described stationary time series and discussed statistical hypothesis tests such as the Ljung-Box test and the Augmented Dickey-Fuller test to verify the stationarity of a time series. Stationarizing a non-stationary time series is important for time series forecasting. Therefore, we discussed two different approaches to stationarizing a time series. Firstly, we described the method of differencing, which covers first, second, and seasonal differencing. Secondly, we discussed time series decomposition using the statsmodels.tsa API for additive and multiplicative models. In the next chapter, we delve deeper into exponential smoothing techniques, which deal with noisy time series data.


   Timestamp            Chemical conc.
0  1975-01-01 00:00...