
You're reading from Mastering Python for Finance - Second Edition

Product type: Book
Published in: Apr 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789346466
Edition: 2nd Edition
Author: James Ma Weiming

James Ma Weiming is a software engineer based in Singapore. His studies and research are focused on financial technology, machine learning, data sciences, and computational finance. James started his career in financial services working with treasury fixed income and foreign exchange products, and fund distribution. His interests in derivatives led him to Chicago, where he worked with veteran traders of the Chicago Board of Trade to devise high-frequency, low-latency strategies to game the market. He holds an MS degree in finance from Illinois Tech's Stuart School of Business in the United States and a bachelor's degree in computer engineering from Nanyang Technological University.

Statistical Analysis of Time Series Data

In financial portfolios, the returns on their constituent assets depend on a number of factors, such as macroeconomic and microeconomic conditions, and various financial variables. As the number of factors increases, so does the complexity involved in modeling portfolio behavior. Because computing resources and time are limited, every additional factor adds computation and worsens the bottleneck in portfolio modeling calculations. Principal Component Analysis (PCA) is a linear technique for dimensionality reduction. As its name suggests, PCA breaks down the movement of portfolio asset prices into its principal components, or common factors, for further statistical analysis. Common factors that don't explain much of the movement of the portfolio assets receive less weighting in their factors and...
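To make the idea concrete, here is a minimal NumPy sketch of linear PCA on simulated returns (hypothetical data, not the Dow). It eigendecomposes the covariance matrix of five assets that share one common factor and reports the share of variance each principal component explains. The chapter itself uses scikit-learn's KernelPCA; this sketch only illustrates the underlying idea.

```python
import numpy as np

# Hypothetical data: 250 days of returns for 5 assets driven mostly by one common factor
rng = np.random.default_rng(seed=42)
n_days, n_assets = 250, 5
common_factor = rng.normal(0.0, 0.01, size=n_days)       # shared, market-wide move
loadings = np.array([1.0, 0.9, 1.1, 0.8, 1.2])           # each asset's exposure to the factor
noise = rng.normal(0.0, 0.003, size=(n_days, n_assets))  # asset-specific moves
returns = common_factor[:, None] * loadings + noise

# Covariance matrix of the returns (columns are assets)
cov = np.cov(returns, rowvar=False)

# eigh is for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]                    # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Share of the total variance explained by each principal component
explained_ratio = eigenvalues / eigenvalues.sum()
print(np.round(explained_ratio, 3))
```

Because all five assets load on the same factor, the first component should account for the bulk of the variance, while the remaining components mostly capture asset-specific noise.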

The Dow Jones Industrial Average and its 30 components

The Dow Jones Industrial Average (DJIA) is a stock market index that comprises 30 large, publicly traded US companies. Commonly known as the Dow, it is owned by S&P Dow Jones Indices LLC and computed on a price-weighted basis (see https://us.spindices.com/index-family/us-equity/dow-jones-averages for more information on the Dow).
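A price-weighted index sums the prices of its constituents and divides the total by a divisor. The sketch below uses three hypothetical tickers and a hypothetical divisor; the real Dow divisor is maintained by S&P Dow Jones Indices and adjusted for events such as stock splits and component changes, so it is not simply the number of stocks. It shows the key property of price weighting: the same percentage move has a larger effect on the index when it comes from a higher-priced stock.

```python
# Hypothetical share prices for a three-stock, price-weighted index
prices = {"AAA": 150.0, "BBB": 50.0, "CCC": 100.0}

def price_weighted_index(prices, divisor):
    """Sum of the constituent share prices divided by the index divisor."""
    return sum(prices.values()) / divisor

divisor = 3.0  # hypothetical divisor; equal to the stock count, so this is a plain average
level = price_weighted_index(prices, divisor)
print(level)  # 100.0

# A 1% rise in the highest-priced stock moves the index more than a 1% rise in the lowest-priced one
aaa_up = dict(prices, AAA=prices["AAA"] * 1.01)
bbb_up = dict(prices, BBB=prices["BBB"] * 1.01)
impact_aaa = price_weighted_index(aaa_up, divisor) - level
impact_bbb = price_weighted_index(bbb_up, divisor) - level
print(impact_aaa, impact_bbb)
```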

This section involves downloading the datasets of the Dow and its components into pandas DataFrame objects for use in later sections of this chapter.

Downloading Dow component datasets from Quandl

The following code retrieves the Dow component datasets from Quandl. The data provider that we will be using is WIKI Prices, a community formed by members...

Applying a kernel PCA

In this section, we will perform kernel PCA to find eigenvectors and eigenvalues so that we can reconstruct the Dow index.

Finding eigenvectors and eigenvalues

We can perform a kernel PCA using the KernelPCA class of the sklearn.decomposition module in Python. The default kernel is linear. The data used in PCA should be normalized first, which we can do with z-scoring. The following code does this:

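from sklearn.decomposition import KernelPCA

# Standardize a column: subtract its mean and divide by its standard deviation
fn_z_score = lambda x: (x - x.mean()) / x.std()

# Apply the z-score to each column of the daily Dow component data
df_z_components = daily_df_components.apply(fn_z_score)

# Fit a kernel PCA with the default (linear) kernel
fitted_pca = KernelPCA().fit(df_z_components)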

The fn_z_score variable is an inline function to perform...
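The sketch below illustrates what reconstructing data from a few principal components means. It uses plain linear PCA computed with NumPy's SVD rather than KernelPCA, and hypothetical simulated prices rather than the Dow data; the helper names z_score and reconstruct are our own. The standardized data is projected onto its top-k principal axes (the eigenvectors of its correlation matrix) and mapped back, so the reconstruction error shrinks as k grows and vanishes when k equals the number of assets.

```python
import numpy as np

def z_score(x):
    # Column-wise standardization, like fn_z_score (sample std with ddof=1, as in pandas)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

def reconstruct(z, k):
    """Rank-k reconstruction of standardized data from its top-k principal components."""
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    top = vt[:k].T          # (n_assets, k): eigenvectors of the correlation matrix
    scores = z @ top        # project the data onto the top-k components
    return scores @ top.T   # map back to the original asset space

# Hypothetical prices for 8 assets driven by two common trends plus noise
rng = np.random.default_rng(seed=0)
t = np.arange(500)
trends = np.column_stack([np.cumsum(rng.normal(size=500)), np.sin(t / 50.0)])
prices = 100 + trends @ rng.normal(size=(2, 8)) + rng.normal(scale=0.5, size=(500, 8))

z = z_score(prices)
errors = {}
for k in (1, 2, 5, 8):
    errors[k] = float(np.mean((z - reconstruct(z, k)) ** 2))
    print(f"k={k}: mean squared reconstruction error = {errors[k]:.4f}")
```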

Stationary and non-stationary time series

Many statistical models used for prediction and forecasting assume that the time series data they are applied to is stationary, so it is important to check for stationarity before modeling. This section introduces the concepts of stationarity and non-stationarity in time series data.

Stationarity and non-stationarity

In empirical time series studies, price movements are observed to drift toward some long-term mean, either upwards or downwards. A stationary time series is one whose statistical properties, such as mean, variance, and autocorrelation, are constant over time. Conversely, observations on non-stationary time series data have their statistical properties...
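A quick, informal way to see the difference is to compare rolling statistics. The sketch below uses simulated data (not Dow prices), pandas rolling windows, and an arbitrary 50-observation window: white noise is stationary, so its rolling mean stays roughly flat, while the rolling mean of a random walk built from the same shocks wanders. This is a visual-style check, not a formal test.

```python
import numpy as np
import pandas as pd

# Hypothetical data: white noise (stationary) and the random walk built from it (non-stationary)
rng = np.random.default_rng(seed=1)
white_noise = pd.Series(rng.normal(size=1000))
random_walk = white_noise.cumsum()

window = 50  # hypothetical rolling-window length
ranges = {}
for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    rolling_mean = series.rolling(window).mean()  # the first window-1 values are NaN
    rolling_std = series.rolling(window).std()
    # Spread of each rolling statistic over the sample (max and min skip NaN values)
    ranges[name] = (rolling_mean.max() - rolling_mean.min(),
                    rolling_std.max() - rolling_std.min())
    print(f"{name}: rolling mean range = {ranges[name][0]:.2f}, "
          f"rolling std range = {ranges[name][1]:.2f}")
```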

The Augmented Dickey-Fuller Test

The Augmented Dickey-Fuller (ADF) test is a statistical test of whether a unit root is present in time series data. Unit roots can cause unpredictable results in time series analysis. The null hypothesis of the test is that a unit root is present, that is, that the time series is non-stationary and driven by a stochastic trend. If we fail to reject the null hypothesis, the evidence is consistent with the time series data being non-stationary. If we reject the null hypothesis in favor of the alternative hypothesis, we accept the evidence that the time series data is generated by a stationary process; when the test regression includes a deterministic trend, such a process is known as trend-stationary. The ADF test statistic is usually negative, and the more negative it is, the more strongly the null hypothesis is rejected.
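To make the test statistic concrete, the sketch below computes the basic (non-augmented) Dickey-Fuller statistic with NumPy by ordinary least squares, assuming a regression with a constant and no lagged difference terms. It is an illustration only: the ADF test adds lagged differences, and its critical values come from the Dickey-Fuller distribution rather than the normal distribution. In practice, the adfuller function in statsmodels.tsa.stattools runs the full test and reports p-values. The -2.86 threshold used here is the approximate large-sample 5% critical value for the constant-only case.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in the regression dy_t = alpha + gamma * y_{t-1} + e_t.

    Simple (non-augmented) Dickey-Fuller test with a constant and no lagged
    differences; the ADF test adds lagged dy terms to absorb autocorrelation.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # dy_t = y_t - y_{t-1}
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # regressors: constant, y_{t-1}
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    dof = len(dy) - X.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(seed=7)
shocks = rng.normal(size=500)
stat_noise = dickey_fuller_stat(shocks)             # stationary white noise
stat_walk = dickey_fuller_stat(np.cumsum(shocks))   # random walk: has a unit root

# A statistic below the critical value rejects the unit-root null hypothesis
CRITICAL_5PCT = -2.86
for name, stat in [("white noise", stat_noise), ("random walk", stat_walk)]:
    print(f"{name}: DF statistic = {stat:.2f}, reject unit root: {stat < CRITICAL_5PCT}")
```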

Here are some basic autoregression models for use in ADF testing:

  • No constant and no trend:
  • A...
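For reference, the three regression forms commonly used in ADF testing are written below in standard textbook notation, which may differ from the chapter's own. Here Δy_t is the first difference of the series, α is a constant, βt is a deterministic trend, p is the number of lagged differences, and ε_t is the error term. In each case, the null hypothesis is γ = 0 (a unit root) against the alternative γ < 0, and the test statistic is the t-ratio of the estimated γ.

```latex
\begin{aligned}
\text{No constant, no trend:}\quad & \Delta y_t = \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t \\
\text{Constant, no trend:}\quad & \Delta y_t = \alpha + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t \\
\text{Constant and trend:}\quad & \Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t
\end{aligned}
```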

Making a time series stationary

Non-stationary time series data is likely to be affected by a trend or seasonality. Trending time series data has a mean that is not constant over time. Data that is affected by seasonality has variations at specific intervals in time. To make time series data stationary, the trend and seasonality effects have to be removed. Detrending, differencing, and decomposition are three such methods. The resulting stationary data is then suitable for statistical forecasting.
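Differencing is the simplest of these methods to demonstrate. In the sketch below (simulated data), a random walk is the cumulative sum of random shocks, so taking first differences with np.diff recovers the stationary shocks exactly. For pandas data, Series.diff() does the same, and a seasonal difference can be taken with its periods argument, for example periods=12 for monthly data with a yearly cycle.

```python
import numpy as np

# Hypothetical shocks and the random walk (a non-stationary series) they generate
rng = np.random.default_rng(seed=3)
shocks = rng.normal(size=200)
walk = np.cumsum(shocks)        # walk[t] = shocks[0] + ... + shocks[t]

# First-order differencing: differenced[t] = walk[t + 1] - walk[t]
differenced = np.diff(walk)

# Differencing undoes the cumulative sum, recovering the stationary shocks
print(np.allclose(differenced, shocks[1:]))  # True
```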

Let's look at all three methods in detail.

Detrending

The process of removing a trend line from non-stationary data is known as detrending. This involves a transformation step that normalizes large values into smaller ones...
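The detrending description is cut short here, so the sketch below shows one common way to detrend under our own assumptions: a log transform as the step that shrinks large values, followed by subtracting a straight-line trend fitted with np.polyfit. The chapter may estimate the trend differently (for example, with a moving average), and the price series is simulated.

```python
import numpy as np

# Hypothetical price series: exponential growth with multiplicative noise
rng = np.random.default_rng(seed=5)
t = np.arange(300)
prices = 50 * np.exp(0.002 * t + rng.normal(scale=0.02, size=t.size))

# Step 1 (assumed transformation): a log transform compresses large values
log_prices = np.log(prices)

# Step 2: fit a straight-line trend to the log prices and subtract it
slope, intercept = np.polyfit(t, log_prices, deg=1)  # coefficients, highest power first
trend = intercept + slope * t
detrended = log_prices - trend

print(f"estimated growth per step: {slope:.4f}")
```

Since the trend is fitted by least squares with an intercept, the detrended residuals have zero mean and no remaining linear trend.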

Forecasting and predicting a time series

In the previous section, we identified non-stationarity in time series data and discussed techniques for making time series data stationary. With stationary data, we can proceed to perform statistical modeling such as prediction and forecasting. Prediction involves generating best estimates of in-sample data. Forecasting involves generating best estimates of out-of-sample data. Both are based on previously observed values. One commonly used method for this is the Autoregressive Integrated Moving Average model.
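To illustrate the difference between prediction and forecasting, the sketch below simulates a stationary AR(1) series (the autoregressive building block of ARIMA), estimates its coefficient by least squares on a training sample, and then produces in-sample one-step predictions and out-of-sample multi-step forecasts. The AR(1) model, the coefficient 0.6, and the 250/50 train/test split are all assumptions for illustration.

```python
import numpy as np

# Hypothetical stationary AR(1) series: y_t = 0.6 * y_{t-1} + e_t
rng = np.random.default_rng(seed=11)
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + rng.normal()

train_y, test_y = y[:250], y[250:]

# Estimate phi by least squares on the training sample (no intercept: the series has zero mean)
phi = np.dot(train_y[:-1], train_y[1:]) / np.dot(train_y[:-1], train_y[:-1])

# Prediction: one-step estimates of in-sample observations, each from the actual previous value
in_sample_pred = phi * train_y[:-1]

# Forecasting: out-of-sample estimates, iterated forward from the last training value
forecast = np.empty(len(test_y))
last = train_y[-1]
for h in range(len(test_y)):
    last = phi * last
    forecast[h] = last

print(f"phi estimate: {phi:.3f}")
print("first three forecasts:", np.round(forecast[:3], 3))
```

The multi-step forecasts decay toward the series mean of zero because each step multiplies by the estimated coefficient; the in-sample predictions, by contrast, always start from an actual observed value.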

About the Autoregressive Integrated Moving Average

The Autoregressive Integrated Moving Average (ARIMA) is a forecasting model for stationary time series based on linear...
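In standard Box-Jenkins notation (our own symbols, which may not match the chapter's), an ARIMA(p, d, q) model differences the series d times and models the differenced series as a stationary ARMA(p, q) process. With B the backshift operator (B y_t = y_{t-1}), w_t = (1 - B)^d y_t the differenced series, φ_i the autoregressive coefficients, θ_j the moving average coefficients, and ε_t white noise, the two lines below are equivalent forms of the model:

```latex
\begin{aligned}
w_t &= c + \sum_{i=1}^{p} \phi_i w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \\
\Big(1 - \sum_{i=1}^{p} \phi_i B^i\Big)(1 - B)^d \, y_t &= c + \Big(1 + \sum_{j=1}^{q} \theta_j B^j\Big)\varepsilon_t
\end{aligned}
```

The integrated part (d) is what lets ARIMA handle a series with a stochastic trend: after differencing, the ARMA part is fitted to stationary data.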

Summary

In this chapter, we were introduced to PCA as a dimension reduction technique in portfolio modeling. By breaking down the movement of the asset prices of a portfolio into its principal components, or common factors, we can keep the most useful factors, greatly simplifying portfolio analysis and saving computational time and space. In applying PCA to the Dow and its 30 components using the KernelPCA class of the sklearn.decomposition module, we obtained eigenvectors and eigenvalues, which we used to reconstruct the Dow with five components.

In the statistical analysis of time series data, the data is considered as either stationary or non-stationary. Stationary time series data is data whose statistical properties are constant over time. Non-stationary time series data has statistical properties that change over time, most likely due...
