Chapter 8. Time-Series Analysis
In finance and economics, a huge amount of our data comes in the form of time-series, such as stock prices and Gross Domestic Product (GDP). Chapter 4, Sources of Data, showed that we can download daily, weekly, and monthly historical price time-series from Yahoo!Finance, and that we can retrieve many historical time-series, such as GDP, from the Federal Reserve's Economic Data library (FRED). Time-series data raises many issues, such as how to estimate returns from historical price data, how to merge datasets with the same or different frequencies, how to handle seasonality, and how to detect auto-correlation. Understanding those properties is vitally important for our knowledge development.
In this chapter, the following topics will be covered:
Introduction to time-series analysis
Designing a good date variable and merging different datasets by date
Normal distribution and normality test
Term structure of interest rates, 52-week high, and low trading strategy
Return estimation...
Introduction to time-series analysis
Most finance data is in the format of time-series; see the following several examples. The first one shows how to download historical daily stock price data from Yahoo!Finance for a given ticker between a beginning and an ending date:
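At the time this chapter was written, the download used matplotlib.finance's quotes_historical_yahoo() function, which returned a numpy.recarray; that module has since been removed from matplotlib. The sketch below builds a recarray with the same layout from made-up dates and prices, so the data format discussed next can still be reproduced offline:

```python
import datetime
import numpy as np

# Stand-in for quotes_historical_yahoo(ticker, begdate, enddate):
# build a numpy.recarray with (date, open, high, low, close, volume)
# fields from made-up values.
dates = [datetime.date(2016, 1, 4), datetime.date(2016, 1, 5)]
rows = [(d.toordinal(), 100.0 + i, 101.0 + i, 99.0 + i, 100.5 + i, 1e6)
        for i, d in enumerate(dates)]
x = np.array(rows, dtype=[('date', 'f8'), ('open', 'f8'), ('high', 'f8'),
                          ('low', 'f8'), ('close', 'f8'),
                          ('volume', 'f8')]).view(np.recarray)
print(type(x))          # <class 'numpy.recarray'>
print(x.close)          # access a field by name, as with the real download
```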
The output is shown here:
The type of the data is numpy.recarray, as type(x) would show. The second example prints the first several observations from two datasets called ffMonthly.pkl and usGDPquarterly.pkl; both are available from the author's website, for example, http://canisius.edu/~yany/python/ffMonthly.pkl:
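A sketch of the loading pattern: after downloading ffMonthly.pkl (or usGDPquarterly.pkl), pandas.read_pickle() reads it in one call. The tiny stand-in file below uses made-up numbers so the pattern runs offline:

```python
import os
import tempfile

import pandas as pd

# Create a tiny stand-in for ffMonthly.pkl (values are made up),
# then read it back with pandas.read_pickle(), exactly as one would
# with the real file downloaded from the author's website.
sample = pd.DataFrame({'MKT_RF': [0.0296, 0.0264], 'RF': [0.0022, 0.0025]},
                      index=['1926-07', '1926-08'])
path = os.path.join(tempfile.gettempdir(), 'ffMonthly_demo.pkl')
sample.to_pickle(path)
ff = pd.read_pickle(path)
print(ff.head())
```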
The related output is shown here:
There is one end-of-chapter problem that is designed to merge discrete data with the daily...
Merging datasets based on a date variable
To make our time-series more manageable, it is a great idea to generate a date variable. For such a variable, readers could think of year (YYYY), year and month (YYYYMM), or year, month, and day (YYYYMMDD). Even for just the year, month, and day combination, we could have many forms. Using January 20, 2017 as an example, we could write 2017-1-20, 1/20/2017, 20Jan2017, 20-1-2017, and the like. What distinguishes a true date variable is that it can be easily manipulated. Usually, a true date variable takes the form of year-month-day or one of its variants. Assume the date variable has a value of 2000-12-31; after adding one day to it, the result should be 2001-1-1.
Using pandas.date_range() to generate a one-dimensional time-series
We could easily use the pandas.date_range()
function to generate our time-series; refer to the following example:
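A minimal sketch (the start date and length here are illustrative):

```python
import pandas as pd

# Generate 252 business days (freq='B') starting on 1/1/2013 and use
# them as the index of a simple time-series.
dates = pd.date_range('1/1/2013', periods=252, freq='B')
ts = pd.Series(range(252), index=dates)
print(ts.head())
```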
Understanding the interpolation technique
Interpolation is a technique used quite frequently in finance. In the following example, we have to replace two missing values, NaN, between 2 and 6. The pandas interpolate() method, with its default linear interpolation, is used to fill in the two missing values:
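The example can be sketched as follows:

```python
import numpy as np
import pandas as pd

# Two missing values (NaN) between the known values 2 and 6 are filled
# by linear interpolation.
x = pd.Series([1, 2, np.nan, np.nan, 6])
print(x.interpolate())   # the two NaNs become 3.333333 and 4.666667
```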
The output is shown here:
The preceding method is a linear interpolation. Actually, we could estimate a Δ and calculate those missing values manually:

Δ = (v2 - v1) / n

Here, v2 (v1) is the second (first) known value and n is the number of intervals between those two values. For the preceding case, Δ is (6-2)/3=1.33333. Thus, the next value will be v1+Δ=2+1.33333=3.33333. In this way, we could successively estimate all missing values. Note that if we have several gaps with missing values, then the Δ for each gap has to be calculated separately...
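The manual calculation above can be checked with a few lines:

```python
# Manual linear interpolation: delta = (v2 - v1) / n, where n is the
# number of intervals between the two known values.
v1, v2, n = 2.0, 6.0, 3
delta = (v2 - v1) / n                          # 1.33333
filled = [v1 + i * delta for i in range(1, n)] # the two missing values
print(delta, filled)
```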
In finance, knowledge of the normal distribution is very important for two reasons. First, stock returns are often assumed to follow a normal distribution. Second, the error terms from a good econometric model should follow a normal distribution with a zero mean. However, in the real world, this might not be true for stocks. Whether returns on stocks or portfolios follow a normal distribution can be examined by various so-called normality tests; the Shapiro-Wilk test is one of them. For the first example, random numbers are drawn from a normal distribution. As a consequence, the test should confirm that those observations follow a normal distribution:
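A sketch of that first example, drawing 500 pseudo-random normals and applying SciPy's Shapiro-Wilk test (the seed and sample size are arbitrary choices):

```python
import numpy as np
from scipy import stats

# Draw 500 observations from a standard normal distribution and run
# the Shapiro-Wilk normality test.  A large p-value (> alpha) means we
# cannot reject the null hypothesis that the data are normal.
np.random.seed(12345)
x = np.random.normal(0, 1, 500)
W, p_value = stats.shapiro(x)
print(W, p_value)
```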
Assume that our confidence level is 95%, that is, alpha=0.05. The first value...
52-week high and low trading strategy
Some investors/researchers argue that we could adopt a 52-week high and low trading strategy: take a long position if today's price is close to the maximum price achieved in the past 52 weeks, and take the opposite position if today's price is close to its 52-week low. Let's randomly choose a day, 12/31/2016. The following Python program presents the 52-week range and today's position within it:
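The logic can be sketched offline; the chapter's program downloads one year of daily prices from Yahoo!Finance, while the prices below are simulated stand-ins:

```python
import numpy as np

# Made-up daily closing prices for ~252 trading days (one year).
np.random.seed(123)
prices = 100 + np.cumsum(np.random.normal(0, 1, 252))
today = prices[-1]
high, low = prices.max(), prices.min()
# Position of today's price inside the 52-week range:
# 0 = at the 52-week low, 1 = at the 52-week high.
position = (today - low) / (high - low)
print('52-week high %.2f, low %.2f, today %.2f (%.0f%% of range)'
      % (high, low, today, 100 * position))
```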
The corresponding output is shown as follows:
According to the 52-week...
Liquidity is defined as how quickly we can dispose of an asset without losing its intrinsic value. Usually, we use the spread to represent liquidity, but we need high-frequency data to estimate the spread; later in the chapter, we show how to estimate the spread directly using high-frequency data. To measure the spread indirectly based on daily observations, Roll (1984) shows that we can estimate it from the serial covariance in price changes, as follows:
S = 2 * sqrt(-cov(ΔPt, ΔPt-1))

Here, S is the Roll spread, Pt is the closing price of a stock on day t, ΔPt is Pt - Pt-1, and cov(ΔPt, ΔPt-1) is the serial covariance of those daily price changes; dividing S by the average share price in the estimation period expresses the spread in percentage terms. The following Python code estimates Roll's spread for IBM, using one year's daily price data from Yahoo! Finance:
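A runnable sketch of the estimator; the prices below are made up in place of the IBM download, and the code also handles the case, noted by Roll, where a positive serial covariance makes the square root undefined:

```python
import numpy as np

# Made-up daily closing prices standing in for one year of IBM data.
np.random.seed(3)
p = 180 + np.cumsum(np.random.normal(0, 1, 252))
d = np.diff(p)                                  # daily price changes
cov_ = np.cov(d[:-1], d[1:])[0, 1]              # serial covariance
if cov_ < 0:
    spread = 2 * np.sqrt(-cov_)                 # Roll (1984) spread
    pct = spread / p.mean()                     # percentage spread
    print('Roll spread = %.4f (%.4f%% of average price)'
          % (spread, 100 * pct))
else:
    # With a positive autocovariance the formula is undefined; common
    # practice is to report the spread as not estimable for the period.
    print('Positive autocovariance: Roll spread not defined')
```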
Estimating Amihud's illiquidity
According to Amihud (2002), liquidity reflects the impact of order flow on price. His illiquidity measure is defined as follows:
illiq(t) = (1/N) * Σ |Ri| / (Pi × Vi),  i = 1, ..., N

Here, illiq(t) is Amihud's illiquidity measure for month t, N is the number of trading days in that month, Ri is the daily return on day i, Pi is the closing price on day i, and Vi is the trading volume (in shares) on day i, so that Pi × Vi is the daily dollar trading volume. Since illiquidity is the reciprocal of liquidity, the lower the illiquidity value, the higher the liquidity of the underlying security. First, let's look at an item-by-item division:
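Element-wise division is what NumPy arrays do by default, which is exactly what the ratio inside the sum requires:

```python
import numpy as np

# Item-by-item (element-wise) division: numpy divides two arrays entry
# by entry, which is what the ratio |R_i| / (P_i * V_i) needs.
ret = np.array([0.01, -0.02, 0.015])            # made-up daily returns
dollar_volume = np.array([2e6, 3e6, 2.5e6])     # made-up dollar volumes
ratio = np.abs(ret) / dollar_volume
print(ratio)
```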
In the following code, we estimate Amihud's illiquidity for IBM based on trading data in October 2013. The value is 1.21*10^-11, which seems quite small. Actually, the absolute value is not important; the relative value matters. If we estimate the illiquidity for WMT over the same period, we would find a...
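The computation can be sketched as follows; the prices and volumes are made-up stand-ins for the October 2013 IBM data:

```python
import numpy as np

# Made-up closing prices and share volumes for a few trading days.
p = np.array([185.0, 186.2, 184.9, 185.5, 186.0])
vol = np.array([3.1e6, 2.8e6, 3.4e6, 2.9e6, 3.0e6])
ret = p[1:] / p[:-1] - 1                   # daily returns
# Amihud (2002): mean of |return| over dollar volume for the month.
illiq = np.mean(np.abs(ret) / (p[1:] * vol[1:]))
print('illiq = %.3e' % illiq)
```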
Estimating Pastor and Stambaugh (2003) liquidity measure
Based on the methodology and empirical evidence in Campbell, Grossman, and Wang (1993), Pastor and Stambaugh (2003) designed the following model to measure individual stock's liquidity and the market liquidity:
yt+1 = α + β1*x1,t + β2*x2,t + εt+1

Here, yt is the excess stock return, Rt - Rf,t, on day t; Rt is the return for the stock; Rf,t is the risk-free rate; x1,t is the market return; and x2,t is the signed dollar trading volume:

x2,t = sign(Rt - Rf,t) × pt × volumet

where pt is the stock price and volumet is the trading volume. The regression is run on daily data within each month. In other words, for each month we obtain one β2, which is defined as the liquidity measure for the individual stock. The following code estimates the liquidity for IBM. First, we download the IBM and S&P500 daily price data, estimate their daily returns, and merge them as follows:
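A sketch of the data-preparation step; the returns below are simulated stand-ins for the IBM and S&P500 downloads, and the merge is done on the date index:

```python
import numpy as np
import pandas as pd

# Made-up daily returns for a stock and for the market index, indexed
# by business dates, then merged on the date index.
np.random.seed(0)
dates = pd.date_range('2016-10-03', periods=5, freq='B')
stock = pd.DataFrame({'ret': np.random.normal(0, 0.01, 5)}, index=dates)
mkt = pd.DataFrame({'mktret': np.random.normal(0, 0.008, 5)}, index=dates)
final = pd.merge(stock, mkt, left_index=True, right_index=True)
print(final.head())
```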
First, let's look at the OLS regression by using the pandas.ols function, as follows:
For the Fama-MacBeth regression, we have the following code:
The Durbin-Watson statistic is related to auto-correlation: after we run a regression, the error terms should be uncorrelated, with a mean of zero. The Durbin-Watson statistic is defined as:

DW = Σt=2..T (et - et-1)² / Σt=1..T et²

Here, et is the error term at time t, and T is the total number of error terms. The Durbin-Watson statistic tests the null hypothesis that the residuals from an ordinary least-squares regression are not auto-correlated against the alternative that the residuals follow an AR(1) process. The statistic ranges in value from 0 to 4: a value near 2 indicates no autocorrelation, a value toward 0 indicates positive autocorrelation, and a value toward 4 indicates negative autocorrelation, as the following table summarizes:
Table 8.3 Durbin-Watson Test

DW ≈ 2    No autocorrelation
DW → 0    Positive autocorrelation
DW → 4    Negative autocorrelation
The following Python program runs a CAPM first by using daily data for IBM. The S&P500 is used as the index. The time period is from...
Python for high-frequency data
High-frequency data refers to second-by-second or millisecond-by-millisecond transaction and quotation data. The New York Stock Exchange's Trade and Quote (TAQ) database is a typical example (http://www.nyxdata.com/data-products/daily-taq). The following program can be used to retrieve high-frequency data from Google Finance:
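The Google Finance intraday service has since been retired, so the download itself no longer works. The parsing logic can still be sketched on a hard-coded sample string laid out like a getprices response (rows prefixed with 'a' carry a Unix timestamp anchor; later rows carry offsets in intervals from that anchor); the quotes below are made up:

```python
import datetime

# Made-up sample in the getprices layout: a header giving the bar
# interval in seconds, a COLUMNS line, then one row per bar.
sample = """INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
a1388756701,185.53,185.60,185.40,185.50,12000
1,185.60,185.70,185.55,185.53,8000
2,185.48,185.62,185.45,185.60,9500"""
interval, anchor, bars = 60, None, []
for line in sample.splitlines():
    if line.startswith('INTERVAL='):
        interval = int(line.split('=')[1])
    elif line[0].isdigit() or line.startswith('a'):
        fields = line.split(',')
        if fields[0].startswith('a'):        # anchor row: Unix timestamp
            anchor = int(fields[0][1:])
            ts = anchor
        else:                                # offset row: n intervals later
            ts = anchor + int(fields[0]) * interval
        when = datetime.datetime.fromtimestamp(ts, datetime.timezone.utc)
        bars.append((when, float(fields[1])))
for when, close in bars:
    print(when, close)
```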
In the preceding program, we have two input variables: ticker...
Spread estimated based on high-frequency data
Based on the Consolidated Quote (CQ) dataset supplied by Prof. Hasbrouck, we generate a dataset in pandas' pickle format that can be downloaded from http://canisius.edu/~yany/python/TORQcq.pkl. Assume that the file is located under C:\temp:
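The spread computation on quote data can be sketched with a tiny stand-in frame (made-up bid/ask quotes, with column names following the TORQ convention of BID for bid and OFR for offer):

```python
import pandas as pd

# Made-up TORQ-style consolidated quotes.
cq = pd.DataFrame({'BID': [54.25, 54.25, 54.375],
                   'OFR': [54.50, 54.375, 54.50]})
cq['spread'] = cq.OFR - cq.BID                       # quoted spread
cq['rel_spread'] = cq.spread / ((cq.OFR + cq.BID) / 2.0)  # relative spread
print(cq)
```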
The output is shown here:
For this book, our focus is free public data. Thus, we discuss only a few subscription-based financial databases, since some readers might be at schools with valid subscriptions; CRSP is one of them. In this chapter, we mention just three Python datasets.
The Center for Research in Security Prices (CRSP) database contains all trading data, such as closing prices, trading volume, and shares outstanding, for all listed stocks in the US from 1926 onward. Because of its quality and long history, it has been used intensively by academic researchers and practitioners. The first dataset is called crspInfo.pkl; see the following code:
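The loading pattern can be sketched with a stand-in file; the columns below (made-up values) only illustrate the kind of header information, such as PERMNO and date ranges, that the real crspInfo.pkl carries:

```python
import os
import tempfile

import pandas as pd

# Stand-in for crspInfo.pkl with made-up rows.
info = pd.DataFrame({'PERMNO': [10001, 10002],
                     'TICKER': ['AAAA', 'BBBB'],
                     'BEGDATE': [19860131, 19860131],
                     'ENDDATE': [20161230, 20161230]})
path = os.path.join(tempfile.gettempdir(), 'crspInfo_demo.pkl')
info.to_pickle(path)
x = pd.read_pickle(path)
print(x.head())
```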
The related output is shown here:
Please refer to the following articles:
Amihud, Yakov, 2002, Illiquidity and stock returns: cross-section and time-series effects, Journal of Financial Markets, 5, 31–56, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.145.9505&rep=rep1&type=pdf
Bali, T. G., Cakici, N., and Whitelaw, R. F., 2011, Maxing out: Stocks as lotteries and the cross-section of expected returns, Journal of Financial Economics, 99(2), 427–446 http://www.sciencedirect.com/science/article/pii/S0304405X1000190X
Cook Pine Capital LLC, November 26, 2008, Study of Fat-tail Risk, http://www.cookpinecapital.com/pdf/Study%20of%20Fat-tail%20Risk.pdf
CRSP web site, http://crsp.com/
CRSP user manual, http://www.crsp.com/documentation
George, T.J., and Hwang, C., 2004, The 52-Week High and Momentum Investing, Journal of Finance 59(5), 2145–2176, http://www.bauer.uh.edu/tgeorge/papers/gh4-paper.pdf
Hasbrouck, Joel, 1992, Using the TORQ database, New York University, http://people.stern.nyu.edu...
Which module contains the function called rolling_kurt? How can you use the function?
Based on daily data downloaded from Yahoo! Finance, find whether Wal-Mart's daily returns follow a normal distribution.
Based on daily returns in 2016, are the mean returns for IBM and DELL the same?
Tip
You can use Yahoo! Finance as your source of data.
How many dividends distributed or stock splits happened over the past 10 years for IBM and DELL based on the historical data?
Write a Python program to estimate rolling beta on a 3-year window for a few stocks such as IBM, WMT, C and MSFT.
Assume that we just downloaded the prime rate from the Federal Reserve's data library at http://www.federalreserve.gov/releases/h15/data.htm, choosing the 1-month, business-day time-series. Write a Python program to merge them using:
In this chapter, many concepts and issues associated with time-series were discussed in detail. Topics included how to design a true date variable, how to merge datasets with different frequencies, and how to download historical prices from Yahoo! Finance; also, different ways to estimate returns, the Roll (1984) spread, Amihud's (2002) illiquidity, and Pastor and Stambaugh's (2003) liquidity measure, and how to retrieve high-frequency data from Prof. Hasbrouck's TORQ (Trade, Order, Report, and Quotation) database. In addition, two datasets from CRSP were shown. Since this book focuses on open and publicly available finance, economics, and accounting data, we only mention such subscription-based financial databases briefly.
In the next chapter, we discuss many concepts and theories related to portfolio theory, such as how to measure portfolio risk, how to estimate the risk of a 2-stock and an n-stock portfolio, and the trade-off between risk and return using various measures such as the Sharpe ratio, Treynor ratio...