Chapter 8. Time-Series Analysis
In finance and economics, a huge amount of our data comes in the form of time-series, such as stock prices and Gross Domestic Product (GDP). Chapter 4, Sources of Data, showed that we can download daily, weekly, and monthly historical price time-series from Yahoo!Finance, and that we can retrieve many historical time-series, such as GDP, from the Federal Reserve's Economic Data library (FRED). Time-series data raises many issues, such as how to estimate returns from historical price data, how to merge datasets with the same or different frequencies, how to handle seasonality, and how to detect auto-correlation. Understanding those properties is vitally important for our knowledge development.
In this chapter, the following topics will be covered:
Introduction to time-series analysis
Designing a good date variable and merging different datasets by date
Normal distribution and normality test
Term structure of interest rates, 52-week high, and low trading strategy
Return estimation...
Introduction to time-series analysis
Most finance data is in the format of time-series; see the following several examples. The first one shows how to download historical daily stock price data from Yahoo!Finance for a given ticker between a beginning and an ending date:
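At the time this chapter was written, the download used matplotlib.finance's quotes_historical_yahoo() function, which returned a numpy.recarray; that module has since been removed from matplotlib. The sketch below builds a recarray with the same layout from made-up dates and prices, so the data format discussed next can still be reproduced offline:

```python
import datetime
import numpy as np

# Stand-in for quotes_historical_yahoo(ticker, begdate, enddate):
# build a numpy.recarray with (date, open, high, low, close, volume)
# fields from made-up values.
dates = [datetime.date(2016, 1, 4), datetime.date(2016, 1, 5)]
rows = [(d.toordinal(), 100.0 + i, 101.0 + i, 99.0 + i, 100.5 + i, 1e6)
        for i, d in enumerate(dates)]
x = np.array(rows, dtype=[('date', 'f8'), ('open', 'f8'), ('high', 'f8'),
                          ('low', 'f8'), ('close', 'f8'),
                          ('volume', 'f8')]).view(np.recarray)
print(type(x))          # <class 'numpy.recarray'>
print(x.close)          # access a field by name, as with the real download
```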
The output is shown here:
The type of the data is numpy.recarray, as type(x) would show. The second example prints the first several observations from two datasets called ffMonthly.pkl and usGDPquarterly.pkl; both are available from the author's website, for example, http://canisius.edu/~yany/python/ffMonthly.pkl:
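A sketch of the loading pattern: after downloading ffMonthly.pkl (or usGDPquarterly.pkl), pandas.read_pickle() reads it in one call. The tiny stand-in file below uses made-up numbers so the pattern runs offline:

```python
import os
import tempfile

import pandas as pd

# Create a tiny stand-in for ffMonthly.pkl (values are made up),
# then read it back with pandas.read_pickle(), exactly as one would
# with the real file downloaded from the author's website.
sample = pd.DataFrame({'MKT_RF': [0.0296, 0.0264], 'RF': [0.0022, 0.0025]},
                      index=['1926-07', '1926-08'])
path = os.path.join(tempfile.gettempdir(), 'ffMonthly_demo.pkl')
sample.to_pickle(path)
ff = pd.read_pickle(path)
print(ff.head())
```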
The related output is shown here:
There is one end-of-chapter problem that is designed to merge discrete data with the daily...
Merging datasets based on a date variable
To make our time-series more manageable, it is a great idea to generate a date variable. For such a variable, readers could think of year (YYYY), year and month (YYYYMM), or year, month, and day (YYYYMMDD). Even for just the year, month, and day combination, we could have many forms. Using January 20, 2017 as an example, we could write 2017-1-20, 1/20/2017, 20Jan2017, 20-1-2017, and the like. What distinguishes a true date variable is that it can be easily manipulated. Usually, a true date variable takes the form of year-month-day or one of its variants. Assume the date variable has a value of 2000-12-31; after adding one day to it, the result should be 2001-1-1.
Using pandas.date_range() to generate a one-dimensional time-series
We could easily use the pandas.date_range()
function to generate our time-series; refer to the following example:
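A minimal sketch (the start date and length here are illustrative):

```python
import pandas as pd

# Generate 252 business days (freq='B') starting on 1/1/2013 and use
# them as the index of a simple time-series.
dates = pd.date_range('1/1/2013', periods=252, freq='B')
ts = pd.Series(range(252), index=dates)
print(ts.head())
```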
Understanding the interpolation technique
Interpolation is a technique used quite frequently in finance. In the following example, we have to replace two missing values, NaN, between 2 and 6. The pandas interpolate() method, with its default linear interpolation, is used to fill in the two missing values:
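The example can be sketched as follows:

```python
import numpy as np
import pandas as pd

# Two missing values (NaN) between the known values 2 and 6 are filled
# by linear interpolation.
x = pd.Series([1, 2, np.nan, np.nan, 6])
print(x.interpolate())   # the two NaNs become 3.333333 and 4.666667
```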
The output is shown here:
The preceding method is a linear interpolation. Actually, we could estimate a Δ and calculate those missing values manually:

Δ = (v2 - v1) / n

Here, v2 (v1) is the second (first) known value and n is the number of intervals between those two values. For the preceding case, Δ is (6-2)/3=1.33333. Thus, the next value will be v1+Δ=2+1.33333=3.33333. In this way, we could successively estimate all missing values. Note that if we have several gaps with missing values, then the Δ for each gap has to be calculated separately...
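The manual calculation above can be checked with a few lines:

```python
# Manual linear interpolation: delta = (v2 - v1) / n, where n is the
# number of intervals between the two known values.
v1, v2, n = 2.0, 6.0, 3
delta = (v2 - v1) / n                          # 1.33333
filled = [v1 + i * delta for i in range(1, n)] # the two missing values
print(delta, filled)
```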
In finance, knowledge of the normal distribution is very important for two reasons. First, stock returns are often assumed to follow a normal distribution. Second, the error terms from a good econometric model should follow a normal distribution with a zero mean. However, in the real world, this might not be true for stocks. Whether returns on stocks or portfolios follow a normal distribution can be examined by various so-called normality tests; the Shapiro-Wilk test is one of them. For the first example, random numbers are drawn from a normal distribution. As a consequence, the test should confirm that those observations follow a normal distribution:
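A sketch of that first example, drawing 500 pseudo-random normals and applying SciPy's Shapiro-Wilk test (the seed and sample size are arbitrary choices):

```python
import numpy as np
from scipy import stats

# Draw 500 observations from a standard normal distribution and run
# the Shapiro-Wilk normality test.  A large p-value (> alpha) means we
# cannot reject the null hypothesis that the data are normal.
np.random.seed(12345)
x = np.random.normal(0, 1, 500)
W, p_value = stats.shapiro(x)
print(W, p_value)
```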
Assume that our confidence level is 95%, that is, alpha=0.05. The first value...
52-week high and low trading strategy
Some investors/researchers argue that we could adopt a 52-week high and low trading strategy: take a long position if today's price is close to the maximum price achieved in the past 52 weeks, and take the opposite position if today's price is close to its 52-week low. Let's randomly choose a day, 12/31/2016. The following Python program presents the 52-week range and today's position within it:
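The logic can be sketched offline; the chapter's program downloads one year of daily prices from Yahoo!Finance, while the prices below are simulated stand-ins:

```python
import numpy as np

# Made-up daily closing prices for ~252 trading days (one year).
np.random.seed(123)
prices = 100 + np.cumsum(np.random.normal(0, 1, 252))
today = prices[-1]
high, low = prices.max(), prices.min()
# Position of today's price inside the 52-week range:
# 0 = at the 52-week low, 1 = at the 52-week high.
position = (today - low) / (high - low)
print('52-week high %.2f, low %.2f, today %.2f (%.0f%% of range)'
      % (high, low, today, 100 * position))
```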
The corresponding output is shown as follows:
According to the 52-week...
Liquidity is defined as how quickly we can dispose of an asset without losing its intrinsic value. Usually, we use the spread to represent liquidity, but we need high-frequency data to estimate the spread; later in the chapter, we show how to estimate the spread directly using high-frequency data. To measure the spread indirectly based on daily observations, Roll (1984) shows that we can estimate it from the serial covariance in price changes, as follows:
S = 2 * sqrt(-cov(ΔPt, ΔPt-1))

Here, S is the Roll spread, Pt is the closing price of a stock on day t, ΔPt is Pt - Pt-1, and cov(ΔPt, ΔPt-1) is the serial covariance of those daily price changes; dividing S by the average share price in the estimation period expresses the spread in percentage terms. The following Python code estimates Roll's spread for IBM, using one year's daily price data from Yahoo! Finance:
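A runnable sketch of the estimator; the prices below are made up in place of the IBM download, and the code also handles the case, noted by Roll, where a positive serial covariance makes the square root undefined:

```python
import numpy as np

# Made-up daily closing prices standing in for one year of IBM data.
np.random.seed(3)
p = 180 + np.cumsum(np.random.normal(0, 1, 252))
d = np.diff(p)                                  # daily price changes
cov_ = np.cov(d[:-1], d[1:])[0, 1]              # serial covariance
if cov_ < 0:
    spread = 2 * np.sqrt(-cov_)                 # Roll (1984) spread
    pct = spread / p.mean()                     # percentage spread
    print('Roll spread = %.4f (%.4f%% of average price)'
          % (spread, 100 * pct))
else:
    # With a positive autocovariance the formula is undefined; common
    # practice is to report the spread as not estimable for the period.
    print('Positive autocovariance: Roll spread not defined')
```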
Estimating Amihud's illiquidity
According to Amihud (2002), liquidity reflects the impact of order flow on price. His illiquidity measure is defined as follows:
illiq(t) = (1/N) * Σ |Ri| / (Pi × Vi),  i = 1, ..., N

Here, illiq(t) is Amihud's illiquidity measure for month t, N is the number of trading days in that month, Ri is the daily return on day i, Pi is the closing price on day i, and Vi is the trading volume (in shares) on day i, so that Pi × Vi is the daily dollar trading volume. Since illiquidity is the reciprocal of liquidity, the lower the illiquidity value, the higher the liquidity of the underlying security. First, let's look at an item-by-item division:
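Element-wise division is what NumPy arrays do by default, which is exactly what the ratio inside the sum requires:

```python
import numpy as np

# Item-by-item (element-wise) division: numpy divides two arrays entry
# by entry, which is what the ratio |R_i| / (P_i * V_i) needs.
ret = np.array([0.01, -0.02, 0.015])            # made-up daily returns
dollar_volume = np.array([2e6, 3e6, 2.5e6])     # made-up dollar volumes
ratio = np.abs(ret) / dollar_volume
print(ratio)
```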
In the following code, we estimate Amihud's illiquidity for IBM based on trading data in October 2013. The value is 1.21*10^-11, which seems quite small. Actually, the absolute value is not important; the relative value matters. If we estimate the illiquidity for WMT over the same period, we would find a...
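The computation can be sketched as follows; the prices and volumes are made-up stand-ins for the October 2013 IBM data:

```python
import numpy as np

# Made-up closing prices and share volumes for a few trading days.
p = np.array([185.0, 186.2, 184.9, 185.5, 186.0])
vol = np.array([3.1e6, 2.8e6, 3.4e6, 2.9e6, 3.0e6])
ret = p[1:] / p[:-1] - 1                   # daily returns
# Amihud (2002): mean of |return| over dollar volume for the month.
illiq = np.mean(np.abs(ret) / (p[1:] * vol[1:]))
print('illiq = %.3e' % illiq)
```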
Estimating Pastor and Stambaugh (2003) liquidity measure
Based on the methodology and empirical evidence in Campbell, Grossman, and Wang (1993), Pastor and Stambaugh (2003) designed the following model to measure individual stock's liquidity and the market liquidity:
yt+1 = α + β1*x1,t + β2*x2,t + εt+1

Here, yt is the excess stock return, Rt - Rf,t, on day t; Rt is the return for the stock; Rf,t is the risk-free rate; x1,t is the market return; and x2,t is the signed dollar trading volume:

x2,t = sign(Rt - Rf,t) × pt × volumet

where pt is the stock price and volumet is the trading volume. The regression is run on daily data within each month. In other words, for each month we obtain one β2, which is defined as the liquidity measure for the individual stock. The following code estimates the liquidity for IBM. First, we download the IBM and S&P500 daily price data, estimate their daily returns, and merge them as follows:
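A sketch of the data-preparation step; the returns below are simulated stand-ins for the IBM and S&P500 downloads, and the merge is done on the date index:

```python
import numpy as np
import pandas as pd

# Made-up daily returns for a stock and for the market index, indexed
# by business dates, then merged on the date index.
np.random.seed(0)
dates = pd.date_range('2016-10-03', periods=5, freq='B')
stock = pd.DataFrame({'ret': np.random.normal(0, 0.01, 5)}, index=dates)
mkt = pd.DataFrame({'mktret': np.random.normal(0, 0.008, 5)}, index=dates)
final = pd.merge(stock, mkt, left_index=True, right_index=True)
print(final.head())
```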
First, let's look at the OLS regression by using the pandas.ols function, as follows:
For the Fama-MacBeth regression, we have the following code:
The Durbin-Watson statistic is related to auto-correlation: after we run a regression, the error terms should be uncorrelated, with a mean of zero. The Durbin-Watson statistic is defined as:

DW = Σt=2..T (et - et-1)² / Σt=1..T et²

Here, et is the error term at time t, and T is the total number of error terms. The Durbin-Watson statistic tests the null hypothesis that the residuals from an ordinary least-squares regression are not auto-correlated against the alternative that the residuals follow an AR(1) process. The statistic ranges in value from 0 to 4: a value near 2 indicates no autocorrelation, a value toward 0 indicates positive autocorrelation, and a value toward 4 indicates negative autocorrelation, as the following table summarizes:
Table 8.3 Durbin-Watson Test

DW ≈ 2    No autocorrelation
DW → 0    Positive autocorrelation
DW → 4    Negative autocorrelation
The following Python program runs a CAPM first by using daily data for IBM. The S&P500 is used as the index. The time period is from...
Python for high-frequency data
High-frequency data refers to second-by-second or millisecond-by-millisecond transaction and quotation data. The New York Stock Exchange's Trade and Quote (TAQ) database is a typical example (http://www.nyxdata.com/data-products/daily-taq). The following program can be used to retrieve high-frequency data from Google Finance:
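The Google Finance intraday service has since been retired, so the download itself no longer works. The parsing logic can still be sketched on a hard-coded sample string laid out like a getprices response (rows prefixed with 'a' carry a Unix timestamp anchor; later rows carry offsets in intervals from that anchor); the quotes below are made up:

```python
import datetime

# Made-up sample in the getprices layout: a header giving the bar
# interval in seconds, a COLUMNS line, then one row per bar.
sample = """INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
a1388756701,185.53,185.60,185.40,185.50,12000
1,185.60,185.70,185.55,185.53,8000
2,185.48,185.62,185.45,185.60,9500"""
interval, anchor, bars = 60, None, []
for line in sample.splitlines():
    if line.startswith('INTERVAL='):
        interval = int(line.split('=')[1])
    elif line[0].isdigit() or line.startswith('a'):
        fields = line.split(',')
        if fields[0].startswith('a'):        # anchor row: Unix timestamp
            anchor = int(fields[0][1:])
            ts = anchor
        else:                                # offset row: n intervals later
            ts = anchor + int(fields[0]) * interval
        when = datetime.datetime.fromtimestamp(ts, datetime.timezone.utc)
        bars.append((when, float(fields[1])))
for when, close in bars:
    print(when, close)
```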
In the preceding program, we have two input variables: ticker...
Spread estimated based on high-frequency data
Based on the Consolidated Quote (CQ) dataset supplied by Prof. Hasbrouck, we generate a dataset in pandas' pickle format that can be downloaded from http://canisius.edu/~yany/python/TORQcq.pkl. Assume that the file is located under C:\temp:
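The spread computation on quote data can be sketched with a tiny stand-in frame (made-up bid/ask quotes, with column names following the TORQ convention of BID for bid and OFR for offer):

```python
import pandas as pd

# Made-up TORQ-style consolidated quotes.
cq = pd.DataFrame({'BID': [54.25, 54.25, 54.375],
                   'OFR': [54.50, 54.375, 54.50]})
cq['spread'] = cq.OFR - cq.BID                       # quoted spread
cq['rel_spread'] = cq.spread / ((cq.OFR + cq.BID) / 2.0)  # relative spread
print(cq)
```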
The output is shown here:
For this book, our focus is free public data. Thus, we discuss only a few subscription-based financial databases, since some readers might be at schools with valid subscriptions; CRSP is one of them. In this chapter, we mention just three Python datasets.
The Center for Research in Security Prices (CRSP) database contains all trading data, such as closing prices, trading volume, and shares outstanding, for all listed stocks in the US from 1926 onward. Because of its quality and long history, it has been used intensively by academic researchers and practitioners. The first dataset is called crspInfo.pkl; see the following code:
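The loading pattern can be sketched with a stand-in file; the columns below (made-up values) only illustrate the kind of header information, such as PERMNO and date ranges, that the real crspInfo.pkl carries:

```python
import os
import tempfile

import pandas as pd

# Stand-in for crspInfo.pkl with made-up rows.
info = pd.DataFrame({'PERMNO': [10001, 10002],
                     'TICKER': ['AAAA', 'BBBB'],
                     'BEGDATE': [19860131, 19860131],
                     'ENDDATE': [20161230, 20161230]})
path = os.path.join(tempfile.gettempdir(), 'crspInfo_demo.pkl')
info.to_pickle(path)
x = pd.read_pickle(path)
print(x.head())
```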
The related output is shown here:
Please refer to the following articles:
Amihud, Yakov, 2002, Illiquidity and stock returns: cross-section and time-series effects, Journal of Financial Markets, 5, 31–56, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.145.9505&rep=rep1&type=pdf
Bali, T. G., Cakici, N., and Whitelaw, R. F., 2011, Maxing out: Stocks as lotteries and the cross-section of expected returns, Journal of Financial Economics, 99(2), 427–446 http://www.sciencedirect.com/science/article/pii/S0304405X1000190X
Cook Pine Capital LLC, November 26, 2008, Study of Fat-tail Risk, http://www.cookpinecapital.com/pdf/Study%20of%20Fat-tail%20Risk.pdf
CRSP web site, http://crsp.com/
CRSP user manual, http://www.crsp.com/documentation
George, T.J., and Hwang, C., 2004, The 52-Week High and Momentum Investing, Journal of Finance 59(5), 2145–2176, http://www.bauer.uh.edu/tgeorge/papers/gh4-paper.pdf
Hasbrouck, Joel, 1992, Using the TORQ database, New York University, http://people.stern.nyu.edu...
Which module contains the function called rolling_kurt? How can you use the function?
Based on daily data downloaded from Yahoo! Finance, find whether Wal-Mart's daily returns follow a normal distribution.
Based on daily returns in 2016, are the mean returns for IBM and DELL the same?
Tip
You can use Yahoo! Finance as your source of data.
How many dividends distributed or stock splits happened over the past 10 years for IBM and DELL based on the historical data?
Write a Python program to estimate rolling beta on a 3-year window for a few stocks such as IBM, WMT, C and MSFT.
Assume that we just downloaded the prime rate from the Federal Reserve's data library at http://www.federalreserve.gov/releases/h15/data.htm, choosing the 1-month, business-day time-series. Write a Python program to merge them using:
In this chapter, many concepts and issues associated with time-series were discussed in detail. Topics included how to design a true date variable, how to merge datasets with different frequencies, and how to download historical prices from Yahoo! Finance; also, different ways to estimate returns, the Roll (1984) spread, Amihud's (2002) illiquidity, and Pastor and Stambaugh's (2003) liquidity measure, and how to retrieve high-frequency data from Prof. Hasbrouck's TORQ (Trade, Order, Report, and Quotation) database. In addition, two datasets from CRSP were shown. Since this book focuses on open and publicly available finance, economics, and accounting data, we only mention such subscription-based financial databases briefly.
In the next chapter, we discuss many concepts and theories related to portfolio theory, such as how to measure portfolio risk, how to estimate the risk of a 2-stock and an n-stock portfolio, and the trade-off between risk and return using various measures such as the Sharpe ratio, Treynor ratio...