Viewed individually, financial instruments might seem non-autocorrelated and unpredictable in mean, as suggested by the Efficient Market Hypothesis; however, correlation among them is certainly present. Trading activity might exploit this, either for speculation or for hedging purposes. These considerations justify the use of multivariate time series techniques in quantitative finance. In this chapter, we will discuss two prominent econometric concepts with numerous applications in finance: cointegration and vector autoregression models.
From now on, we will consider a vector of time series $y_t$, which consists of the elements $y_t^{(1)}, y_t^{(2)}, \dots, y_t^{(n)}$, each of them individually representing a time series, for instance, the price evolution of different financial products. Let's begin with the formal definition of cointegrating data series.
The $n \times 1$ vector $y_t$ of time series is said to be cointegrated if each of the series is individually integrated of order $d$ (in particular, in most applications the series are integrated of order 1, which means nonstationary unit-root processes, or random walks), while there exists a linear combination of the series, $\beta' y_t$, that is integrated of order $d - 1$ (typically, it is of order 0, which is a stationary process).
Intuitively, this definition implies the existence of some underlying forces in the economy that keep the $n$ time series together in the long run, even if they all seem to be individually random walks. A simple example of cointegrated time series is the following pair of series, taken from Hamilton (1994), which we will use to study cointegration and, at the same time, familiarize ourselves with some basic simulation techniques in R:
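A sketch of such a pair, consistent with the simulation that follows. The labels $x_t$, $y_t$, $\gamma$, $u_t$, $v_t$ and the standard normal innovations are our assumptions rather than a quotation from Hamilton (1994):

```latex
\begin{aligned}
x_t &= x_{t-1} + u_t, & u_t &\sim N(0, 1),\\
y_t &= \gamma x_t + v_t, & v_t &\sim N(0, 1).
\end{aligned}
```

Under these assumptions, $x_t$ is a random walk, $y_t$ inherits its unit root through the term $\gamma x_t$, and the linear combination $y_t - \gamma x_t = v_t$ is stationary.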
The unit roots in the two series will be shown formally by standard statistical tests. Unit root tests in R can be performed using either the tseries package or the urca package; here, we use the second one. The following R code simulates the two series of length 1000:
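A minimal simulation sketch in base R. The seed and the coefficient value gamma = 0.7 are illustrative assumptions, not values prescribed by the text:

```r
# Simulate a random walk x and a second series y that is cointegrated with it.
set.seed(20140623)             # fix the random seed for reproducibility
N <- 1000                      # length of the simulation
x <- cumsum(rnorm(N))          # Gaussian random walk
gamma <- 0.7                   # assumed cointegrating coefficient
y <- gamma * x + rnorm(N)      # y shares the stochastic trend of x

# Plot both series on a common vertical scale
plot(x, type = "l", ylim = range(x, y), ylab = "value")
lines(y, col = "red")
```

Both series wander without a fixed mean, yet they stay close to each other because y is built from x plus stationary noise.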
[Figure: the two simulated series plotted against time]
By visual inspection, both series seem to be individually random walks. Stationarity can be tested by the Augmented Dickey-Fuller (ADF) test, using the urca package; however, many other tests are also available in R. The null hypothesis states that there is a unit root in the process (outputs omitted); we reject the null if the test statistic is smaller than the critical value:
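A hedged sketch of the ADF tests with ur.df() from urca. The series are re-simulated so that the block is self-contained (seed and coefficient are assumptions), and the test only runs if urca is installed:

```r
set.seed(20140623)
N <- 1000
x <- cumsum(rnorm(N))              # random walk
y <- 0.7 * x + rnorm(N)            # cointegrated with x

if (requireNamespace("urca", quietly = TRUE)) {
  for (s in list(x = x, y = y)) {
    # type = "none": test regression without intercept or trend
    adf <- urca::ur.df(s, type = "none")
    stat <- adf@teststat[1]        # ADF test statistic
    crit <- adf@cval[1, 2]         # 5 percent critical value
    # Reject the unit root only if the statistic is below the critical value
    cat("statistic:", round(stat, 3), " 5% critical value:", crit,
        " reject unit root:", stat < crit, "\n")
  }
}
```

Reading the statistic and the critical values from the object's slots makes the decision rule explicit; calling summary() on the object after attaching urca gives the full regression output.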
For both of the simulated series, the test statistic is larger than the critical value at the usual significance levels (1 percent, 5 percent, and 10 percent); therefore, we cannot reject the null hypothesis, which is consistent with both series being individually unit root processes.
Now, take the following linear combination of the two series and plot the resulting series:
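A short sketch of this step, assuming the same illustrative simulation as before (gamma = 0.7 and the seed are our choices). The noise term is kept in a separate variable v so that the result can be checked:

```r
set.seed(20140623)
N <- 1000
gamma <- 0.7
x <- cumsum(rnorm(N))          # random walk
v <- rnorm(N)                  # stationary noise
y <- gamma * x + v             # cointegrated with x

z <- y - gamma * x             # the combination removes the common stochastic trend
plot(z, type = "l")            # fluctuates around zero with constant variance

max(abs(z - v))                # z recovers the noise term up to rounding error
```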
[Figure: plot of the linear combination of the two series]
The resulting series clearly seems to be a white noise process; the rejection of the unit root is confirmed by the results of ADF tests:
In a real-world application, obviously we don't know the value of the cointegrating coefficient; this has to be estimated from the raw data by running a linear regression of one series on the other. The following two steps are known as the Engle-Granger method of testing cointegration:

Run a linear regression of one series on the other (a simple OLS estimation).
Test the residuals for the presence of a unit root.
Simple linear regressions can be fitted by using the lm function. The residuals can be obtained from the resulting object as shown in the following example. The ADF test is run in the usual way and confirms the rejection of the null hypothesis at all significance levels. Some caveats, however, will be discussed later in the chapter:
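The two steps can be sketched as follows. The simulated data, the seed, and the choice of a regression without intercept are assumptions that match our illustrative example; the unit root test runs only if urca is installed:

```r
set.seed(20140623)
N <- 1000
x <- cumsum(rnorm(N))
y <- 0.7 * x + rnorm(N)

# Step 1: OLS regression of y on x (no intercept, as in the simulated relationship)
coin <- lm(y ~ x - 1)
gamma_hat <- unname(coef(coin)["x"])   # estimated cointegrating coefficient
res <- residuals(coin)                 # estimated equilibrium error

# Step 2: test the residuals for a unit root
if (requireNamespace("urca", quietly = TRUE)) {
  adf_res <- urca::ur.df(res, type = "none")
  print(adf_res@teststat)              # compare with the critical values below
  print(adf_res@cval)
}

gamma_hat                              # close to the value used in the simulation
```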
Now, consider how we could turn this theory into a successful trading strategy. At this point, we should invoke the concept of statistical arbitrage or pair trading, which, in its simplest and early form, exploits exactly this cointegrating relationship. These approaches primarily aim to set up a trading strategy based on the spread between two time series; if the series are cointegrated, we expect their stationary linear combination to revert to 0. We can make profit simply by selling the relatively expensive one and buying the cheaper one, and just sit and wait for the reversion.
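A deliberately naive sketch of such a rule on simulated data. The entry threshold of two standard deviations, the variable names, and the in-sample estimation of the hedge ratio are all simplifying assumptions; a realistic strategy would estimate the relationship on past data only and account for trading costs:

```r
set.seed(1)
N <- 500
x <- cumsum(rnorm(N))
y <- 0.7 * x + rnorm(N)

fit <- lm(y ~ x - 1)                       # hedge ratio from the cointegrating regression
spread <- residuals(fit)                   # stationary spread if the pair is cointegrated
zscore <- (spread - mean(spread)) / sd(spread)

entry <- 2                                 # assumed entry threshold (in standard deviations)
# -1: sell y and buy x (y relatively expensive); +1: buy y and sell x; 0: no position
position <- ifelse(zscore > entry, -1, ifelse(zscore < -entry, 1, 0))
table(position)
```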
Tip
The term statistical arbitrage is used, in general, for many sophisticated statistical and econometric techniques that aim to exploit relative mispricing of assets in statistical terms, that is, not in comparison to a theoretical equilibrium model.
What is the economic intuition behind this idea? The linear combination of time series that forms the cointegrating relationship is determined by underlying economic forces, which are not explicitly identified in our statistical model, and are sometimes referred to as long-term relationships between the variables in question. For example, similar companies in the same industry are expected to grow similarly, the spot and forward price of a financial product are bound together by the no-arbitrage principle, FX rates of countries that are somehow interlinked are expected to move together, or short-term and long-term interest rates tend to be close to each other. Deviations from these statistically or theoretically expected co-movements open the door to various quantitative trading strategies where traders speculate on future corrections.
The concept of cointegration is further discussed later in this chapter, but for that, we need to introduce vector autoregressive models.
Vector autoregressive models
Vector autoregressive (VAR) models can be considered as natural multivariate extensions of the univariate autoregressive (AR) models. Their popularity in applied econometrics goes back to the seminal paper of Sims (1980). VAR models are the most important multivariate time series models with numerous applications in econometrics and finance. The R package vars provides an excellent framework for R users. For a detailed review of this package, we refer to Pfaff (2013). For econometric theory, consult Hamilton (1994), Lütkepohl (2007), Tsay (2010), or Martin et al. (2013). In this book, we only provide a concise, intuitive summary of the topic.
In a VAR model, our point of departure is a vector of time series $y_t$ with $K$ components. The VAR model specifies the evolution of each variable as a linear function of the lagged values of all variables, including its own; that is, a VAR model of the order $p$ is the following:
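In the notation used in the surrounding text (coefficient matrices $A_i$ and innovation $u_t$), and omitting deterministic terms such as a constant or trend as a simplifying assumption, the reduced-form VAR($p$) can be written as:

```latex
y_t = A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + u_t
```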
Here, $A_i$ are the coefficient matrices for all $i = 1, \dots, p$, and $u_t$ is a vector white noise process with a positive definite covariance matrix. The terminology of vector white noise assumes lack of autocorrelation, but allows contemporaneous correlation between the components; that is, $u_t$ has a nondiagonal covariance matrix.
The matrix notation makes clear one particular feature of VAR models: all variables depend only on past values of themselves and other variables, meaning that contemporaneous dependencies are not explicitly modeled. This feature allows us to estimate the model by ordinary least squares, applied equation-by-equation. Such models are called reduced-form VAR models, as opposed to structural-form models, discussed in the next section.
Obviously, assuming that there are no contemporaneous effects would be an oversimplification, and the resulting impulse-response relationships, that is, changes in the processes following a shock hitting a particular variable, would be misleading and not particularly useful. This motivates the introduction of structural VAR (SVAR) models, which explicitly model the contemporaneous effects among variables:
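A sketch of the structural form, written in the A/B notation of the vars package that the text follows; the starred coefficient matrices $A_i^{*}$ and the structural shock $\varepsilon_t$ are the symbols we assume here:

```latex
A y_t = A_1^{*} y_{t-1} + A_2^{*} y_{t-2} + \dots + A_p^{*} y_{t-p} + B \varepsilon_t
```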
Here, $A_i^{*} = A A_i$ and $B \varepsilon_t = A u_t$; thus, the structural form can be obtained from the reduced form by multiplying it with an appropriate parameter matrix $A$, which reflects the contemporaneous, structural relations among the variables.
Tip
In the notation, as usual, we follow the technical documentation of the vars package, which is very similar to that of Lütkepohl (2007).
In the reduced-form model, contemporaneous dependencies are not modeled; therefore, such dependencies appear in the correlation structure of the error term, that is, the covariance matrix of $u_t$, denoted by $\Sigma_u$. In the SVAR model, contemporaneous dependencies are explicitly modeled (by the $A$ matrix on the left-hand side), and the disturbance terms are defined to be uncorrelated, so their covariance matrix $\Sigma_\varepsilon$ is diagonal. Here, the disturbances are usually referred to as structural shocks.
What makes SVAR modeling interesting and difficult at the same time is the so-called identification problem; the SVAR model is not identified, that is, the parameters in matrix $A$ cannot be estimated without additional restrictions.
Tip
How should we understand that a model is not identified? This basically means that there exist different (infinitely many) parameter matrices, leading to the same sample distribution; therefore, it is not possible to identify a unique value of parameters based on the sample.
Given a reduced-form model, it is always possible to derive an appropriate parameter matrix that makes the residuals orthogonal; the covariance matrix $\Sigma_u = E(u_t u_t')$ is positive definite, which allows us to apply the LDL decomposition (or alternatively, the Cholesky decomposition). This states that there always exist a lower triangular matrix $L$ and a diagonal matrix $D$ such that $\Sigma_u = L D L'$. By choosing $A = L^{-1}$, the covariance matrix of the structural model becomes $\Sigma_\varepsilon = E(L^{-1} u_t u_t' (L')^{-1}) = L^{-1} \Sigma_u (L')^{-1}$, which gives $\Sigma_u = L \Sigma_\varepsilon L'$. Now, we conclude that $\Sigma_\varepsilon = D$ is diagonal, as we intended. Note that by this approach, we essentially imposed an arbitrary recursive structure on our equations. This is the method followed by the irf() function by default.
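A small numerical check of this argument in base R. The covariance matrix below is an arbitrary positive definite example chosen for illustration, and the LDL factors are derived from the Cholesky factor returned by chol():

```r
# Assumed reduced-form residual covariance matrix (symmetric, positive definite)
Sigma_u <- matrix(c(1.0, 0.5, 0.2,
                    0.5, 2.0, 0.3,
                    0.2, 0.3, 1.5), nrow = 3, byrow = TRUE)

C <- t(chol(Sigma_u))            # lower Cholesky factor: Sigma_u = C %*% t(C)
d <- diag(C)
L <- C %*% diag(1 / d)           # unit lower triangular factor of the LDL decomposition
D <- diag(d^2)                   # diagonal factor, so that Sigma_u = L %*% D %*% t(L)

A <- solve(L)                    # structural matrix A = L^{-1}
Sigma_eps <- A %*% Sigma_u %*% t(A)   # covariance matrix of the structural shocks

round(Sigma_eps, 10)             # off-diagonal entries vanish; the diagonal equals D
```

Because L depends on the ordering of the variables, reordering the columns of the data changes the implied recursive structure.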
There are multiple ways in the literature to identify SVAR model parameters, which include short-run or long-run parameter restrictions, or sign restrictions on impulse responses (see, for example, Fry and Pagan (2011)). Many of them have no native support in R yet. Here, we only introduce a standard set of techniques to impose short-run parameter restrictions, which are called the A-model, the B-model, and the AB-model, all of which are supported natively by the vars package:

In the case of an A-model, $B = I_K$, and restrictions on matrix $A$ are imposed such that $\Sigma_\varepsilon = A \Sigma_u A'$ is a diagonal covariance matrix. To make the model "just identified", we need $K(K-1)/2$ additional restrictions. This is reminiscent of imposing a triangular matrix (but that particular structure is not required).

Alternatively, it is possible to identify the structural innovations based on the restricted model residuals by imposing a structure on the matrix $B$ (B-model), that is, directly on the correlation structure; in this case, $A = I_K$ and $u_t = B \varepsilon_t$.

The AB-model places restrictions on both $A$ and $B$, and the connection between the restricted and structural model is determined by $A u_t = B \varepsilon_t$.
Impulse-response analysis is usually one of the main goals of building a VAR model. Essentially, an impulse-response function shows how a variable reacts (response) to a shock (impulse) hitting any other variable in the system. If the system consists of $K$ variables, $K^2$ impulse response functions can be determined. Impulse responses can be derived mathematically from the Vector Moving Average (VMA) representation of the VAR process, similar to the univariate case (see the details in Lütkepohl (2007)).
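To make the VMA idea concrete, here is a base R sketch for a bivariate VAR(1), where the VMA coefficient matrices are simply powers of the coefficient matrix. The matrices A1 and Sigma_u are assumed values, and the orthogonalization uses the Cholesky factor:

```r
# Assumed coefficient matrix of a stable VAR(1): y_t = A1 %*% y_{t-1} + u_t
A1 <- matrix(c(0.5, 0.1,
               0.2, 0.3), nrow = 2, byrow = TRUE)
Sigma_u <- matrix(c(1.0, 0.4,
                    0.4, 1.0), nrow = 2)
P <- t(chol(Sigma_u))                 # lower triangular factor of Sigma_u

horizon <- 10
Phi <- vector("list", horizon + 1)    # VMA matrices Phi_0, ..., Phi_horizon
Phi[[1]] <- diag(2)                   # Phi_0 is the identity
for (h in seq_len(horizon)) Phi[[h + 1]] <- A1 %*% Phi[[h]]   # Phi_h = A1^h

# Orthogonalized responses: entry [i, j] is the response of variable i,
# h periods after a one-standard-deviation shock to variable j
Theta <- lapply(Phi, function(m) m %*% P)
Theta[[2]]                            # the K^2 = 4 responses one period after the shock
```

For a stable system, the responses die out as the horizon grows.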
VAR implementation example
As an illustrative example, we build a three-component VAR model from the following components:
Equity return: This specifies the Microsoft price index from January 01, 2004 to March 03, 2014
Stock index: This specifies the S&P500 index from January 01, 2004 to March 03, 2014
US Treasury bond interest rates from January 01, 2004 to March 03, 2014
Our primary purpose is to make a forecast for the stock market index by using the additional variables and to identify impulse responses. Here, we suppose that there exists a hidden long-term relationship between a given stock, the stock market as a whole, and the bond market. The example was chosen primarily to demonstrate several of the data manipulation possibilities of the R programming environment and to illustrate an elaborate concept using a very simple example, and not because of its economic meaning.
We use the vars and quantmod packages. Do not forget to install and load those packages if you haven't done this yet.
The quantmod package offers a great variety of tools to obtain financial data directly from online sources, which we will frequently rely on throughout the book. We use the getSymbols() function:
By default, Yahoo Finance is used as the data source for equity and index price series (the src='yahoo' parameter setting, which is omitted in the example). The routine downloads open, high, low, and close prices, trading volume, and adjusted prices. The downloaded data is stored in an xts data class, which is automatically named by default after the ticker (MSFT and SNP). It's possible to plot the closing prices by calling the generic plot function, but the chartSeries function of quantmod provides a much better graphical illustration.
The components of the downloaded data can be reached by using the following shortcuts:
Thus, for example, by using these shortcuts, the daily close-to-close returns can be plotted as follows:
[Figure: plot of the daily close-to-close returns]
Interest rates are downloaded from the FRED (Federal Reserve Economic Data) data source. The current version of the interface does not allow subsetting of dates; however, downloaded data is stored in an xts data class, which is straightforward to subset to obtain our period of interest:
The downloaded prices (which are presumably nonstationary series) should be transformed into stationary series for analysis; that is, we will work with log returns, calculated from the adjusted series:
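A base R sketch of the transformation, using a short vector of made-up prices as a stand-in for a downloaded adjusted price series:

```r
# Hypothetical adjusted prices; in the example these would come from the downloaded data
prices <- c(100, 101.5, 99.8, 102.3, 103.0)

log_ret <- diff(log(prices))     # r_t = log(P_t) - log(P_{t-1})
log_ret

# Log returns are additive over time: their sum is the log return of the whole period
all.equal(sum(log_ret), log(prices[5] / prices[1]))
```

The same diff(log(...)) call applies to an xts series, where the first element of the result is NA and is usually dropped.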
To proceed, we need a last data-cleansing step before turning to VAR model fitting. By eyeballing the data, we can see that missing data exists in the T-Bill return series, and the lengths of our data sets are not the same (on some dates, there are interest rate quotes, but equity prices are missing). To solve these data-quality problems, we choose, for now, the easiest possible solution: merge the data sets (by omitting all data points for which we do not have all three observations), and omit all NA data. The former is performed by the inner join parameter (see help of the merge function for details):
Here, we note that VAR modeling is usually done on lower-frequency data. There is a simple way of transforming your data to monthly or quarterly frequencies, by using the following functions, which return the opening, highest, lowest, and closing value within the given period:
A simple reduced-form VAR model may be fitted to the data by using the VAR() function of the vars package. The parameterization shown in the following code allows a maximum of 4 lags in the equations, and chooses the model with the best (lowest) Akaike Information Criterion value:
For a more established model selection, you can consider using VARselect(), which provides multiple information criteria (output omitted):
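A hedged sketch of both calls. To keep the block self-contained, it uses a simulated three-variable data set (the column names and the coefficient matrix are assumptions) in place of the downloaded returns, and the fitting only runs if vars is installed:

```r
set.seed(42)
n <- 500
A1 <- matrix(c(0.3, 0.1, 0.0,
               0.1, 0.2, 0.1,
               0.0, 0.1, 0.4), nrow = 3, byrow = TRUE)
# Simulated stationary VAR(1) data standing in for the three return series
sim <- matrix(0, n, 3, dimnames = list(NULL, c("stock", "index", "rate")))
for (i in 2:n) sim[i, ] <- A1 %*% sim[i - 1, ] + rnorm(3)

if (requireNamespace("vars", quietly = TRUE)) {
  # At most 4 lags; the lag order is chosen by the Akaike Information Criterion
  var1 <- vars::VAR(sim, lag.max = 4, ic = "AIC")
  print(var1$p)                                        # selected lag order
  # Lag orders suggested by the AIC, HQ, SC, and FPE criteria
  print(vars::VARselect(sim, lag.max = 4)$selection)
}
```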
The resulting object is an object of the varest class. Estimated parameters and multiple other statistical results can be obtained by the summary() method or the show() method (that is, by just typing the variable):
There are other methods worth mentioning. The custom plotting method for the varest class generates a diagram for each variable separately, including its fitted values, residuals, and the autocorrelation and partial autocorrelation functions of the residuals. You need to hit Enter to move to the next variable. Plenty of custom settings are available; please consult the vars package documentation:
Predictions using our estimated VAR model can be made by simply calling the predict function and specifying the desired confidence level:
Impulse responses should first be generated numerically by irf(), and then they can be plotted by the plot() method. Again, we get different diagrams for each variable, including the respective impulse response functions with bootstrapped confidence intervals, as shown in the following command:
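A sketch of forecasting and impulse-response computation on simulated stand-in data. The data, the column names, the horizon of 10 periods, and the fixed lag order are assumptions; the calls only run if vars is installed:

```r
set.seed(42)
n <- 300
A1 <- matrix(c(0.3, 0.1, 0.0,
               0.1, 0.2, 0.1,
               0.0, 0.1, 0.4), nrow = 3, byrow = TRUE)
sim <- matrix(0, n, 3, dimnames = list(NULL, c("stock", "index", "rate")))
for (i in 2:n) sim[i, ] <- A1 %*% sim[i - 1, ] + rnorm(3)

if (requireNamespace("vars", quietly = TRUE)) {
  var1 <- vars::VAR(sim, p = 1)

  # 10-step-ahead forecasts with 95 percent intervals
  fc <- predict(var1, n.ahead = 10, ci = 0.95)
  print(head(fc$fcst$index, 3))     # columns: fcst, lower, upper, CI

  # Response of "index" to a shock in "stock", with bootstrapped bands
  ir <- vars::irf(var1, impulse = "stock", response = "index",
                  n.ahead = 10, boot = TRUE, ci = 0.95, runs = 100)
  print(ir$irf$stock)
}
```

Calling plot(fc) or plot(ir) on the resulting objects draws the corresponding diagrams.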
Now, consider fitting a structural VAR model using parameter restrictions described earlier as an A-model. The number of restrictions required for the SVAR model to be identified is $K(K-1)/2$; in our case, this is 3.
Tip
See Lütkepohl (2007) for more details. The number of additional restrictions required is $K(K+1)/2$, but the diagonal elements are normalized to unity, which leaves us with the preceding number.
The point of departure for an SVAR model is the already estimated reduced form of the VAR model (var1). This has to be amended with an appropriately structured restriction matrix.
For the sake of simplicity, we will use the following restrictions:
S&P index shocks do not have a contemporaneous effect on Microsoft
S&P index shocks do not have a contemporaneous effect on interest rates
T-Bond interest rate shocks have no contemporaneous effect on Microsoft
These restrictions enter into the SVAR model as 0s in the A matrix, which is as follows:
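Assuming the variables are ordered as (Microsoft, S&P index, interest rate), which is our assumption about the column order, the three restrictions give the following pattern, where $a_{ij}$ is the contemporaneous coefficient of variable $j$ in the equation of variable $i$:

```latex
A = \begin{pmatrix}
1      & 0 & 0      \\
a_{21} & 1 & a_{23} \\
a_{31} & 0 & 1
\end{pmatrix}
```

The zeros at positions (1,2), (3,2), and (1,3) correspond to the three restrictions in the order listed, and the three remaining off-diagonal entries are left free.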
When setting up the A matrix as a parameter for SVAR estimation in R, the positions of the to-be-estimated parameters should take the NA value. This can be done with the following assignments:
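A base R sketch of these assignments. The variable order (MSFT, SP500, rate) and the labels are assumptions made for readability:

```r
vars_order <- c("MSFT", "SP500", "rate")      # assumed column order of the data
amat <- diag(3)                               # ones on the diagonal, zeros elsewhere
dimnames(amat) <- list(vars_order, vars_order)

# Free (to-be-estimated) contemporaneous coefficients are marked with NA
amat["SP500", "MSFT"] <- NA    # effect of Microsoft in the S&P index equation
amat["SP500", "rate"] <- NA    # effect of the interest rate in the S&P index equation
amat["rate", "MSFT"]  <- NA    # effect of Microsoft in the interest rate equation
amat
```

The remaining off-diagonal zeros are exactly the three restrictions listed above.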
Finally, we can fit the SVAR model and plot the impulse response functions (the output is omitted):
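A hedged sketch of the estimation on simulated stand-in data (the data, the column order, and the choice estmethod = "direct" are assumptions). The bootstrap is switched off here only to keep the example fast, and the calls run only if vars is installed:

```r
set.seed(42)
n <- 500
A1 <- matrix(c(0.3, 0.1, 0.0,
               0.1, 0.2, 0.1,
               0.0, 0.1, 0.4), nrow = 3, byrow = TRUE)
sim <- matrix(0, n, 3, dimnames = list(NULL, c("stock", "index", "rate")))
for (i in 2:n) sim[i, ] <- A1 %*% sim[i - 1, ] + rnorm(3)

# Restriction matrix: NA marks the free contemporaneous coefficients
amat <- diag(3)
amat[2, 1] <- NA
amat[2, 3] <- NA
amat[3, 1] <- NA

if (requireNamespace("vars", quietly = TRUE)) {
  var1 <- vars::VAR(sim, p = 1)                       # reduced-form model
  svar1 <- vars::SVAR(var1, estmethod = "direct", Amat = amat)
  print(svar1$A)                                      # estimated A matrix
  svar_irf <- vars::irf(svar1, n.ahead = 10, boot = FALSE)
  plot(svar_irf)                                      # structural impulse responses
}
```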
Cointegrated VAR and VECM
Finally, we put together what we have learned so far, and discuss the concepts of Cointegrated VAR and Vector Error Correction Models (VECM).
Our starting point is a system of cointegrated variables (for example, in a trading context, this indicates a set of similar stocks that are likely to be driven by the same fundamentals). The standard VAR models discussed earlier can only be estimated when the variables are stationary. As we know, the conventional way to remove a unit root is to first difference the series; however, in the case of cointegrated series, this would lead to overdifferencing and losing information conveyed by the long-term co-movement of variable levels. Ultimately, our goal is to build up a model of stationary variables, which also incorporates the long-term relationship between the original cointegrating nonstationary variables, that is, to build a cointegrated VAR model. This idea is captured by the Vector Error Correction Model (VECM), which consists of a VAR model of the order $p - 1$ on the differences of the variables, and an error-correction term derived from the known (estimated) cointegrating relationship. Intuitively, and using the stock market example, a VECM establishes a short-term relationship between the stock returns, while correcting for the deviation from the long-term co-movement of prices.
Formally, a two-variable VECM, which we will discuss as a numerical example, can be written as follows. Let $y_t$ be a vector of two nonstationary unit root series, where the two series are cointegrated with a cointegrating vector $\beta$. Then, an appropriate VECM can be formulated as follows:
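A sketch of the model in a common textbook form; the symbols $\alpha$ (adjustment coefficients), $\psi_i$ (short-run coefficient matrices), and $\varepsilon_t$ (innovation) are our assumed notation, and deterministic terms are omitted:

```latex
\Delta y_t = \alpha \beta' y_{t-1} + \psi_1 \Delta y_{t-1} + \dots + \psi_{p-1} \Delta y_{t-p+1} + \varepsilon_t
```

The sum over lagged differences is the VAR of order $p - 1$ mentioned above, while $\beta' y_{t-1}$ measures last period's deviation from the long-term relationship.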
Here, $\alpha$ holds the adjustment coefficients, and the first term, $\alpha \beta' y_{t-1}$, is usually called the error correction term.
In practice, there are two approaches to test cointegration and build the error correction model. For the two-variable case, the Engle-Granger method is quite instructive; our numerical example basically follows that idea. For the multivariate case, where the maximum number of possible cointegrating relationships is $n - 1$, you have to follow the Johansen procedure. Although the theoretical framework for the latter goes far beyond the scope of this book, we briefly demonstrate the tools for practical implementation and give references for further studies.
To demonstrate some basic R capabilities regarding VECM models, we will use a standard example of three-month and six-month T-Bill secondary market rates, which can be downloaded from the FRED database, just as we discussed earlier. We will restrict our attention to an arbitrarily chosen period, that is, from 1984 to 2014. Augmented Dickey-Fuller tests indicate that the null hypothesis of a unit root cannot be rejected.
We can consistently estimate the cointegrating relationship between the two series by running a simple linear regression. To simplify coding, we define the variables x1 and x2 for the two series, and y for the respective vector series. The other variable-naming conventions in the code snippets will be self-explanatory:
The two series are indeed cointegrated if the residuals of the regression (variable r), that is, the appropriate linear combination of the variables, constitute a stationary series. You could test this with the usual ADF test, but in these settings, the conventional critical values are not appropriate, and corrected values should be used (see, for example, Phillips and Ouliaris (1990)).
It is therefore much more appropriate to use a designated test for the existence of cointegration, for example, the Phillips and Ouliaris test, which is implemented in the tseries and in the urca packages as well. The most basic tseries version is demonstrated as follows:
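A hedged sketch of the test with po.test() from tseries. A simulated cointegrated pair stands in for the two T-Bill series (an assumption that keeps the block self-contained), and the test only runs if tseries is installed:

```r
set.seed(7)
N <- 1000
x2 <- cumsum(rnorm(N))         # stand-in for one interest rate series (random walk)
x1 <- x2 + rnorm(N)            # stand-in for the other, cointegrated with x2
y <- cbind(x1, x2)             # vector series; the first column is the regressand

if (requireNamespace("tseries", quietly = TRUE)) {
  # demean = TRUE includes an intercept in the cointegrating regression;
  # lshort = TRUE selects the short truncation lag
  po <- tseries::po.test(y, demean = TRUE, lshort = TRUE)
  print(po$statistic)
  print(po$p.value)
}
```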
The null hypothesis states that the two series are not cointegrated, so the low p-value indicates rejection of the null and the presence of cointegration.
The Johansen procedure is applicable for more than one possible cointegrating relationship; an implementation can be found in the urca package:
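A sketch of the Johansen trace test with ca.jo() from urca, again on a simulated stand-in pair. The data, the lag order K = 2, and the choice of no deterministic term are assumptions; the test only runs if urca is installed:

```r
set.seed(7)
N <- 1000
x2 <- cumsum(rnorm(N))
x1 <- x2 + rnorm(N)
y <- cbind(x1, x2)

if (requireNamespace("urca", quietly = TRUE)) {
  # Trace test, VAR in levels with K = 2 lags, no deterministic term
  johansen <- urca::ca.jo(y, type = "trace", ecdet = "none",
                          K = 2, spec = "longrun")
  print(johansen@teststat)     # test statistics for the hypotheses on the rank r
  print(johansen@cval)         # critical values (10, 5, and 1 percent)
  print(johansen@V)            # eigenvectors normalized to the first variable
}
```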
The test statistic for r = 0 (no cointegrating relationship) is larger than the critical values, which indicates the rejection of the null. For $r \le 1$, however, the null cannot be rejected; therefore, we conclude that one cointegrating relationship exists. The cointegrating vector is given by the first column of the normalized eigenvectors below the test results.
The final step is to obtain the VECM representation of this system, that is, to run an OLS regression on the lagged differenced variables and the error correction term derived from the previously calculated cointegrating relationship. The appropriate function utilizes the ca.jo object class, which we created earlier. The r = 1 parameter specifies the cointegration rank:
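A sketch using cajorls() from urca on a simulated stand-in pair (the data and the ca.jo settings are assumptions); it runs only if urca is installed:

```r
set.seed(7)
N <- 1000
x2 <- cumsum(rnorm(N))
x1 <- x2 + rnorm(N)
y <- cbind(x1, x2)

if (requireNamespace("urca", quietly = TRUE)) {
  johansen <- urca::ca.jo(y, type = "trace", ecdet = "none",
                          K = 2, spec = "longrun")
  vecm <- urca::cajorls(johansen, r = 1)   # restricted OLS with cointegration rank 1
  print(vecm$beta)                         # normalized cointegrating vector
  print(coef(vecm$rlm))                    # row "ect1" holds the error-correction coefficients
}
```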
The coefficient of the error-correction term is negative, as we expected; a short-term deviation from the long-term equilibrium level would push our variables back toward zero deviation from the equilibrium.
You can easily check this in the bivariate case: the Johansen procedure leads to approximately the same result as the step-by-step implementation of the ECM following the Engle-Granger procedure. This is shown in the uploaded R code files.
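For illustration, a base R sketch of such a step-by-step ECM for one equation. It uses a simulated stand-in pair and a single lag of the differences; both are our assumptions, not the content of the uploaded files:

```r
set.seed(7)
N <- 1000
x2 <- cumsum(rnorm(N))              # common stochastic trend
x1 <- x2 + rnorm(N)                 # cointegrated with x2

# Step 1: cointegrating regression; the residual is the equilibrium error
r <- residuals(lm(x1 ~ x2))

# Step 2: regress the differences on the lagged equilibrium error and lagged differences
d1 <- diff(x1)
d2 <- diff(x2)
n <- length(d1)
ecm_data <- data.frame(dx1     = d1[-1],        # change of x1 at time t
                       ect     = r[2:(N - 1)],  # equilibrium error at time t - 1
                       dx1_lag = d1[-n],        # change of x1 at time t - 1
                       dx2_lag = d2[-n])        # change of x2 at time t - 1
ecm <- lm(dx1 ~ ect + dx1_lag + dx2_lag, data = ecm_data)

alpha1 <- unname(coef(ecm)["ect"])  # adjustment coefficient of the x1 equation
alpha1                              # negative: x1 moves back toward the equilibrium
```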