
You're reading from Learning Quantitative Finance with R

Product type: Book
Published in: Mar 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781786462411
Edition: 1st
Authors (2):

Dr. Param Jeet

Dr. Param Jeet holds a Ph.D. in mathematics from one of India's leading technological institutes, IIT Madras (IITM), India. He has published a couple of mathematical research papers in various international journals. He has been in the analytics industry for the last few years, working with various leading multinational companies and consulting for a few companies as a data scientist.

Prashant Vats

Prashant Vats holds a master's degree in mathematics from one of India's leading technological institutes, IIT Mumbai. He has been in the analytics industry for more than 10 years, working with various leading multinational companies and consulting for a few companies as a data scientist across several domains.

Chapter 3. Econometric and Wavelet Analysis

In financial analytics, we need techniques for predictive modeling, both to forecast and to find the drivers of different target variables. In this chapter, we will discuss types of regression and how to build regression models in R for predictive modeling. We will also discuss how to implement variable selection methods and other aspects associated with regression. This chapter does not contain theoretical descriptions; it simply guides you through implementing regression models in R in the financial space. Regression analysis can be used for forecasting on cross-sectional data in the financial domain. We will also cover frequency analysis of the data, and how transformations such as the fast Fourier, wavelet, Hilbert, and Haar transformations in the time and frequency domains help to remove noise from the data.

This chapter covers the following topics:

  • Simple linear regression

  • Multivariate linear regression

  • Multicollinearity

  • ANOVA

  • Feature selection

  • Wavelet analysis

  • Fast Fourier transformation

  • Hilbert transformation

Simple linear regression


In simple linear regression, we try to predict one variable in terms of a second variable, called the predictor variable. The variable we are trying to predict is called the dependent variable and is denoted by y, and the independent variable is denoted by x. In simple linear regression, we assume a linear relationship of the form y = β0 + β1x + ε between the dependent attribute and the predictor attribute.

First, we need to plot the data to understand the linear relationship between the dependent variable and the independent variable. Here, our data consists of two variables:

  • YPrice: Dependent variable

  • XPrice: Predictor variable

In this case, we are trying to predict YPrice in terms of XPrice. StockXPrice is the independent variable and StockYPrice is the dependent variable. For every element of StockXPrice, there is an element of StockYPrice, which implies a one-to-one mapping between the elements of StockXPrice and StockYPrice.

A few lines of data used for the following analysis are displayed using the following...
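The data listing is cut off in this excerpt. As a bridge, here is a minimal sketch of fitting the simple regression in R; the file name DataForLinearRegression.csv is an assumption, and the column names follow the text above:

> # Hypothetical file name; columns StockXPrice / StockYPrice per the text 
> DataLR = read.csv("DataForLinearRegression.csv") 
> plot(DataLR$StockXPrice, DataLR$StockYPrice)    # check the linear relationship 
> LinearR.lm = lm(StockYPrice ~ StockXPrice, data=DataLR) 
> abline(LinearR.lm)                              # overlay the fitted line 
> summary(LinearR.lm)                             # coefficients and R-squared 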

Multivariate linear regression


In multiple linear regression, we try to explain the dependent variable in terms of more than one predictor variable. The multiple linear regression equation is given by the following formula:

y = β0 + β1x1 + β2x2 + ... + βkxk + ε

Here, β0, β1, ..., βk are the multiple linear regression parameters, and they can be obtained by minimizing the sum of squared errors, which is also known as the OLS (ordinary least squares) method of estimation.

Let us take an example where we have the dependent variable StockYPrice, and we are trying to predict it in terms of the independent variables StockX1Price, StockX2Price, StockX3Price, and StockX4Price, which are present in the dataset DataMR.

Now let us fit the multiple regression model and get parameter estimates of multiple regression:

> MultipleR.lm = lm(StockYPrice ~ StockX1Price + StockX2Price + StockX3Price + StockX4Price, data=DataMR) 
> summary(MultipleR.lm) 

When we execute the preceding code, it fits the multiple regression model on the data and gives the basic summary of statistics associated...
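Beyond the summary, the fitted model can be used for prediction. A minimal sketch using base R's predict() on the MultipleR.lm object above; the predictor values in newdata are invented for illustration:

> newdata = data.frame(StockX1Price=72, StockX2Price=93, StockX3Price=63, StockX4Price=83) 
> predict(MultipleR.lm, newdata, interval="confidence")   # point estimate plus 95% CI 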

Multicollinearity


If the predictor variables are correlated, we need to detect multicollinearity and treat it. Detecting multicollinearity is crucial because two or more correlated variables carry overlapping information; using them together as independent variables effectively double-counts their effect on the prediction. If we treat the multicollinearity and keep only variables which are not strongly correlated, we avoid this problem of double impact.

We can find multicollinearity by executing the following code:

> vif(MultipleR.lm) 

This gives the multicollinearity table for the predictor variables:

Figure 3.8: VIF table for multiple regression model

Depending upon the values of VIF (a common rule of thumb flags values above 5 or 10), we can drop the redundant variables.
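The vif() function is not part of base R; the car package is the usual source, although this excerpt does not name it. A minimal sketch under that assumption:

> library(car)        # assumption: vif() taken from the car package 
> vif(MultipleR.lm)   # one VIF value per predictor 
> # Refit without the worst offender; dropping StockX4Price is illustrative only: 
> Reduced.lm = lm(StockYPrice ~ StockX1Price + StockX2Price + StockX3Price, data=DataMR) 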

ANOVA


ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent groups. In the case of only two samples, we can use the t-test to compare the means, but with more than two samples pairwise comparisons become cumbersome. We are going to study the relationship between a quantitative dependent variable, returns, and a single qualitative independent variable, stock. We have five levels of stock: stock1, stock2, ..., stock5.

We can study and compare the five levels of stock by means of a box plot, executing the following code:

> DataANOVA = read.csv("C:/Users/prashant.vats/Desktop/Projects/BOOK R/DataAnova.csv") 
> head(DataANOVA) 

This displays a few lines of the data used for analysis in tabular format:

  Returns  Stock
1    1.64 Stock1
2    1.72 Stock1
3    1.68 Stock1
4    1.77 Stock1
5    1.56 Stock1
6    1.95 Stock1

> boxplot(DataANOVA$Returns ~ DataANOVA$Stock) 

This gives the...
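The box plot gives a visual comparison; the formal test uses base R's aov(). A minimal sketch on the same data:

> oneway = aov(Returns ~ Stock, data=DataANOVA) 
> summary(oneway)   # F statistic and p-value for differences among the five stock means 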

Feature selection


Feature selection is one of the toughest parts of financial model building. It can be done statistically or by using domain knowledge. Here, we are going to discuss only a few of the statistical feature selection methods used in the financial space.

Removing irrelevant features

Data may contain highly correlated features, and the model does better if we do not include highly correlated features in it. The caret R package provides functionality for computing the correlation matrix between the features and flagging the highly correlated ones, as shown in the following example.

A few lines of data used for correlation analysis and multiple regression analysis are displayed here by executing the following code:

> DataMR = read.csv("C:/Users/prashant.vats/Desktop/Projects/BOOK R/DataForMultipleRegression.csv") 
> head(DataMR) 
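The correlation code itself is truncated in this excerpt; a minimal sketch of what the caret workflow looks like, assuming the four predictor columns named above and an illustrative cutoff of 0.75:

> library(caret) 
> predictors = DataMR[, c("StockX1Price","StockX2Price","StockX3Price","StockX4Price")] 
> corMatrix = cor(predictors)                        # pairwise correlations 
> highCorr = findCorrelation(corMatrix, cutoff=0.75) # columns to consider dropping 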

Stepwise variable selection


We can use stepwise variable selection (forward, backward, or both) in predictive models using the stepAIC() function from the MASS package for feature selection.

This can be done by executing the following code:

> library(MASS)   # stepAIC() lives in the MASS package 
> MultipleR.lm = lm(StockYPrice ~ 
StockX1Price + StockX2Price + StockX3Price + StockX4Price, 
data=DataMR) 
> step <- stepAIC(MultipleR.lm, direction="both") 
> step$anova 

Here, we are using the dataset used for multiple regression as the input dataset. One can also use all-subsets regression using the leaps() function from the leaps package.
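For completeness, a minimal all-subsets sketch with leaps(); the method="adjr2" choice (ranking subsets by adjusted R-squared) is an illustrative assumption:

> library(leaps) 
> X = as.matrix(DataMR[, c("StockX1Price","StockX2Price","StockX3Price","StockX4Price")]) 
> allsub = leaps(x=X, y=DataMR$StockYPrice, method="adjr2") 
> allsub$which[which.max(allsub$adjr2), ]   # predictor subset with the best adjusted R-squared 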

Variable selection by classification

We can use classification techniques such as decision trees or random forests to get the most significant predictors. Here, we are using a random forest to find the most relevant features (a sketch follows this paragraph). All four attributes in the dataset DataForMultipleRegression1 have been selected in the following example, and the plot shows the accuracy of different subset sizes...
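The original code is cut off in this excerpt. The description (random forest, performance across subset sizes) matches caret's recursive feature elimination, so the following is a hedged reconstruction, not the book's verbatim code; the dataset DataForMultipleRegression1 is not shown, so the DataMR columns stand in for it:

> library(caret) 
> library(randomForest) 
> set.seed(7)   # for a reproducible resampling split 
> control = rfeControl(functions=rfFuncs, method="cv", number=10) 
> results = rfe(x=DataMR[, c("StockX1Price","StockX2Price","StockX3Price","StockX4Price")], 
+               y=DataMR$StockYPrice, sizes=1:4, rfeControl=control) 
> predictors(results)              # the selected features 
> plot(results, type=c("g","o"))   # performance across subset sizes 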

Ranking of variables


After fitting a regression/predictive model, we need to understand the relative ranking of the significant attributes on a comparative scale. This is explained by beta parameter estimates. Beta, or standardized, coefficients are the slopes we get if all the variables are on the same scale, which is achieved by converting them to z-scores before doing the predictive modeling (regression). Beta coefficients allow a comparison of the approximate relative importance of the predictors, and hence the variables can be ranked, which neither the unstandardized coefficients nor the p-values allow. Scaling, or standardizing, the data vectors can be done using the scale() function. Once the scaled variables are created, the regression is redone using them. The resulting coefficients are the beta coefficients.
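A minimal sketch of the scale-then-refit recipe just described, assuming all columns of DataMR are numeric:

> scaledDataMR = as.data.frame(scale(DataMR))   # convert every column to z-scores 
> Beta.lm = lm(StockYPrice ~ StockX1Price + StockX2Price + StockX3Price + StockX4Price, 
+              data=scaledDataMR) 
> coef(Beta.lm)   # beta coefficients; rank predictors by absolute magnitude 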

Wavelet analysis


Time series information is not always sufficient to get insight into the data. Sometimes the frequency content of the data also contains important information. Fourier transformation (FT) captures the frequency-amplitude content of the data, but it does not show when in time a given frequency occurred. In the case of stationary data, all frequency components exist at every point in time, but this is not true for non-stationary data, so FT is not well suited to non-stationary data. Wavelet transformation (WT) has the capacity to provide time and frequency information simultaneously, in the form of a time-frequency representation. WT is important for analyzing financial time series, as most financial time series are non-stationary. In the remainder of this chapter, I will help you understand how to handle non-stationary data in R using wavelet analysis. Stock price/index data requires certain techniques or transformations to obtain further information...
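The wavelet code itself is cut off here. As a hedged illustration (the waveslim package and its dwt() function are referenced in the end-of-chapter questions), a discrete wavelet transform and its inverse might look like this, assuming dji is a numeric price series:

> library(waveslim) 
> x = as.numeric(dji) 
> x = x[1:(16 * (length(x) %/% 16))]        # dwt() needs length divisible by 2^n.levels 
> dji.dwt = dwt(x, wf="haar", n.levels=4)   # wavelet and detail coefficients per level 
> dji.idwt = idwt(dji.dwt)                  # inverse transform reconstructs the series 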

Fast Fourier transformation


Fast Fourier transformation (FFT) is used for calculating the discrete Fourier transform of a time series. The fft() function is part of base R (the stats package), so no additional package needs to be installed or loaded.

The fast Fourier transform of a time series can be calculated using fft(), which accepts a real- or complex-valued series.

In the following example, dji is a real number time series:

> model <- fft(dji) 

The variable model is the transformed series, which consists of complex numbers; the real and imaginary parts can be extracted using the following code:

> rp <- Re(model) 
> ip <- Im(model) 

The following command calculates the absolute value of the model:

> absmodel <- abs(model) 

Let us plot it and see what information the absolute value of the FFT carries:

> plot(absmodel) 

Figure 3...
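One detail worth noting when reading the plot: for a real input series, the magnitude spectrum is symmetric, so only the first half of the bins is informative. A small sketch for mapping bins to frequencies:

> n = length(dji) 
> freq = (0:(n-1)) / n   # frequency in cycles per observation 
> plot(freq[1:(n %/% 2)], absmodel[1:(n %/% 2)], type="h", 
+      xlab="cycles per observation", ylab="|FFT|") 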

Hilbert transformation


Hilbert transformation is another technique for transforming a time series; in R, the seewave package provides it. The package can be installed using install.packages() and loaded into the workspace using the library() command:

> install.packages("seewave") 
> library(seewave) 
> model <- hilbert(dji, 1) 

The first parameter is the time series object you would like to transform, and the second parameter is the sampling frequency of the wave. In the preceding example, I used dji as the time series and a sampling frequency of 1 to calculate the Hilbert transformation.

If you would like to know the output of the model then you should use the following code:

> summary(model) 
      V1          
 Length:2555 
 Class :complex 
 Mode  :complex 

The preceding output shows that the length of the input data series is 2555 and that the type of the output variable model is complex.

As the output is complex, we can extract the real and imaginary values using the following code:

> rp <- Re(model) 
> ip <- Im(model) 
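Since the Hilbert transform yields the analytic signal, the complex output also gives the instantaneous envelope and phase directly, using base R's complex-number helpers:

> envelope <- Mod(model)   # instantaneous amplitude (envelope) 
> phase <- Arg(model)      # instantaneous phase in radians 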

Questions


  1. Define regression and explain how you can implement it in R.

  2. How do you find the coefficient of determination for linear regression / multiple regression in R?

  3. How do you find the confidence interval for a prediction fitted with linear regression / multiple regression in R?

  4. How will you detect multicollinearity in R in multiple regression?

  5. What is the significance of ANOVA and how will you use it to compare the results of two linear regression models?

  6. How do you perform feature selection in R for multiple linear regression?

  7. How do you rank significance attributes in a multiple linear regression model in R?

  8. How do you install the waveslim package and load it into the R workspace?

  9. How do you plot a time series and extract the head and tail of the time series?

  10. How would you know the class of a variable created by the fft function?

  11. How do you use the dwt function using any given filter and take inverse dwt?

  12. How do you extract the real and imaginary parts of a series?

  13. How would you use fast Fourier transformation...

Summary


Regression is the backbone of any analysis, and the reader cannot go ahead without touching on it. In this chapter, I have presented linear regression and multivariate regression and how they are used for prediction. The R function lm() is used to implement both simple and multivariate linear regression. I also presented significance testing, along with residual calculations and the normality plot, which tests residuals for normality using a Q-Q plot. Analysis of variance (ANOVA) is used to test for differences between the means of two or more samples. Multivariate linear regression involves many variables; the coefficient of each variable is different, which varies the importance of each variable, and the variables can be ranked accordingly. Stepwise regression is used to select the variables which are important in the regression. Time series analysis sometimes does not represent the complete information, and it becomes necessary to explore frequency analysis, which can be done with wavelet, fast Fourier, and Hilbert...


Sample rows of the dataset DataMR (DataForMultipleRegression.csv) used for the multiple regression examples in this chapter; the listing is truncated in this excerpt:

  StockYPrice StockX1Price StockX2Price StockX3Price StockX4Price
1       80.13        72.86         93.1         63.7         83.1
2       79.57        72.88         90.2         63.5         82
3       79.93        71.72         99           64...