
You're reading from Learning Quantitative Finance with R

Product type: Book
Published in: Mar 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781786462411
Edition: 1st
Authors (2):

Dr. Param Jeet

Dr. Param Jeet holds a Ph.D. in mathematics from one of India's leading technological institutes, IIT Madras (IITM), India. He has published a couple of mathematical research papers in various international journals. He has been in the analytics industry for the last few years, working with various leading multinational companies and consulting for a few companies as a data scientist.

Prashant Vats

Prashant Vats holds a master's degree in mathematics from one of India's leading technological institutes, IIT Mumbai. He has been in the analytics industry for more than 10 years, working with various leading multinational companies and consulting for a few companies as a data scientist across several domains.

Chapter 3. Econometric and Wavelet Analysis

In financial analytics, we need techniques for predictive modeling, both to forecast and to find the drivers of different target variables. In this chapter, we will discuss types of regression and how to build regression models in R for predictive modeling. We will also discuss how to implement variable selection methods and other aspects associated with regression. This chapter does not contain theoretical descriptions; it simply guides you through implementing regression models in R in the financial space. Regression analysis can be used for forecasting on cross-sectional data in the financial domain. We will also cover frequency analysis of the data, and how transformations such as the fast Fourier, wavelet, Hilbert, and Haar transformations in the time and frequency domains help to remove noise from the data.

This chapter covers the following topics:

  • Simple linear regression

  • Multivariate linear regression

  • Multicollinearity

  • ANOVA

  • Feature selection

  • Wavelet analysis

  • Fast Fourier transformation

  • Hilbert transformation

Simple linear regression


In simple linear regression, we try to predict one variable in terms of a second variable, called the predictor variable. The variable we are trying to predict is called the dependent variable and is denoted by y, and the independent variable is denoted by x. In simple linear regression, we assume a linear relationship of the form y = β0 + β1x + ε between the dependent attribute and the predictor attribute.

First, we need to plot the data to understand the linear relationship between the dependent variable and the independent variable. Here, our data consists of two variables:

  • YPrice: Dependent variable

  • XPrice: Predictor variable

In this case, we are trying to predict YPrice in terms of XPrice. StockXPrice is the independent variable and StockYPrice is the dependent variable. For every element of StockXPrice, there is an element of StockYPrice, which implies a one-to-one mapping between the elements of StockXPrice and StockYPrice.

A few lines of data used for the following analysis are displayed using the following...
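The data listing is cut off in this excerpt. As a bridge, here is a minimal sketch of fitting the simple regression in R; the file name DataForLinearRegression.csv is an assumption, and the column names follow the text above:

> # Hypothetical file name; columns StockXPrice / StockYPrice per the text 
> DataLR = read.csv("DataForLinearRegression.csv") 
> plot(DataLR$StockXPrice, DataLR$StockYPrice)    # check the linear relationship 
> LinearR.lm = lm(StockYPrice ~ StockXPrice, data=DataLR) 
> abline(LinearR.lm)                              # overlay the fitted line 
> summary(LinearR.lm)                             # coefficients and R-squared 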

Multivariate linear regression


In multiple linear regression, we try to explain the dependent variable in terms of more than one predictor variable. The multiple linear regression equation is given by the following formula:

y = β0 + β1x1 + β2x2 + ... + βkxk + ε

Here, β0, β1, ..., βk are the multiple linear regression parameters, and they can be obtained by minimizing the sum of squared errors, which is also known as the OLS (ordinary least squares) method of estimation.

Let us take an example where we have the dependent variable StockYPrice, and we are trying to predict it in terms of the independent variables StockX1Price, StockX2Price, StockX3Price, and StockX4Price, which are present in the dataset DataMR.

Now let us fit the multiple regression model and get parameter estimates of multiple regression:

> MultipleR.lm = lm(StockYPrice ~ StockX1Price + StockX2Price + StockX3Price + StockX4Price, data=DataMR) 
> summary(MultipleR.lm) 

When we execute the preceding code, it fits the multiple regression model on the data and gives the basic summary of statistics associated...
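Beyond the summary, the fitted model can be used for prediction. A minimal sketch using base R's predict() on the MultipleR.lm object above; the predictor values in newdata are invented for illustration:

> newdata = data.frame(StockX1Price=72, StockX2Price=93, StockX3Price=63, StockX4Price=83) 
> predict(MultipleR.lm, newdata, interval="confidence")   # point estimate plus 95% CI 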

Multicollinearity


If the predictor variables are correlated, we need to detect multicollinearity and treat it. Detecting multicollinearity is crucial because two or more correlated variables carry overlapping information; using them together as independent variables effectively double-counts their effect on the prediction. If we treat the multicollinearity and keep only variables which are not strongly correlated, we avoid this problem of double impact.

We can find multicollinearity by executing the following code:

> vif(MultipleR.lm) 

This gives the multicollinearity table for the predictor variables:

Figure 3.8: VIF table for multiple regression model

Depending upon the values of VIF (a common rule of thumb flags values above 5 or 10), we can drop the redundant variables.
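The vif() function is not part of base R; the car package is the usual source, although this excerpt does not name it. A minimal sketch under that assumption:

> library(car)        # assumption: vif() taken from the car package 
> vif(MultipleR.lm)   # one VIF value per predictor 
> # Refit without the worst offender; dropping StockX4Price is illustrative only: 
> Reduced.lm = lm(StockYPrice ~ StockX1Price + StockX2Price + StockX3Price, data=DataMR) 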

ANOVA


ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent groups. In the case of only two samples, we can use the t-test to compare the means, but with more than two samples pairwise comparisons become cumbersome. We are going to study the relationship between a quantitative dependent variable, returns, and a single qualitative independent variable, stock. We have five levels of stock: stock1, stock2, ..., stock5.

We can study and compare the five levels of stock by means of a box plot, executing the following code:

> DataANOVA = read.csv("C:/Users/prashant.vats/Desktop/Projects/BOOK R/DataAnova.csv") 
> head(DataANOVA) 

This displays a few lines of the data used for analysis in tabular format:

  Returns  Stock
1    1.64 Stock1
2    1.72 Stock1
3    1.68 Stock1
4    1.77 Stock1
5    1.56 Stock1
6    1.95 Stock1

> boxplot(DataANOVA$Returns ~ DataANOVA$Stock) 

This gives the...
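The box plot gives a visual comparison; the formal test uses base R's aov(). A minimal sketch on the same data:

> oneway = aov(Returns ~ Stock, data=DataANOVA) 
> summary(oneway)   # F statistic and p-value for differences among the five stock means 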

Feature selection


Feature selection is one of the toughest parts of financial model building. It can be done statistically or by using domain knowledge. Here, we are going to discuss only a few of the statistical feature selection methods used in the financial space.

Removing irrelevant features

Data may contain highly correlated features, and the model does better if we do not include highly correlated features in it. The caret R package provides functionality for computing the correlation matrix between the features and flagging the highly correlated ones, as shown in the following example.

A few lines of data used for correlation analysis and multiple regression analysis are displayed here by executing the following code:

> DataMR = read.csv("C:/Users/prashant.vats/Desktop/Projects/BOOK R/DataForMultipleRegression.csv") 
> head(DataMR) 
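The correlation code itself is truncated in this excerpt; a minimal sketch of what the caret workflow looks like, assuming the four predictor columns named above and an illustrative cutoff of 0.75:

> library(caret) 
> predictors = DataMR[, c("StockX1Price","StockX2Price","StockX3Price","StockX4Price")] 
> corMatrix = cor(predictors)                        # pairwise correlations 
> highCorr = findCorrelation(corMatrix, cutoff=0.75) # columns to consider dropping 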

Stepwise variable selection


We can use stepwise variable selection (forward, backward, or both) in predictive models using the stepAIC() function from the MASS package for feature selection.

This can be done by executing the following code:

> library(MASS)   # stepAIC() lives in the MASS package 
> MultipleR.lm = lm(StockYPrice ~ 
StockX1Price + StockX2Price + StockX3Price + StockX4Price, 
data=DataMR) 
> step <- stepAIC(MultipleR.lm, direction="both") 
> step$anova 

Here, we are using the dataset used for multiple regression as the input dataset. One can also use all-subsets regression using the leaps() function from the leaps package.
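For completeness, a minimal all-subsets sketch with leaps(); the method="adjr2" choice (ranking subsets by adjusted R-squared) is an illustrative assumption:

> library(leaps) 
> X = as.matrix(DataMR[, c("StockX1Price","StockX2Price","StockX3Price","StockX4Price")]) 
> allsub = leaps(x=X, y=DataMR$StockYPrice, method="adjr2") 
> allsub$which[which.max(allsub$adjr2), ]   # predictor subset with the best adjusted R-squared 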

Variable selection by classification

We can use classification techniques such as decision trees or random forests to get the most significant predictors. Here, we are using a random forest to find the most relevant features (a sketch follows this paragraph). All four attributes in the dataset DataForMultipleRegression1 have been selected in the following example, and the plot shows the accuracy of different subset sizes...
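The original code is cut off in this excerpt. The description (random forest, performance across subset sizes) matches caret's recursive feature elimination, so the following is a hedged reconstruction, not the book's verbatim code; the dataset DataForMultipleRegression1 is not shown, so the DataMR columns stand in for it:

> library(caret) 
> library(randomForest) 
> set.seed(7)   # for a reproducible resampling split 
> control = rfeControl(functions=rfFuncs, method="cv", number=10) 
> results = rfe(x=DataMR[, c("StockX1Price","StockX2Price","StockX3Price","StockX4Price")], 
+               y=DataMR$StockYPrice, sizes=1:4, rfeControl=control) 
> predictors(results)              # the selected features 
> plot(results, type=c("g","o"))   # performance across subset sizes 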

Ranking of variables


After fitting a regression/predictive model, we need to understand the relative ranking of the significant attributes on a comparative scale. This is explained by beta parameter estimates. Beta, or standardized, coefficients are the slopes we get if all the variables are on the same scale, which is achieved by converting them to z-scores before doing the predictive modeling (regression). Beta coefficients allow a comparison of the approximate relative importance of the predictors, and hence the variables can be ranked, which neither the unstandardized coefficients nor the p-values allow. Scaling, or standardizing, the data vectors can be done using the scale() function. Once the scaled variables are created, the regression is redone using them. The resulting coefficients are the beta coefficients.
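A minimal sketch of the scale-then-refit recipe just described, assuming all columns of DataMR are numeric:

> scaledDataMR = as.data.frame(scale(DataMR))   # convert every column to z-scores 
> Beta.lm = lm(StockYPrice ~ StockX1Price + StockX2Price + StockX3Price + StockX4Price, 
+              data=scaledDataMR) 
> coef(Beta.lm)   # beta coefficients; rank predictors by absolute magnitude 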

Wavelet analysis


Time series information is not always sufficient to get insight into the data. Sometimes the frequency content of the data also contains important information. Fourier transformation (FT) captures the frequency-amplitude content of the data, but it does not show when in time a given frequency occurred. In the case of stationary data, all frequency components exist at every point in time, but this is not true for non-stationary data, so FT is not well suited to non-stationary data. Wavelet transformation (WT) has the capacity to provide time and frequency information simultaneously, in the form of a time-frequency representation. WT is important for analyzing financial time series, as most financial time series are non-stationary. In the remainder of this chapter, I will help you understand how to handle non-stationary data in R using wavelet analysis. Stock price/index data requires certain techniques or transformations to obtain further information...
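The wavelet code itself is cut off here. As a hedged illustration (the waveslim package and its dwt() function are referenced in the end-of-chapter questions), a discrete wavelet transform and its inverse might look like this, assuming dji is a numeric price series:

> library(waveslim) 
> x = as.numeric(dji) 
> x = x[1:(16 * (length(x) %/% 16))]        # dwt() needs length divisible by 2^n.levels 
> dji.dwt = dwt(x, wf="haar", n.levels=4)   # wavelet and detail coefficients per level 
> dji.idwt = idwt(dji.dwt)                  # inverse transform reconstructs the series 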

Fast Fourier transformation


Fast Fourier transformation (FFT) is used for calculating the discrete Fourier transform of a time series. The fft() function is part of base R (the stats package), so no additional package needs to be installed or loaded.

The fast Fourier transform of a time series can be calculated using fft(), which accepts a real- or complex-valued series.

In the following example, dji is a real number time series:

> model <- fft(dji) 

The variable model is the transformed series, which consists of complex numbers; the real and imaginary parts can be extracted using the following code:

> rp <- Re(model) 
> ip <- Im(model) 

The following command calculates the absolute value of the model:

> absmodel <- abs(model) 

Let us plot it and see what information the absolute value of the FFT carries:

> plot(absmodel) 

Figure 3...
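One detail worth noting when reading the plot: for a real input series, the magnitude spectrum is symmetric, so only the first half of the bins is informative. A small sketch for mapping bins to frequencies:

> n = length(dji) 
> freq = (0:(n-1)) / n   # frequency in cycles per observation 
> plot(freq[1:(n %/% 2)], absmodel[1:(n %/% 2)], type="h", 
+      xlab="cycles per observation", ylab="|FFT|") 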

Hilbert transformation


Hilbert transformation is another technique for transforming a time series; in R, the seewave package provides it. The package can be installed using install.packages() and loaded into the workspace using the library() command:

> install.packages("seewave") 
> library(seewave) 
> model <- hilbert(dji, 1) 

The first parameter is the time series object you would like to transform, and the second parameter is the sampling frequency of the wave. In the preceding example, I used dji as the time series and a sampling frequency of 1 to calculate the Hilbert transformation.

If you would like to know the output of the model then you should use the following code:

> summary(model) 
      V1          
 Length:2555 
 Class :complex 
 Mode  :complex 

The preceding output shows that the length of the input data series is 2555 and that the type of the output variable model is complex.

As the output is complex, we can extract the real and imaginary values using the following code:

> rp <- Re(model) 
> ip <- Im(model) 
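Since the Hilbert transform yields the analytic signal, the complex output also gives the instantaneous envelope and phase directly, using base R's complex-number helpers:

> envelope <- Mod(model)   # instantaneous amplitude (envelope) 
> phase <- Arg(model)      # instantaneous phase in radians 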

Questions


  1. Define regression and explain how you can implement it in R.

  2. How do you find the coefficient of determination for linear regression / multiple regression in R?

  3. How do you find the confidence interval for a prediction fitted with linear regression / multiple regression in R?

  4. How will you detect multicollinearity in R in multiple regression?

  5. What is the significance of ANOVA and how will you use it to compare the results of two linear regression models?

  6. How do you perform feature selection in R for multiple linear regression?

  7. How do you rank significance attributes in a multiple linear regression model in R?

  8. How do you install the waveslim package and load it into the R workspace?

  9. How do you plot a time series and extract the head and tail of the time series?

  10. How would you know the class of a variable created by the fft function?

  11. How do you use the dwt function using any given filter and take inverse dwt?

  12. How do you extract the real and imaginary parts of a series?

  13. How would you use fast Fourier transformation...

Summary


Regression is the backbone of any analysis, and the reader cannot go ahead without touching on it. In this chapter, I have presented linear regression and multivariate regression and how they are used for prediction. The R function lm() is used to implement both simple and multivariate linear regression. I also presented significance testing, along with residual calculations and the normality plot, which tests residuals for normality using a Q-Q plot. Analysis of variance (ANOVA) is used to test for differences between the means of two or more samples. Multivariate linear regression involves many variables; the coefficient of each variable is different, which varies the importance of each variable, and the variables can be ranked accordingly. Stepwise regression is used to select the variables which are important in the regression. Time series analysis sometimes does not represent the complete information, and it becomes necessary to explore frequency analysis, which can be done with wavelet, fast Fourier, and Hilbert...


Sample rows of the dataset DataMR (DataForMultipleRegression.csv) used for the multiple regression examples in this chapter; the listing is truncated in this excerpt:

  StockYPrice StockX1Price StockX2Price StockX3Price StockX4Price
1       80.13        72.86         93.1         63.7         83.1
2       79.57        72.88         90.2         63.5         82
3       79.93        71.72         99           64...