You're reading from Learning Quantitative Finance with R
Published in Mar 2017 by Packt | 1st Edition | ISBN-13: 9781786462411 | Reading level: Intermediate

Authors (2):

Dr. Param Jeet

Dr. Param Jeet holds a Ph.D. in mathematics from one of India's leading technological institutes, the Indian Institute of Technology Madras (IITM). He has published mathematical research papers in various international journals. He has worked in the analytics industry for the last few years with various leading multinational companies and has also consulted for several companies as a data scientist.

Prashant Vats

Prashant Vats holds a master's degree in mathematics from one of India's leading technological institutes, IIT Mumbai. He has worked in the analytics industry for more than 10 years with various leading multinational companies and has also consulted for several companies as a data scientist across several domains.


Chapter 6.  Trading Using Machine Learning

In the capital market, machine learning-based algorithmic trading is quite popular these days, and many companies are putting a lot of effort into machine learning-based algorithms, either proprietary or for clients. Machine learning algorithms are programmed to learn continuously and change their behavior automatically, which helps to identify new patterns as they emerge in the market. Sometimes patterns in the capital market are so complex that humans cannot capture them, and even when humans do manage to find a pattern, they cannot exploit it efficiently. This complexity forces people to look for alternative mechanisms that identify such patterns accurately and efficiently.

In the previous chapter, you got the feel of momentum, pairs-trading-based algorithmic trading, and portfolio construction. In this chapter, I will explain step by step a few supervised and unsupervised...

Logistic regression


Market direction is very important to investors and traders. Predicting market direction is quite a challenging task, as market data involves a lot of noise. The market moves either upward or downward, so the nature of market movement is binary. A logistic regression model helps us fit a model to this binary behavior and forecast market direction. Logistic regression is a probabilistic model that assigns a probability to each event. I am assuming you are well versed in extracting data from Yahoo, as you studied this in previous chapters. Here again, I am going to use the quantmod package. The next three commands load the package into the workspace, import data into R from the Yahoo repository, and extract only the closing price from the data:

> library("quantmod")
> getSymbols("^DJI", src = "yahoo")
> dji <- DJI[, "DJI.Close"]

The input data to the logistic regression is constructed using different indicators, such...
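The section is truncated here; as an illustrative sketch of the idea (not the book's exact code), the following uses a simulated price series in place of the DJI download so it runs offline, builds two hypothetical moving-average indicators, and fits a binomial glm(), which is R's logistic regression:

```r
# Simulated closing prices stand in for DJI.Close (assumption, for offline use)
set.seed(1)
price <- cumsum(rnorm(500, mean = 0.1)) + 100

# Two simple indicators: 10-day and 30-day moving averages
ma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
avg10 <- ma(price, 10)
avg30 <- ma(price, 30)

# Binary direction label: 1 if the next close is higher, else 0
direction <- as.integer(c(diff(price) > 0, NA))

dat <- na.omit(data.frame(avg10, avg30, direction))

# Logistic regression assigns each observation a probability of the Up class
model <- glm(direction ~ avg10 + avg30, data = dat, family = binomial)
prob  <- predict(model, type = "response")   # values between 0 and 1
pred  <- ifelse(prob > 0.5, 1, 0)            # threshold into Up/Down
table(pred, dat$direction)                   # in-sample confusion matrix
```

A probability above 0.5 is read as an Up forecast; table() gives the same kind of confusion matrix used elsewhere in the chapter.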

Neural network


In the previous section, I implemented a model using two classes. In reality, traders may not want to enter a trade when the market is range-bound. That is to say, we have to add one more class, Nowhere, to the existing two classes, giving us three classes: Up, Down, and Nowhere. I will be using an artificial neural network to predict the Up, Down, or Nowhere direction. Traders buy (sell) when they anticipate a bullish (bearish) trend and do not invest when the market is moving Nowhere. An artificial neural network with feedforward backpropagation will be implemented in this section. A neural network requires input and output data. Closing prices and indicators derived from closing prices form the input layer nodes, and the three classes (Up, Down, and Nowhere) form the output layer nodes; there is no limit on the number of nodes in the input layer. I will use a dataset consisting of prices and indicators used in the logistic...
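As a minimal sketch of the three-class setup (using the nnet package and simulated returns as stand-ins; the book's own data and network package may differ), class.ind() converts the factor labels into the indicator matrix the network needs:

```r
library(nnet)   # single-hidden-layer feedforward network; also provides class.ind()
set.seed(2)

# Simulated daily returns stand in for indicator inputs (assumption)
ret <- rnorm(300, sd = 0.01)
x <- data.frame(lag1 = head(ret, -1))   # yesterday's return as the only feature

# Three classes via a small threshold: Down / Nowhere / Up
y <- cut(tail(ret, -1),
         breaks = c(-Inf, -0.005, 0.005, Inf),
         labels = c("Down", "Nowhere", "Up"))

# class.ind() turns the factor into a 0/1 matrix with one column per class
target <- class.ind(y)

# One hidden layer with 4 neurons; softmax gives class probabilities
fit  <- nnet(x, target, size = 4, softmax = TRUE, trace = FALSE)
pred <- predict(fit, x, type = "class")
table(pred, y)
```

The thresholds defining Nowhere (here ±0.005) are a modeling choice and would normally be tuned to the instrument's volatility.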

Deep neural network


Deep neural networks fall under the broad category of deep learning. In contrast to shallow neural networks, deep neural networks contain multiple hidden layers. The number of hidden layers varies from problem to problem and needs to be optimized. R has many packages, such as darch, deepnet, deeplearning, and h2o, which can create deep networks. However, I will use the deepnet package in particular and apply a deep neural network to the DJI data. The deepnet package can be installed and loaded into the workspace using the following commands:

> install.packages('deepnet')
> library(deepnet)

I will use set.seed() to generate reproducible output, and dbn.dnn.train() is used for training deep neural networks. The hidden parameter specifies the number of hidden layers and the number of neurons in each layer.

In the following example, I have used a three-hidden-layer structure with 3, 4, and 6 neurons in the first, second, and third hidden layers respectively. class.ind() is again used to convert...
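The example is truncated here; a self-contained sketch of the same call on simulated data (hypothetical inputs, not the chapter's DJI features) looks like this:

```r
library(deepnet)
set.seed(9)

# Two simulated indicator columns and a binary direction target (assumption)
x <- matrix(rnorm(600), ncol = 2)
y <- as.numeric(x[, 1] + x[, 2] > 0)

# Three hidden layers with 3, 4, and 6 neurons, as described in the text
dnn <- dbn.dnn.train(x, y, hidden = c(3, 4, 6))

# nn.predict() returns scores that can be thresholded into classes
pred <- as.numeric(nn.predict(dnn, x) > 0.5)
mean(pred == y)   # in-sample accuracy
```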

K means algorithm


The K means algorithm is an unsupervised machine learning algorithm. Unsupervised learning is another way of classifying data, as it does not require the data to be labeled. In reality, there are many instances where labeling the data is not possible, so we need to classify data using unsupervised learning. Unsupervised learning uses the similarity between data elements and assigns each data point to its relevant cluster; each cluster contains a set of data points that are similar in nature. The K means algorithm is the most basic unsupervised learning algorithm: it just requires the data, along with the number of clusters we would like, and returns a vector of cluster labels, one for each data point. I used normalized data along with the number of clusters; the in-sample data used during logistic regression is divided into three clusters.

set.seed() is used to have the same output in every iteration; without...
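The rest of the example is truncated; the base-R call it builds up to can be sketched on simulated normalized data (a stand-in for the chapter's in-sample set):

```r
set.seed(10)

# Normalized (scaled) indicator data stands in for the in-sample set
norm_data <- scale(matrix(rnorm(300), ncol = 3))

# kmeans() needs only the data and the number of clusters
clusters <- kmeans(norm_data, centers = 3)

clusters$cluster[1:10]   # cluster label assigned to each data point
clusters$centers         # one centroid per cluster
clusters$size            # number of points in each cluster
```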

K nearest neighborhood


K nearest neighborhood is another supervised learning algorithm, which helps us figure out the class of out-sample data points from the k nearest in-sample points. K has to be chosen appropriately; otherwise it might increase variance or bias, which reduces the generalization capacity of the algorithm. I am considering Up, Down, and Nowhere as the three classes that have to be recognized in the out-sample data. The method is based on Euclidean distance: for each data point in the out-sample data, we calculate its distance from all data points in the in-sample data, giving each data point a vector of distances. The k closest points are selected, and the final decision about the class of the data point is based on a (possibly weighted) combination of all k neighbors:

> library(class)

The K nearest neighborhood function in R takes the class labels as a separate argument rather than merged into the training data. So I am going to use the normalized in-sample and normalized out-sample data created in the Logistic regression...
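A minimal sketch of the knn() call on simulated in-sample and out-sample data (the chapter's own normalized data is truncated above); note that the training labels are passed via the separate cl argument:

```r
library(class)
set.seed(11)

# Simulated two-feature in-sample and out-sample sets (assumption)
train <- matrix(rnorm(200), ncol = 2)
test  <- matrix(rnorm(60),  ncol = 2)

# Labels for the training rows, passed separately via cl
labels <- factor(ifelse(rowSums(train) > 0, "Up", "Down"))

# Each test point is classified by its k = 3 nearest training points
pred <- knn(train, test, cl = labels, k = 3)
head(pred)
```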

Support vector machine


Support vector machine is another supervised learning algorithm that can be used for classification and regression. It can classify data linearly and, using kernel methods, nonlinearly. Each data point in the training dataset is labeled (as this is supervised learning) and mapped to the input feature space, and the aim is to classify every new data point into one of the classes. A data point is an N-dimensional vector, where N is the number of features, and the problem is to separate this data using an (N-1)-dimensional hyperplane; this is considered a linear classifier. There might be many hyperplanes that segregate the data; however, the optimal classifier is the one with the maximum margin between classes. The maximum margin hyperplane is the one with the maximum distance from the closest point on each side, and the corresponding classifier is called the maximum margin classifier. Package e1071 has all the functionality related to the support vector machine, so...
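The section breaks off here; as an illustrative sketch, the e1071 svm() call on simulated labeled data (hypothetical features, not the chapter's) would be:

```r
library(e1071)
set.seed(12)

# Simulated two-feature data with Up/Down labels (assumption)
x <- matrix(rnorm(200), ncol = 2)
y <- factor(ifelse(x[, 1] + x[, 2] > 0, "Up", "Down"))

# A radial kernel handles nonlinear boundaries; kernel = "linear" is also available
fit  <- svm(x, y, kernel = "radial")
pred <- predict(fit, x)
table(pred, y)   # in-sample confusion matrix
```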

Decision tree


Tree-based learning algorithms are among the best supervised learning methods. They generally have stable results, great accuracy, and good generalization to the out-sample dataset, and they can map both linear and nonlinear relationships quite well. A tree-based model is generally represented as a tree whose internal nodes test variables and whose leaves hold the resulting decisions. I am going to use the party package to implement a decision tree. This package first needs to be installed and loaded into the workspace using the following commands:

> install.packages("party")
> library(party)

The ctree() function fits the decision tree; it requires a formula and data as mandatory parameters and has a few more optional arguments. The normalized in-sample and normalized out-sample data do not contain labels, so we have to merge labels into the data.

The following commands bind labels into the normalized in-sample and normalized...
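The merging step is truncated above; a self-contained sketch of binding labels into the data and fitting ctree() (simulated columns stand in for the normalized indicators) looks like this:

```r
library(party)
set.seed(13)

# Simulated indicator columns stand in for the normalized data (assumption)
df <- data.frame(avg10 = rnorm(200), avg30 = rnorm(200))

# Bind the class label into the data frame, as the formula interface requires
df$direction <- factor(ifelse(df$avg10 + df$avg30 > 0, "Up", "Down"))

# ctree() takes a formula and the labeled data
fit  <- ctree(direction ~ avg10 + avg30, data = df)
pred <- predict(fit, newdata = df)
table(pred, df$direction)
```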

Random forest


Random forest is one of the best tree-based methods. A random forest is an ensemble of decision trees, each with a certain weight associated with it. The random forest's decision is made by voting: the majority outcome of the decision trees decides the outcome of the random forest. So we start with the randomForest package, which can be installed and loaded using the following commands:

> install.packages("randomForest")
> library(randomForest)

We can also use the following command to learn more about the randomForest package, including its version, release date, URL, the set of functions it implements, and much more:

> library(help = randomForest)

Random forest works well for many types of problems and handles classification, regression, and unsupervised problems quite well. Depending on the type of the target variable, it implements the relevant decision trees; for example, it uses classification for factor target variables, regression for numeric...
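As a minimal sketch of that behavior (simulated features and labels, not the chapter's DJI data), a factor target makes randomForest() fit a classification forest and report feature importance:

```r
library(randomForest)
set.seed(14)

# Simulated indicator columns and an Up/Down factor target (assumption)
df <- data.frame(avg10 = rnorm(300), avg30 = rnorm(300))
df$direction <- factor(ifelse(df$avg10 - df$avg30 > 0, "Up", "Down"))

# Factor target => classification forest; a numeric target would give regression
fit <- randomForest(direction ~ ., data = df, ntree = 200)

fit$importance           # mean decrease in Gini, used for feature selection
predict(fit, df[1:5, ])  # predicted class for the first five rows
```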

Questions


  1. What is machine learning and how is it being used in the capital market? Explain in brief.

  2. What is logistic regression and in which form does it generate its output?

  3. Write a small piece of code to use a neural network for any stock time series.

  4. How does a confusion matrix explain the accuracy of a model?

  5. How do you standardize data and why is it important in the model building process?

  6. How is support vector machine different from logistic regression?

  7. Explain supervised and unsupervised learning and how to use these techniques in algorithmic trading.

  8. Write a small piece of code for the k means algorithm using any one stock closing price.

  9. Apart from confusionMatrix(), what is the other function to calculate classification and misclassification matrices?

  10. What is the difference between decision tree and random forest and how are features selected from random forest?

Summary


This chapter presented advanced techniques implemented for capital markets. I have presented various supervised and unsupervised learning techniques in detail, along with examples. This chapter used the Dow Jones Index closing price as the dataset, which was divided into in-sample and out-sample data. The in-sample data was used for model building and the out-sample data for validation of the model. Overfitting and underfitting call into question the generalization capacity of the model, which can be understood using a confusion matrix. The accuracy of the model was measured using confusionMatrix() or table().

There are various types of risk that exist in the market, and in the next chapter, I will explain how to calculate the risk associated with various investments, in particular market risk, portfolio risk, and so on. I will also explain Monte Carlo simulation for risk, hedging techniques, and credit risk, along with the Basel regulations.

