You're reading from Learning Quantitative Finance with R
Published in Mar 2017 by Packt | 1st Edition | ISBN-13: 9781786462411 | Reading level: Intermediate

Authors (2):

Dr. Param Jeet

Dr. Param Jeet holds a Ph.D. in mathematics from one of India's leading technological institutes, the Indian Institute of Technology Madras (IITM). He has published mathematical research papers in various international journals. He has worked in the analytics industry for the last few years with various leading multinational companies and has also consulted for several companies as a data scientist.

Prashant Vats

Prashant Vats holds a master's degree in mathematics from one of India's leading technological institutes, IIT Mumbai. He has worked in the analytics industry for more than 10 years with various leading multinational companies and has also consulted for several companies as a data scientist across several domains.


Chapter 6.  Trading Using Machine Learning

In the capital market, machine learning-based algorithmic trading is quite popular these days, and many companies are putting a lot of effort into machine learning-based algorithms, either proprietary or for clients. Machine learning algorithms are programmed to learn continuously and change their behavior automatically, which helps to identify new patterns as they emerge in the market. Sometimes patterns in the capital market are so complex that humans cannot capture them, and even when humans do manage to find a pattern, they cannot exploit it efficiently. This complexity forces people to look for alternative mechanisms that identify such patterns accurately and efficiently.

In the previous chapter, you got the feel of momentum, pairs-trading-based algorithmic trading, and portfolio construction. In this chapter, I will explain step by step a few supervised and unsupervised...

Logistic regression


Market direction is very important to investors and traders. Predicting market direction is quite a challenging task, as market data involves a lot of noise. The market moves either upward or downward, so the nature of market movement is binary. A logistic regression model helps us fit a model to this binary behavior and forecast market direction. Logistic regression is a probabilistic model that assigns a probability to each event. I am assuming you are well versed in extracting data from Yahoo, as you studied this in previous chapters. Here again, I am going to use the quantmod package. The next three commands load the package into the workspace, import data into R from the Yahoo repository, and extract only the closing price from the data:

> library("quantmod")
> getSymbols("^DJI", src = "yahoo")
> dji <- DJI[, "DJI.Close"]

The input data to the logistic regression is constructed using different indicators, such...
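The section is truncated here; as an illustrative sketch of the idea (not the book's exact code), the following uses a simulated price series in place of the DJI download so it runs offline, builds two hypothetical moving-average indicators, and fits a binomial glm(), which is R's logistic regression:

```r
# Simulated closing prices stand in for DJI.Close (assumption, for offline use)
set.seed(1)
price <- cumsum(rnorm(500, mean = 0.1)) + 100

# Two simple indicators: 10-day and 30-day moving averages
ma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
avg10 <- ma(price, 10)
avg30 <- ma(price, 30)

# Binary direction label: 1 if the next close is higher, else 0
direction <- as.integer(c(diff(price) > 0, NA))

dat <- na.omit(data.frame(avg10, avg30, direction))

# Logistic regression assigns each observation a probability of the Up class
model <- glm(direction ~ avg10 + avg30, data = dat, family = binomial)
prob  <- predict(model, type = "response")   # values between 0 and 1
pred  <- ifelse(prob > 0.5, 1, 0)            # threshold into Up/Down
table(pred, dat$direction)                   # in-sample confusion matrix
```

A probability above 0.5 is read as an Up forecast; table() gives the same kind of confusion matrix used elsewhere in the chapter.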

Neural network


In the previous section, I implemented a model using two classes. In reality, traders may not want to enter a trade when the market is range-bound. That is to say, we have to add one more class, Nowhere, to the existing two classes, giving us three classes: Up, Down, and Nowhere. I will be using an artificial neural network to predict the Up, Down, or Nowhere direction. Traders buy (sell) when they anticipate a bullish (bearish) trend and do not invest when the market is moving Nowhere. An artificial neural network with feedforward backpropagation will be implemented in this section. A neural network requires input and output data. Closing prices and indicators derived from closing prices form the input layer nodes, and the three classes (Up, Down, and Nowhere) form the output layer nodes; there is no limit on the number of nodes in the input layer. I will use a dataset consisting of prices and indicators used in the logistic...
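As a minimal sketch of the three-class setup (using the nnet package and simulated returns as stand-ins; the book's own data and network package may differ), class.ind() converts the factor labels into the indicator matrix the network needs:

```r
library(nnet)   # single-hidden-layer feedforward network; also provides class.ind()
set.seed(2)

# Simulated daily returns stand in for indicator inputs (assumption)
ret <- rnorm(300, sd = 0.01)
x <- data.frame(lag1 = head(ret, -1))   # yesterday's return as the only feature

# Three classes via a small threshold: Down / Nowhere / Up
y <- cut(tail(ret, -1),
         breaks = c(-Inf, -0.005, 0.005, Inf),
         labels = c("Down", "Nowhere", "Up"))

# class.ind() turns the factor into a 0/1 matrix with one column per class
target <- class.ind(y)

# One hidden layer with 4 neurons; softmax gives class probabilities
fit  <- nnet(x, target, size = 4, softmax = TRUE, trace = FALSE)
pred <- predict(fit, x, type = "class")
table(pred, y)
```

The thresholds defining Nowhere (here ±0.005) are a modeling choice and would normally be tuned to the instrument's volatility.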

Deep neural network


Deep neural networks fall under the broad category of deep learning. In contrast to shallow neural networks, deep neural networks contain multiple hidden layers. The number of hidden layers varies from problem to problem and needs to be optimized. R has many packages, such as darch, deepnet, deeplearning, and h2o, which can create deep networks. However, I will use the deepnet package in particular and apply a deep neural network to the DJI data. The deepnet package can be installed and loaded into the workspace using the following commands:

> install.packages('deepnet')
> library(deepnet)

I will use set.seed() to generate reproducible output, and dbn.dnn.train() is used for training deep neural networks. The hidden parameter specifies the number of hidden layers and the number of neurons in each layer.

In the following example, I have used a three-hidden-layer structure with 3, 4, and 6 neurons in the first, second, and third hidden layers respectively. class.ind() is again used to convert...
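The example is truncated here; a self-contained sketch of the same call on simulated data (hypothetical inputs, not the chapter's DJI features) looks like this:

```r
library(deepnet)
set.seed(9)

# Two simulated indicator columns and a binary direction target (assumption)
x <- matrix(rnorm(600), ncol = 2)
y <- as.numeric(x[, 1] + x[, 2] > 0)

# Three hidden layers with 3, 4, and 6 neurons, as described in the text
dnn <- dbn.dnn.train(x, y, hidden = c(3, 4, 6))

# nn.predict() returns scores that can be thresholded into classes
pred <- as.numeric(nn.predict(dnn, x) > 0.5)
mean(pred == y)   # in-sample accuracy
```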

K means algorithm


The K means algorithm is an unsupervised machine learning algorithm. Unsupervised learning is another way of classifying data, as it does not require the data to be labeled. In reality, there are many instances where labeling the data is not possible, so we need to classify data using unsupervised learning. Unsupervised learning uses the similarity between data elements and assigns each data point to its relevant cluster; each cluster contains a set of data points that are similar in nature. The K means algorithm is the most basic unsupervised learning algorithm: it just requires the data, along with the number of clusters we would like, and returns a vector of cluster labels, one for each data point. I used normalized data along with the number of clusters; the in-sample data used during logistic regression is divided into three clusters.

set.seed() is used to have the same output in every iteration; without...
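The rest of the example is truncated; the base-R call it builds up to can be sketched on simulated normalized data (a stand-in for the chapter's in-sample set):

```r
set.seed(10)

# Normalized (scaled) indicator data stands in for the in-sample set
norm_data <- scale(matrix(rnorm(300), ncol = 3))

# kmeans() needs only the data and the number of clusters
clusters <- kmeans(norm_data, centers = 3)

clusters$cluster[1:10]   # cluster label assigned to each data point
clusters$centers         # one centroid per cluster
clusters$size            # number of points in each cluster
```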

K nearest neighborhood


K nearest neighborhood is another supervised learning algorithm, which helps us figure out the class of out-sample data points from the k nearest in-sample points. K has to be chosen appropriately; otherwise it might increase variance or bias, which reduces the generalization capacity of the algorithm. I am considering Up, Down, and Nowhere as the three classes that have to be recognized in the out-sample data. The method is based on Euclidean distance: for each data point in the out-sample data, we calculate its distance from all data points in the in-sample data, giving each data point a vector of distances. The k closest points are selected, and the final decision about the class of the data point is based on a (possibly weighted) combination of all k neighbors:

> library(class)

The K nearest neighborhood function in R takes the class labels as a separate argument rather than merged into the training data. So I am going to use the normalized in-sample and normalized out-sample data created in the Logistic regression...
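A minimal sketch of the knn() call on simulated in-sample and out-sample data (the chapter's own normalized data is truncated above); note that the training labels are passed via the separate cl argument:

```r
library(class)
set.seed(11)

# Simulated two-feature in-sample and out-sample sets (assumption)
train <- matrix(rnorm(200), ncol = 2)
test  <- matrix(rnorm(60),  ncol = 2)

# Labels for the training rows, passed separately via cl
labels <- factor(ifelse(rowSums(train) > 0, "Up", "Down"))

# Each test point is classified by its k = 3 nearest training points
pred <- knn(train, test, cl = labels, k = 3)
head(pred)
```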

Support vector machine


Support vector machine is another supervised learning algorithm that can be used for classification and regression. It can classify data linearly and, using kernel methods, nonlinearly. Each data point in the training dataset is labeled (as this is supervised learning) and mapped to the input feature space, and the aim is to classify every new data point into one of the classes. A data point is an N-dimensional vector, where N is the number of features, and the problem is to separate this data using an (N-1)-dimensional hyperplane; this is considered a linear classifier. There might be many hyperplanes that segregate the data; however, the optimal classifier is the one with the maximum margin between classes. The maximum margin hyperplane is the one with the maximum distance from the closest point on each side, and the corresponding classifier is called the maximum margin classifier. Package e1071 has all the functionality related to the support vector machine, so...
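The section breaks off here; as an illustrative sketch, the e1071 svm() call on simulated labeled data (hypothetical features, not the chapter's) would be:

```r
library(e1071)
set.seed(12)

# Simulated two-feature data with Up/Down labels (assumption)
x <- matrix(rnorm(200), ncol = 2)
y <- factor(ifelse(x[, 1] + x[, 2] > 0, "Up", "Down"))

# A radial kernel handles nonlinear boundaries; kernel = "linear" is also available
fit  <- svm(x, y, kernel = "radial")
pred <- predict(fit, x)
table(pred, y)   # in-sample confusion matrix
```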

Decision tree


Tree-based learning algorithms are among the best supervised learning methods. They generally have stable results, great accuracy, and good generalization to the out-sample dataset, and they can map both linear and nonlinear relationships quite well. A tree-based model is generally represented as a tree whose internal nodes test variables and whose leaves hold the resulting decisions. I am going to use the party package to implement a decision tree. This package first needs to be installed and loaded into the workspace using the following commands:

> install.packages("party")
> library(party)

The ctree() function fits the decision tree; it requires a formula and data as mandatory parameters and has a few more optional arguments. The normalized in-sample and normalized out-sample data do not contain labels, so we have to merge labels into the data.

The following commands bind labels into the normalized in-sample and normalized...
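The merging step is truncated above; a self-contained sketch of binding labels into the data and fitting ctree() (simulated columns stand in for the normalized indicators) looks like this:

```r
library(party)
set.seed(13)

# Simulated indicator columns stand in for the normalized data (assumption)
df <- data.frame(avg10 = rnorm(200), avg30 = rnorm(200))

# Bind the class label into the data frame, as the formula interface requires
df$direction <- factor(ifelse(df$avg10 + df$avg30 > 0, "Up", "Down"))

# ctree() takes a formula and the labeled data
fit  <- ctree(direction ~ avg10 + avg30, data = df)
pred <- predict(fit, newdata = df)
table(pred, df$direction)
```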

Random forest


Random forest is one of the best tree-based methods. A random forest is an ensemble of decision trees, each with a certain weight associated with it. The random forest's decision is made by voting: the majority outcome of the decision trees decides the outcome of the random forest. So we start with the randomForest package, which can be installed and loaded using the following commands:

> install.packages("randomForest")
> library(randomForest)

We can also use the following command to learn more about the randomForest package, including its version, release date, URL, the set of functions it implements, and much more:

> library(help = randomForest)

Random forest works well for many types of problems and handles classification, regression, and unsupervised problems quite well. Depending on the type of the target variable, it implements the relevant decision trees; for example, it uses classification for factor target variables, regression for numeric...
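As a minimal sketch of that behavior (simulated features and labels, not the chapter's DJI data), a factor target makes randomForest() fit a classification forest and report feature importance:

```r
library(randomForest)
set.seed(14)

# Simulated indicator columns and an Up/Down factor target (assumption)
df <- data.frame(avg10 = rnorm(300), avg30 = rnorm(300))
df$direction <- factor(ifelse(df$avg10 - df$avg30 > 0, "Up", "Down"))

# Factor target => classification forest; a numeric target would give regression
fit <- randomForest(direction ~ ., data = df, ntree = 200)

fit$importance           # mean decrease in Gini, used for feature selection
predict(fit, df[1:5, ])  # predicted class for the first five rows
```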

Questions


  1. What is machine learning and how is it being used in the capital market? Explain in brief.

  2. What is logistic regression and in which form does it generate its output?

  3. Write a small piece of code to use a neural network for any stock time series.

  4. How does a confusion matrix explain the accuracy of a model?

  5. How do you standardize data and why is it important in the model building process?

  6. How is support vector machine different from logistic regression?

  7. Explain supervised and unsupervised learning and how to use these techniques in algorithmic trading.

  8. Write a small piece of code for the k means algorithm using any one stock closing price.

  9. Apart from confusionMatrix(), what is the other function to calculate classification and misclassification matrices?

  10. What is the difference between decision tree and random forest and how are features selected from random forest?

Summary


This chapter presented advanced techniques implemented for capital markets. I have presented various supervised and unsupervised learning techniques in detail, along with examples. This chapter used the Dow Jones Index closing price as the dataset, which was divided into in-sample and out-sample data. The in-sample data was used for model building and the out-sample data for validation of the model. Overfitting and underfitting call into question the generalization capacity of the model, which can be understood using a confusion matrix. The accuracy of the model was measured using confusionMatrix() or table().

There are various types of risk that exist in the market, and in the next chapter, I will explain how to calculate the risk associated with various investments, in particular market risk, portfolio risk, and so on. I will also explain Monte Carlo simulation for risk, hedging techniques, and credit risk, along with the Basel regulations.

