You're reading from Mastering Clojure Data Analysis

Product type: Book
Published in: May 2014
Reading level: Beginner
ISBN-13: 9781783284139
Edition: 1st Edition
Author: Eric Richard Rochester

Eric Richard Rochester studied medieval English literature and linguistics at UGA and wrote his dissertation on lexicography. Now he programs in Haskell and writes. He's also a husband and parent.
Chapter 10. Modeling Stock Data

Automated stock analysis has gotten a lot of press recently. High-frequency trading firms are a flashpoint: people believe either that they're great for the markets and increase liquidity, or that they're precursors to the apocalypse. Smaller traders have also gotten into the mix in a slower fashion. Some sites, such as Quantopian (https://www.quantopian.com/) and AlgoTrader (http://www.algotrader.ch/), provide services that allow you to create models for automated trading. Many others allow you to use automated analysis to inform your trading decisions.

Whatever your view of this phenomenon, it's an area with a lot of data begging to be analyzed. It's also a nice domain in which to experiment with some analysis and machine learning techniques.

For this chapter, we're going to look for relationships between news articles and future stock prices.

In the course of this chapter, we will cover the following topics:

  • Learning about financial data analysis

  • Setting up...

Learning about financial data analysis


Finance has always relied heavily on data. Earnings statements, forecasting, and portfolio management are just some of the areas that make use of data to quantify their decisions. Because of this, financial data analysis and its related field, financial engineering, are extremely broad fields that are difficult to summarize in a short amount of space.

However, lately, quantitative finance, high-frequency trading, and similar fields have gotten a lot of press and really come into their own. As I mentioned, some people hate them and the added volatility they seem to bring to the markets. Others maintain that they bring the necessary liquidity that helps the markets function better.

All of these fields apply statistical or machine learning methods to financial data. Some of these techniques can be quite simple. Others are more sophisticated. Some of these analyses are used to help a human analyst or manager make better financial decisions. Others are used...

Setting up the basics


Before we really dig into the project and the data, we need to prepare. We'll set up the project and its libraries, and then we'll download the data.

Setting up the library

First, we'll need to initialize the project. We can do this using Leiningen 2 (http://leiningen.org/) and Stuart Sierra's reloaded template (https://github.com/stuartsierra/reloaded). This will initialize the development environment and the project.

To do this, just execute the following command at the prompt (I've named the project financial in this case):

lein new reloaded financial

Now, we can specify the libraries that we'll need to use. We can do this in the project.clj file. Open it and replace its current contents with the following lines:

(defproject financial "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [org.clojure/data.xml "0.0.7"]
                 [org.clojure/data.csv "0.1.2"]
                 [clj-time "0.6.0"]
                 [me.raynes/fs "1.4.4"]
                 [org.encog/encog-core "3.1.0"]
                 [enclog "0.6.3"]]
  :profiles
  {:dev {:dependencies...

Getting prepared with data


As usual, we now need to clean up the data and put it into a shape that we can work with. The news article dataset in particular will require some attention, so let's turn to it first.

Working with news articles

The OANC (Open American National Corpus) is published in an XML format that includes a lot of information and annotations about the data. Specifically, the markup identifies:

  • Sections and chapters

  • Sentences

  • Words, with part-of-speech tags and lemmas

  • Noun chunks

  • Verb chunks

  • Named entities

However, we want the option to use raw text later when the system is actually being used. Because of that, we will ignore the annotations and just extract the raw tokens. In fact, all we're really interested in is each document's text—either as a raw string or a feature vector—and the date it was published. Let's create a record type for this.

We'll put this into the types.clj file in src/financial/. Put this simple namespace header into the file:

(ns financial.types)

This data record will be similarly simple. It can...
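A minimal sketch of such a record, assuming we keep only the publication date and the raw text (the field names here are illustrative, not necessarily the chapter's):

```clojure
;; A minimal sketch: one record per article, holding only the
;; publication date and the raw text (or, later, a feature vector).
;; Field names are illustrative assumptions.
(defrecord NewsArticle [pub-date text])
```

An article can then be built with `(->NewsArticle date raw-text)` and its fields read with ordinary keyword access.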

Analyzing the text


Our goal for analyzing the news articles is to generate a vector space model of the collection of documents. This attempts to pull the salient features for the documents into a vector of floating-point numbers. Features can be words or information from the documents' metadata encoded for the vector. The feature values can be 0 or 1 for presence, an integer for raw frequency, or the frequency scaled in some form.

In our case, we'll use the feature vector to represent a selection of the tokens in a document. Often, we can use all the tokens, or all the tokens that occur more than once or twice. However, in this case, we don't have a lot of data, so we'll need to be more selective in the features that we include. We'll consider how we select these in a few sections.

For the feature values, we'll use a scaled version of the token frequency called term frequency-inverse document frequency (tf-idf). There are good libraries for this, but this is a basic metric in working with...
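As a rough sketch of the metric itself (these helper functions are illustrative, not the chapter's actual implementation): tf-idf multiplies a term's count in a document by the logarithm of the total number of documents divided by the number of documents containing that term, so terms that appear everywhere get weighted down.

```clojure
;; Illustrative tf-idf helpers, not the chapter's implementation.
;; freqs:  a map of token -> count for one document.
;; corpus: a seq of such frequency maps, one per document.
(defn tf [freqs term]
  (get freqs term 0))

(defn idf [corpus term]
  (let [n  (count corpus)
        df (count (filter #(contains? % term) corpus))]
    ;; inc guards against terms that appear in no document
    (Math/log (/ n (inc df)))))

(defn tf-idf [corpus freqs term]
  (* (tf freqs term) (idf corpus term)))
```

The `inc` in the denominator is one common smoothing choice; other tf-idf variants scale or normalize the terms differently.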

Inspecting the stock prices


Now that we have some hold on the textual data, let's turn our attention to the stock prices. Previously, we loaded them from the CSV file using the financial.csv-data/read-stock-prices function. Let's reload that data with the following commands:

user=> (def stock (csvd/read-stock-prices "d/d-1995-2001.csv"))
user=> (count stock)
1263
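For reference, a minimal version of such a CSV reader could look like the following sketch using clojure.data.csv, which is already in our dependencies (the keywordized header and string values are assumptions; the chapter's actual read-stock-prices may differ, for instance by parsing dates and prices into proper types):

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

;; Sketch: read a CSV with a header row into a seq of maps keyed
;; by keywordized column names. Values stay as strings here; a
;; real implementation would parse dates and numbers.
(defn read-stock-prices [filename]
  (with-open [r (io/reader filename)]
    (let [[header & rows] (csv/read-csv r)
          ks (map keyword header)]
      (doall (map #(zipmap ks %) rows)))))
```

The `doall` matters: `csv/read-csv` is lazy, and without realizing the rows inside `with-open`, the file would be closed before the data is read.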

Let's start with a graph that shows how the closing price has changed over the years:

[Figure: the stock's closing price over time, 1995–2001]

So the price started in the low 30s, fluctuated a bit, and finished in the low 20s. During that time, there were some periods when it climbed rapidly. Hopefully, we'll be able to capture and predict those changes.

Merging text and stock features


Before we can start to train the neural network, however, we'll need to figure out how to represent the data and what information the neural network needs to have.

The code for this section will be present in the src/financial/nn.clj file. Open it up and add the following namespace header:

(ns financial.nn
  (:require [clj-time.core :as time]
            [clj-time.coerce :as time-coerce]
            [clojure.java.io :as io]
            [enclog.nnets :as nnets]
            [enclog.training :as training]
            [financial.utils :as u]
            [financial.validate :as v])
  (:import [org.encog.neural.networks PersistBasicNetwork]))

However, we first need to be clear about what we're trying to do. That will allow us to properly format and present the data.

Let's break it down like this: for each document, based on the previous stock prices and the tokens in the document, can we predict the direction of future stock prices?

So one set of features will...

Analyzing both text and stock features together with neural nets


We now have everything ready to perform the analysis, except for the engine that will actually attempt to learn the training data.

In this instance, we're going to try to train an artificial neural network to learn the direction of change of the future prices of the input data. In other words, we'll try to train it to tell whether the price will go up or down in the near future. We want to create a simple binary classifier from the past price changes and the text of an article.

Understanding neural nets

As the name implies, artificial neural networks are machine learning structures modeled on the architecture and behavior of neurons, such as the ones found in the human brain. Artificial neural networks come in many forms, but today we're going to use one of the oldest and most common forms: the three-layer feed-forward network.

We can see the structure of a unit outlined in the following figure:

[Figure: the structure of a single unit in a neural network]

Each unit is able to realize linearly...
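To make this concrete, here is one way such a three-layer feed-forward network might be constructed with enclog; the layer sizes below are placeholders for illustration, not the values we'll actually use:

```clojure
(require '[enclog.nnets :as nnets])

;; A three-layer feed-forward network: an input layer, one hidden
;; layer, and a single output neuron. Sizes are placeholders.
(def network
  (nnets/network (nnets/neural-pattern :feed-forward)
                 :activation :sigmoid
                 :input  100
                 :hidden [30]
                 :output 1))
```

With a sigmoid activation and one output neuron, the network's output falls between 0 and 1, which suits a binary up/down classification.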

Predicting the future


Now is the time to bring together everything that we've assembled over the course of this chapter, so it seems appropriate to start from scratch, using only the Clojure source code that we've written along the way.

We'll take this one block at a time: loading and processing the data, creating training and test sets, training and validating the neural network, and finally viewing and analyzing its results.

Before we do any of this, we'll need to load the proper namespaces into the REPL. We can do that with the following require statement:

user=> (require
         '[me.raynes.fs :as fs]
         '[financial]
         '[financial.types :as t]
         '[financial.nlp :as nlp]
         '[financial.nn :as nn]
         '[financial.oanc :as oanc]
         '[financial.csv-data :as csvd]
         '[financial.utils :as u])

This will give us access to everything that we've implemented so far.

Loading stock prices

First, we'll load the stock prices with the following...

Taking it with a grain of salt


Any analysis like the one presented here comes with a number of caveats that we need to keep in mind, and this chapter is no exception.

Related to this project

The main weakness of this project was that it was carried out on far too little data. This cuts in several ways:

  • We need articles from a number of data sources

  • We need articles from a wider range of time

  • We need more density of articles in the time period

For all of these, there are reasons we didn't address the issues in this chapter. However, if you plan to take this further, you'd need to find some way around them.

There are several ways to look at the results, too. On the day we looked at, the results all clustered close to zero. In fact, this stock is relatively stable, so if the network always indicated little change, it would always have a fairly low SSE (sum of squared errors). Large changes seem to happen only occasionally, and the error from not predicting them has a low impact on the SSE.
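For reference, the SSE mentioned here is just the sum of squared differences between predicted and actual values; a one-function sketch:

```clojure
;; Sum of squared errors between parallel seqs of predictions
;; and actual values.
(defn sse [predicted actual]
  (reduce + (map (fn [p a] (let [e (- p a)] (* e e)))
                 predicted actual)))
```

This is exactly why a low SSE on a stable stock doesn't mean much: a model that always predicts "no change" keeps every error term small.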

Related to machine learning and market modeling...

Summary


Over the course of this chapter, we've gotten hold of some news articles and some stock prices, and we've managed to train a neural network that projects just a little into the future. This would be a risky thing to put into production, but we've also outlined what we'd need to learn to do this correctly.

And this is also the end of this book. Thank you for staying with me this far. You've been a great reader. I hope that you've learned something as we've looked at the 10 data analysis projects that we've covered. If programming and data are both eating this world, hopefully you've seen how to have fun with both.
